kb18: How to convert a support vector machine trained in LIBSVM into a pipeline?
Problem: How to execute a support vector classifier trained in LIBSVM using libPRSD?
Solution: Call the sdp_svc pipeline constructor and fill in the parameters trained by LIBSVM.
PRSD Studio exposes the basic pattern recognition algorithms to the user as pipeline actions. For each algorithm, we may construct an execution pipeline directly by supplying its canonical parameters (see function reference for parameters of pipeline actions).
Usually, we derive the method parameters by training the algorithm under PRTools, but this is not necessary.
We can just as well train the algorithm using external tools or libraries, as long as we are able to provide its parameters to the pipeline constructor under Matlab.
In this example, we use “A simple MATLAB interface” distributed by the LIBSVM authors (ver. 2.68).
Under Matlab, we will use PRTools to create a two-class “banana” dataset:
>> a=gendatb
Banana Set, 100 by 2 dataset with 2 classes: [50 50]
>> scatterdui(a)
Now, we invoke LIBSVM to train an RBF SVM with sigma=2.0 (gamma=1/2). The '-g' option sets gamma in LIBSVM's RBF kernel exp(-gamma*|u-v|^2). We pass the numerical labels and the raw data (+a is a shortcut for double(a)):
>> model = svmtrain(getnlab(a), +a, '-g 0.5');
>> model
model =
Parameters: [5x1 double]
nr_class: 2
totalSV: 63
rho: 0.0547
Label: [2x1 double]
ProbA: []
ProbB: []
nSV: [2x1 double]
sv_coef: [63x1 double]
SVs: [63x2 double]
The resulting model structure contains all the parameters we need to define the SVC pipeline action:
>> p=sdp_svc('rbf',1/0.5,model.SVs,model.sv_coef,model.rho)
sequential pipeline 2x1 ''
1 sdp_svc 2x1 'rbf', par=2.0, 63 SVs
>> rand(3,2)*p % example execution on random 2D data
ans =
-1.2691
-1.0817
-1.4567
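To see what the pipeline action computes from these parameters, we can reproduce the raw output by hand. The sketch below evaluates the standard LIBSVM decision function, sum_i sv_coef(i)*exp(-gamma*|x-SVs(i,:)|^2) - rho, in plain Matlab. The toy model values are made up so that the snippet runs on its own. Whether sdp_svc follows exactly the same sign convention is an assumption that should be checked against the pipeline output.

```matlab
% Toy stand-in for the LIBSVM model structure (hypothetical values, used
% only so that the snippet runs on its own; substitute the trained model).
model.SVs     = [0 0; 1 1];
model.sv_coef = [1; -1];
model.rho     = 0.1;
gamma = 0.5;                      % the value passed to svmtrain via '-g 0.5'

% Squared Euclidean distances between the rows of X and the rows of S
sqdist = @(X,S) bsxfun(@plus, sum(X.^2,2), sum(S.^2,2)') - 2*X*S';

% LIBSVM decision value: weighted sum of RBF kernel responses minus rho
svmout = @(X) exp(-gamma*sqdist(X,model.SVs))*model.sv_coef - model.rho;

X = [0 0; 1 1; 0.5 0.5];          % three query points
out = svmout(X)
```

With the trained model in place of the toy structure, svmout(data) can be compared with data*p on the same data.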
We can visualize the raw output of the pipeline action using the sdscatter function:
>> sdscatter(a,p)
What we need in a real application are, of course, decisions. Here, we simply fix the decision threshold to zero. First, we create the thresholding operating point for our artificial problem. Then we construct the decision pipeline action and concatenate it with the SVC action to form a new two-step sequential pipeline. Finally, we visualize the pipeline results (decisions) in the scatter plot:
>> ops=sdops('thr',0,{'apple','banana'})
Thr-based operating point thr=0.00
>> pd=sdp_decide(ops)
sequential pipeline 1x1 ''
1 sdp_decide 1x1 Threshold-based decision on apple at op 1
>> p2=[p pd]
sequential pipeline 2x1 ''
1 sdp_svc 2x1 'rbf', par=2.0, 63 SVs
2 sdp_decide 1x1 Threshold-based decision on apple at op 1
>> sdscatter(a,p2)
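The thresholding step can be pictured in plain Matlab as follows. This is only an illustration of the idea, not the sdp_decide implementation: we assume that outputs at or above the threshold are assigned to the first class name, which should be verified against the actual decisions. The third output value is made up to show both outcomes.

```matlab
out   = [-1.2691; -1.0817; 0.35];  % raw SVC outputs (last value is made up)
thr   = 0;                         % decision threshold
names = {'apple','banana'};        % class names of the operating point

% Assumed rule: out >= thr -> first name, otherwise second name
idx = 2 - (out >= thr);            % 1 where out >= thr, 2 elsewhere
dec = names(idx)
```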
Finally, we will compare the execution speed of the trained SVC under libPRSD and LIBSVM. We create a large random dataset with 100 000 samples. The LIBSVM interface is designed for “testing”, not only for “execution”, so svmpredict also requires a label vector. We pass dummy labels, which means that the accuracy it reports can be ignored:
>> test=rand(100000,2);
>> lab=ones(size(test,1),1);
>> tic; [predict_label, accuracy, dec_values] = svmpredict(lab, test, model); toc
Accuracy = 0% (0/100000) (classification)
Elapsed time is 0.585559 seconds.
>> tic; out=sdexe(p2,test); toc
Elapsed time is 0.088784 seconds.
>> 0.585559/0.088784
ans =
6.5953
Execution under libPRSD gives us a roughly 6.6x speedup in this run. Not bad!
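A single tic/toc measurement can fluctuate from run to run. A minimal sketch of a steadier comparison repeats each call several times and compares the medians. The two workloads below are hypothetical stand-ins so that the snippet runs without LIBSVM or PRSD Studio; in practice they would be replaced by the svmpredict and sdexe calls used above.

```matlab
% Stand-in workloads (hypothetical); replace them with, for example,
% @() svmpredict(lab, test, model) and @() sdexe(p2, test).
test = rand(100000,2);
fa = @() sum(exp(-test.^2), 2);
fb = @() sum(test, 2);

nrep = 10;                         % number of repetitions per workload
ta = zeros(nrep,1);
tb = zeros(nrep,1);
for i = 1:nrep
    t = tic; fa(); ta(i) = toc(t); % time one call of the first workload
    t = tic; fb(); tb(i) = toc(t); % time one call of the second workload
end
speedup = median(ta) / median(tb)  % ratio of the median run times
```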
The pipeline may now be exported using sdexport and run directly in custom applications outside Matlab.
