13.11.08

kb31: How to build a detector from a custom region in an image?

Problem: How to create a detector for a custom region of interest in an image?
Solution: First, label the region of interest; second, train a detector; third, apply the detector to a new image.

Let’s first load and visualize an image.

>> a=imread('roadsign09.bmp'); 
>> a=im2feat(a);         % convert the image into a dataset object
>> sdimage(a) 


We are interested in detecting the road area. In the Image menu, go to Paint class and select Create new class. A pop-up window will ask for the name of the class, e.g. road. Paint a region inside the road area. Note that you may set the brush size yourself by clicking the right mouse button or using the Image menu. When done, click the right mouse button and select Stop painting. In this way, we have labelled a region of interest in the image.

Save the data with the labels by pressing the s key on the keyboard. A pop-up window will ask for the name of the new dataset, e.g. a2. We can now select a smaller dataset (here, 400 samples per class) and train a quadratic classifier:
>> b=gendat(a2,[400 400]) 
>> w=qdc(b) 
>> out=b*w 

The quadratic classifier provides a soft output for each of the two classes. We are interested in building a road detector operating only on the road output. To adjust the detector threshold, we will use ROC analysis on the road output only (i.e. ROC analysis of a single output, also known as thresholding):
>> r=sdroc(out(:,'road')) 
>> sddrawroc(r) 

Select an appropriate operating point minimizing the error on the road class and save the decision mapping into the variable wd (press the s key). In our example, we chose operating point number 94, which has a 0.02 error on the road class, as indicated in the figure title.
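Conceptually, ROC analysis on a single output sweeps a threshold over the road score and records the error on both sides. The Python sketch below illustrates this outside PRSD Studio and is not how sdroc is implemented. It assumes a pixel is declared road when its score is at least the threshold, and it uses one possible selection rule (bound the road error, then minimise the error on the other pixels); the function names and toy scores are hypothetical.

```python
def roc_single_output(scores, is_target, thresholds):
    """Sweep thresholds over one soft output.

    A sample is declared 'target' (e.g. road) when score >= threshold.
    Returns (threshold, target_error, non_target_error) tuples.
    """
    pos = [s for s, t in zip(scores, is_target) if t]
    neg = [s for s, t in zip(scores, is_target) if not t]
    points = []
    for thr in thresholds:
        target_err = sum(s < thr for s in pos) / len(pos)        # road pixels rejected
        non_target_err = sum(s >= thr for s in neg) / len(neg)   # other pixels accepted
        points.append((thr, target_err, non_target_err))
    return points


def pick_operating_point(points, max_target_err):
    """Among points with target error <= max_target_err, return the one
    with the lowest non-target error (None if no point qualifies)."""
    ok = [p for p in points if p[1] <= max_target_err]
    return min(ok, key=lambda p: p[2]) if ok else None


# Toy road scores: the first four pixels are road, the last four are not.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1]
is_road = [True] * 4 + [False] * 4
points = roc_single_output(scores, is_road, [0.15, 0.35, 0.5, 0.65])
print(pick_operating_point(points, max_target_err=0.0))  # (0.35, 0.0, 0.25)
```

Tightening the allowed road error trades off against more non-road pixels being accepted, which is exactly the choice made interactively in the ROC plot.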

Now we can construct the road detector. Because we want to pass only the road output of the classifier to the decision mapping, we select the corresponding output feature using the featsel mapping:
>> getlab(w) 
ans = 
default 
road 
>> wroad=w*featsel(2,2)*wd 
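The detector wroad is a sequential composition: classifier, then selection of the second (road) output, then thresholding. A minimal Python sketch of that idea follows; the toy classifier and the 0.5 threshold are made-up stand-ins for w and wd, and select_column only mimics what featsel(2,2) does in this session.

```python
def compose(*stages):
    """Chain stages left to right, in the spirit of w*featsel(2,2)*wd."""
    def pipeline(x):
        for stage in stages:
            x = stage(x)
        return x
    return pipeline


def select_column(k):
    """Keep only output number k (1-based, as in Matlab)."""
    return lambda outputs: outputs[k - 1]


def threshold_decision(thr, above="road", below="default"):
    """Turn a single soft output into a crisp label."""
    return lambda score: above if score >= thr else below


# Toy classifier: soft outputs [default, road] from a single pixel feature.
def toy_classifier(pixel):
    return [1.0 - pixel[0], pixel[0]]


detector = compose(toy_classifier, select_column(2), threshold_decision(0.5))
print(detector([0.8]), detector([0.2]))  # road default
```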

We can now apply the detector to the entire image a and visualize the decisions:
>> c=setlabels(a,a*wroad) 
>> sdimage(c) 


The detector wroad can be applied to any new image d. Let’s inspect the results on the image roadsign12.bmp.
>> d=imread('roadsign12.bmp'); 
>> d=im2feat(d); 
>> d=setlabels(d,d*wroad) 
>> sdimage(d) 

04.11.08

kb27: Perform leave-one-out evaluation

Problem: How to estimate error of an algorithm using leave-one-out cross-validation?
Solution: Use the sdcrossval function and specify the ‘method’ parameter.

Leave-one-out is an evaluation scheme that repeatedly trains the algorithm of interest on the full dataset excluding a single example, and then tests it on that example. Because its training sets are larger than in any other cross-validation technique, the trained classifiers are close to the one trained on all data, so the error estimate is nearly unbiased. However, the estimated variance is very high: each fold tests a single example, so the per-fold error is either 0 or 1.
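To make the scheme concrete, here is a stand-alone Python sketch of leave-one-out with a nearest mean classifier. It illustrates the technique only and is not a re-implementation of sdcrossval or nmc; the toy data are made up. Every fold contributes an error of either 0 or 1, which is why the per-fold spread reported by sdcrossval is so large.

```python
import math


def nearest_mean_fit(X, y):
    """Mean vector per class."""
    means = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        means[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return means


def nearest_mean_predict(means, x):
    """Label of the closest class mean (Euclidean distance)."""
    return min(means, key=lambda label: math.dist(x, means[label]))


def leave_one_out(X, y):
    """Hold out each sample once, train on the rest, test on the held-out one.

    Returns the per-fold errors (each 0 or 1).
    """
    errors = []
    for i in range(len(X)):
        model = nearest_mean_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        errors.append(int(nearest_mean_predict(model, X[i]) != y[i]))
    return errors


X = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0], [7.0]]
y = ["apple"] * 3 + ["banana"] * 3 + ["apple"]
errors = leave_one_out(X, y)
print(errors, sum(errors) / len(errors))
```

The last apple (at 7.0) is closer to the banana mean once it is held out, so only that fold errs.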

PRSD Studio provides leave-one-out evaluation through the standard sdcrossval function. We only need to set the ‘method’ parameter to ‘loo’ or ‘leave-one-out’. In the following example, we perform leave-one-out evaluation of the PRTools nearest mean classifier (nmc) on a two-class problem:


>> a=sdrelab(gendatb,{1 'apple'; 2 'banana'})
new lablist:
1: 1 -> apple 
2: 2 -> banana
Banana Set, 100 by 2 dataset with 2 classes: [50  50]

>> r1=sdcrossval(nmc,a,'method','loo')
 samples: ....................................................................................................
ROC (w-based op.point, 2 measures, 100 folds)
est: 1:err(apple)=0.11(0.31), 2:err(banana)=0.11(0.31)

The decisions were made at the default operating point, using equal weights for both classes. sdcrossval can also perform evaluation at a set of operating points. We will use this feature to estimate a simple weighting-based ROC curve on a pre-defined set of operating points, using both standard rotation cross-validation and leave-one-out. Let us define the operating points using a simple grid of class weights:
>> W=[0:0.02:1]';
>> W=[W 1-W];
>> ops=sdops('w',W,getlablist(a))
Weight-based operating set (51 ops, 2 classes) at op 1
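The two Matlab lines above build 51 weight pairs (w, 1-w) with w = 0, 0.02, ..., 1. A rough Python sketch of what such a weight-based operating set means: at each operating point, the class with the largest weighted soft output is chosen, and sweeping the points traces the per-class errors of a simple ROC curve. The argmax rule with first-class tie-breaking and the toy outputs are assumptions for illustration, not PRSD Studio internals.

```python
def weight_grid(n_steps=50):
    """Operating points (w, 1 - w) for w = 0, 1/n_steps, ..., 1."""
    return [(i / n_steps, 1 - i / n_steps) for i in range(n_steps + 1)]


def decide(outputs, weights, classes):
    """Weight-based decision: largest weighted soft output wins (ties -> first class)."""
    scores = [w * o for w, o in zip(weights, outputs)]
    return classes[scores.index(max(scores))]


def per_class_errors(all_outputs, labels, weights, classes):
    """Fraction of each class's samples decided wrongly at one operating point."""
    errs = []
    for c in classes:
        idx = [i for i, t in enumerate(labels) if t == c]
        wrong = sum(decide(all_outputs[i], weights, classes) != c for i in idx)
        errs.append(wrong / len(idx))
    return errs


classes = ["apple", "banana"]
outputs = [[0.7, 0.3], [0.4, 0.6], [0.2, 0.8], [0.55, 0.45]]  # toy soft outputs
labels = ["apple", "apple", "banana", "banana"]
ops = weight_grid()
roc = [per_class_errors(outputs, labels, w, classes) for w in ops]
print(len(ops), ops[25], roc[25])
```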

27.10.08

kb26: Interactively defining detector for image data

Problem: How to define a detector for image data by hand-painting the class of interest?
Solution: Use sdimage command to define labels and sdroc to adjust the detector threshold.

PRSD Toolbox provides tools for the interactive definition of classes in image data. It also allows us to turn statistical models, trained on these user-painted classes, into class detectors (one-class classifiers).
In this example, we aim at constructing a color-based “road” detector for simple traffic scene images. We start from an RGB image of a road scene:



We can visualize this image using the PRSD Toolbox command sdimage, which gives us a number of interactive tools:

>> sdimage(im)

The blue transparent color denotes the default class label assigned to all image pixels. We can define new classes and paint image regions of interest. With a right mouse click, we open the context menu and select Paint class / Create new class. We enter road as the class name and paint the road region. The brush size may be adjusted via the context menu.

14.06.08

kb18: How to convert a support vector machine trained in LIBSVM into a pipeline?

Problem: How to execute a support vector classifier trained in LIBSVM using libPRSD?
Solution: Call sdp_svc pipeline constructor and fill in parameters trained by LIBSVM.

PRSD Studio exposes the basic pattern recognition algorithms to the user as pipeline actions. For each algorithm, we may construct an execution pipeline directly by supplying its canonical parameters (see the function reference for the parameters of pipeline actions).
Usually, we derive the method parameters by training the algorithm under PRTools, but this is not necessary.
We can equally well train the algorithm using external tools or libraries, as long as we are able to provide its parameters to the pipeline constructor under Matlab.

In this example, we use “A simple MATLAB interface” to LIBSVM, provided by the LIBSVM authors (ver. 2.68).
Under Matlab, we will use PRTools to create a two-class “banana” dataset:

>> a=gendatb
Banana Set, 100 by 2 dataset with 2 classes: [50  50]
>> scatterdui(a)

Now, we invoke LIBSVM to train the RBF SVM with sigma=2.0 (gamma=1/2). We provide the numerical labels and raw data (+a is a shortcut for double(a)):

>> model = svmtrain(getnlab(a), +a, '-g 0.5');
>> model
model = 
    Parameters: [5x1 double]
      nr_class: 2
       totalSV: 63
           rho: 0.0547
         Label: [2x1 double]
         ProbA: []
         ProbB: []
           nSV: [2x1 double]
       sv_coef: [63x1 double]
           SVs: [63x2 double]

The resulting model structure contains all the parameters we need to define the SVC pipeline action:

>> p=sdp_svc('rbf',1/0.5,model.SVs,model.sv_coef,model.rho)
sequential pipeline 2x1 ''
 1  sdp_svc         2x1  'rbf', par=2.0, 63 SVs
>> rand(3,2)*p % example execution on random 2D data
ans =
   -1.2691
   -1.0817
   -1.4567
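The values passed to sdp_svc are the ingredients of LIBSVM's two-class decision function: f(x) = sum_i sv_coef_i * exp(-gamma * ||x - SV_i||^2) - rho, where a positive value points to the first entry of model.Label. The Python sketch below evaluates that formula from the same ingredients (support vectors, coefficients, rho, gamma) to explain what they mean; it does not claim to reproduce sdp_svc internals. We only assume, as in the constructor call above, that the RBF parameter 2.0 given to sdp_svc corresponds to 1/gamma. The two-vector model is a toy, not the 63 SVs trained above.

```python
import math


def rbf_svm_decision(x, svs, sv_coef, rho, gamma):
    """LIBSVM-style two-class decision value for an RBF kernel:
    f(x) = sum_i sv_coef[i] * exp(-gamma * ||x - sv_i||^2) - rho
    """
    total = 0.0
    for sv, coef in zip(svs, sv_coef):
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, sv))
        total += coef * math.exp(-gamma * sq_dist)
    return total - rho


# Toy model with two support vectors.
svs = [[0.0, 0.0], [1.0, 1.0]]
sv_coef = [1.0, -1.0]
rho = 0.0
gamma = 0.5  # '-g 0.5' in svmtrain; sdp_svc received 1/gamma = 2.0
print(rbf_svm_decision([0.0, 0.0], svs, sv_coef, rho, gamma))  # 1 - exp(-1)
```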

We can visualize the raw output of the pipeline action using the sdscatter function:

>> sdscatter(a,p)

What we need in a real application are, of course, decisions. Here, we simply fix the decision threshold at zero. First, we create the thresholding operating point for our artificial problem. Then, we construct the decision pipeline action and concatenate it with the SVC action to form a new two-step sequential pipeline. Finally, we visualize the pipeline results (decisions) in the scatter plot:

>> ops=sdops('thr',0,{'apple','banana'})
Thr-based operating point thr=0.00
>> pd=sdp_decide(ops)
sequential pipeline 1x1 ''
 1  sdp_decide      1x1  Threshold-based decision on apple at op 1
>> p2=[p pd]
sequential pipeline 2x1 ''
 1  sdp_svc         2x1  'rbf', par=2.0, 63 SVs
 2  sdp_decide      1x1  Threshold-based decision on apple at op 1
>> sdscatter(a,p2)


Finally, we compare the execution speed of the trained SVC under libPRSD and LIBSVM. We create a large random dataset with 100 000 samples. We also need “labels”, because the LIBSVM execution interface is designed for “testing”, not only for “execution”:

>> test=rand(100000,2);
>> lab=ones(size(test,1),1);

>> tic; [predict_label, accuracy, dec_values] = svmpredict(lab, test, model); toc
Accuracy = 0% (0/100000) (classification)
Elapsed time is 0.585559 seconds.

>> tic; out=sdexe(p2,test); toc
Elapsed time is 0.088784 seconds.

>> 0.585559/0.088784
ans =
    6.5953

Execution under libPRSD gives us roughly a 6.6x speedup. Not bad!

The pipeline may now be exported using sdexport and run directly in custom applications outside Matlab.

11.06.08

kb17: How to switch between operating points?

Problem: How can the user perform decisions at different operating points (thresholds or sets of per-class weights)?
Solution: Multiple operating points may be stored in a decision mapping (sddecide). The user may manipulate the current operating point using the setcurop, getcurop and getcuropdata functions.

An operating point fully defines how soft classifier outputs are converted into a crisp decision. PRSD Studio supports thresholding-based and weighting-based operating points. The decisions are performed by a decision mapping (sddecide), which may store multiple operating points. There is always one current operating point used to make decisions. The user may select a different point using the setcurop function. This is handy in situations where the recognition system may work in several modes reflecting, e.g., different prior probabilities (mode 1: population screening, where the disease prior is low; mode 2: clinical test, where the disease prior is high because most of the patients do have some problem).
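A minimal Python sketch of the idea behind a weight-based decision mapping that stores several operating points and uses only the current one. The class and method names are hypothetical and are not the PRSD Studio API; as with setcurop in the session below, switching returns a new object and leaves the original untouched.

```python
class WeightDecision:
    """Weight-based decision with several stored operating points (hypothetical)."""

    def __init__(self, ops, classes, curop=1):
        if not 1 <= curop <= len(ops):
            raise ValueError("no such operating point")
        self.ops = [tuple(op) for op in ops]
        self.classes = list(classes)
        self.curop = curop  # 1-based, as in the Matlab session

    def set_curop(self, k):
        """Return a copy that decides at operating point k."""
        return WeightDecision(self.ops, self.classes, curop=k)

    def decide(self, outputs):
        """Pick the class with the largest weighted soft output at the current op."""
        weights = self.ops[self.curop - 1]
        scores = [w * o for w, o in zip(weights, outputs)]
        return self.classes[scores.index(max(scores))]


wd = WeightDecision([(0.5, 0.5), (0.2, 0.8)], ["apple", "banana"])
wd2 = wd.set_curop(2)
soft = [0.6, 0.4]  # toy soft output of a classifier for one sample
print(wd.decide(soft), wd2.decide(soft))  # apple banana
```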

In this example, we train a classifier on a two-class problem:

>> tr
Banana Set, 45 by 2 dataset with 2 classes: [18  27]
>> w=qdc(tr)
Bayes-Normal-2, 2 to 2 trained  mapping   --> normal_map

We define a decision mapping based on class output weighting with two operating points (50/50 and 20/80 priors). Note that we must define the decision (class) names, as this information is needed to perform decisions:

>> wd=sddecide('w',[0.5 0.5; 0.2 0.8],{'apple','banana'})
Weight-based decision (2 classes, 2 ops) at op 1, 2 to 1 trained  mapping   --> sddecide

The decision mapping is by default set to the operating point #1 (50/50). We can perform decisions by applying it to the soft classifier output:

>> ts
Banana Set, 5 by 2 dataset with 2 classes: [2  3]
>> dec1=ts*w*wd;
>> confmat(getlab(ts),dec1)

  True   | Estimated Labels
  Labels | apple  banana| Totals
 --------|--------------|-------
  apple  |    2      0  |    2
  banana |    1      2  |    3
 --------|--------------|-------
  Totals |    3      2  |    5

To switch to operating point #2 (20/80 priors), we do:

>> wd=setcurop(wd,2)
Weight-based decision (2 classes, 2 ops) at op 2, 2 to 1 trained  mapping   --> sddecide

>> dec2=ts*w*wd;
>> confmat(getlab(ts),dec2)

  True   | Estimated Labels
  Labels | apple  banana| Totals
 --------|--------------|-------
  apple  |    1      1  |    2
  banana |    1      2  |    3
 --------|--------------|-------
  Totals |    2      3  |    5

10.06.08

kb15: How to perform cross-validation with replicas (multiple measurements per object)?

Problem: Replicas are multiple observations gathered for the same object (for example several measurements per patient). When using replicas, one needs to take care that samples originating from the same object will not occur both in the training set and in the test set. If that happens, the estimated performance will be too optimistic because the classifier is partially evaluated on examples almost identical to training data. How to cross-validate with replicas?

Solution: For each sample in your data, create a property identifying the unique object (patient, image, piece of fruit) handled by the recognition system. Perform cross-validation using sdcrossval as usual, additionally specifying the property that defines the replicas.

Let us illustrate the use of replicas on an artificial dataset with 50 samples in two classes.
We create a vector of integers identifying the actual pieces of fruit measured. All examples
with the same fruitid were acquired from the same piece of fruit. There are five measurements
per piece of fruit in our problem:

>> a=gendatb([20 30]); a=setlablist(a,strvcat({'apple','banana'}))
Banana Set, 50 by 2 dataset with 2 classes: [20  30]
>> fruitid=genlab(5*ones(10,1));
>> fruitid(1:13)' % first few entries

ans =

     1     1     1     1     1     2     2     2     2     2     3     3     3

We will now store the ‘fruitid’ property in the dataset:

>> a=setprop(a,'fruitid',fruitid)
Banana Set, 50 by 2 dataset with 2 classes: [20  30]
>> getproplist(a) % existing properties

ans = 

    'ident'
    'fruitid'

>> getprop(a(23:27,:),'fruitid') % retrieve fruitids for a subset of samples:

ans =

     5
     5
     5
     6
     6

Classical 10-fold cross-validation of an algorithm (here, the nearest mean classifier) on our dataset would be invoked by:

>> [r,e]=sdcrossval(nmc,a)
10 folds: [1: ] [2: ] [3: ] [4: ] [5: ] [6: ] [7: ] [8: ] [9: ] [10: ] 
ROC (w-based op.point, 7 measures, 10 folds)
est: 1:err(apple)=0.25(0.26), 2:err(banana)=0.20(0.17), 3:FPr(apple)=0.20(0.17), 4:TPr(apple)=…
completed 10-fold evaluation 'sde_rotation' (alg: 'sda_basic')

We can access the training and test sets used in each fold through the object e. Here, we ask for the datasets used in the 1st fold:

>> tr=gettrdata(e,a,1)
Banana Set, 45 by 2 dataset with 2 classes: [18  27]
>> ts=gettsdata(e,a,1)
Banana Set, 5 by 2 dataset with 2 classes: [2  3]

Let us print the unique fruit ids used in the training and test sets:

>> unique(getprop(ts,'fruitid'))'

ans =

     2     4     5     6     9

>> unique(getprop(tr,'fruitid'))'

ans =

     1     2     3     4     5     6     7     8     9    10

Note that every test set example originates from a piece of fruit also used in training!

We want to avoid this situation, so we launch cross-validation with fruitid defining the replicas:

>> [r2,e2]=sdcrossval(nmc,a,'replicas','fruitid')
??? Error using ==> prsd_toolbox/private/make_rot_pool at 37
at most 4 folds may be requested

Error in ==> prsd_toolbox/private/sde_rotation.p>sde_rotation at 88
Error in ==> prsd_toolbox/sdcrossval.p>sdcrossval at 57

We received an error message stating that the default number of 10 folds is not possible with the given number of fruit pieces. The maximum possible number of folds equals the number of fruit pieces in the smallest class. In our case, we can check this using sduniqueprop, which constructs a dataset with a single sample per piece of fruit:

>> b=sduniqueprop(a,'fruitid')
10 by 1 dataset with 2 classes: [4  6]

>> [r2,e2]=sdcrossval(nmc,a,'replicas','fruitid','folds',4)
4 folds: [1: ] [2: ] [3: ] [4: ] 
ROC (w-based op.point, 7 measures, 4 folds)
est: 1:err(apple)=0.25(0.19), 2:err(banana)=0.23(0.26), 3:FPr(apple)=0.23(0.26), 4:TPr(apple)=…
completed 4-fold evaluation 'sde_rotation' (alg: 'sda_basic')
>> tr2=gettrdata(e2,a,1)
Banana Set, 35 by 2 dataset with 2 classes: [15  20]
>> ts2=gettsdata(e2,a,1)
Banana Set, 15 by 2 dataset with 2 classes: [5  10]
>> unique(getprop(ts2,'fruitid'))'

ans =

     1     6     8

>> unique(getprop(tr2,'fruitid'))'

ans =

     2     3     4     5     7     9    10

The fruit pieces used in the training and test sets of each fold are now different. The estimated performance will thereby be more indicative of the expected performance in production.
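The following Python sketch illustrates replica-aware fold assignment: whole groups (pieces of fruit) are dealt to folds class by class, so a group never appears on both sides of a split, and the fold count is capped by the number of groups in the smallest class, matching the error message above. The function names are hypothetical; the sketch assumes each group carries a single class label, does not shuffle (unlike a real cross-validation routine), and says nothing about how sdcrossval assigns folds internally.

```python
from collections import defaultdict


def max_group_folds(groups, labels):
    """Number of distinct groups in the smallest class."""
    per_class = defaultdict(set)
    for g, y in zip(groups, labels):
        per_class[y].add(g)
    return min(len(s) for s in per_class.values())


def group_folds(groups, labels, n_folds):
    """Return a 0-based fold index per sample, keeping each group in one fold.

    Assumes every group has a single class label. Groups of each class are
    dealt round-robin so that every fold receives groups from every class.
    """
    limit = max_group_folds(groups, labels)
    if n_folds > limit:
        raise ValueError(f"at most {limit} folds may be requested")
    group_class = dict(zip(groups, labels))
    by_class = defaultdict(list)
    for g, y in group_class.items():
        by_class[y].append(g)
    fold_of_group = {}
    for y in sorted(by_class):
        for i, g in enumerate(sorted(by_class[y])):
            fold_of_group[g] = i % n_folds
    return [fold_of_group[g] for g in groups]


def shared_groups(train_ids, test_ids):
    """Group ids present in both the training and the test set (should be empty)."""
    return sorted(set(train_ids) & set(test_ids))


# Same layout as in the example: 10 pieces of fruit, 5 samples each; fruit 1-4 are apples.
groups = [g for g in range(1, 11) for _ in range(5)]
labels = ["apple"] * 20 + ["banana"] * 30
folds = group_folds(groups, labels, 4)
for f in range(4):
    train = [g for g, k in zip(groups, folds) if k != f]
    test = [g for g, k in zip(groups, folds) if k == f]
    print(f, sorted(set(test)), shared_groups(train, test))
```

Applied to the first, non-replica split above, shared_groups(range(1, 11), [2, 4, 5, 6, 9]) would list all five test fruit ids as leaked.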

10.06.08

kb14: How to make decisions at a default operating point?

Problem: How to define a decision mapping using a default operating point, without ROC analysis?
Solution: Use sddecide directly on the trained mapping.

>> w  % trained mapping
Bayes-Normal-2, 2 to 2 trained  mapping   --> normal_map
>> wd=sddecide(w)
Weight-based decision (2 cls), 2 to 1 trained  mapping   --> sddecide
>> dec=a(1:10,:)*w*wd

dec =

apple 
apple 
apple 
apple 
apple 
apple 
banana
banana
apple 
banana

10.06.08

kb13: How to find samples with a specific type of error in a confusion matrix?

Problem: How to find out which samples suffer from a specific type of error (a specific cell of a confusion matrix)?

Solution: Use the sdconfmatind function to find indices of samples in a specific cell of a confusion matrix.

Let us assume a two-class banana dataset split into a training and a test set:

>> a=gendatb; a=setlablist(a,strvcat({'apple','banana'}))
Banana Set, 100 by 2 dataset with 2 classes: [50  50]
>> [tr,ts]=gendat(a,0.5)
Banana Set, 50 by 2 dataset with 2 classes: [25  25]
Banana Set, 50 by 2 dataset with 2 classes: [25  25]

We train a classifier and fix the default operating point:

>> w=qdc(tr)
Bayes-Normal-2, 2 to 2 trained  mapping   --> normal_map
>> wd=sddecide(w)
Weight-based decision (2 cls), 2 to 1 trained  mapping   --> sddecide

So we can now get the classifier decisions on the test set:

>> dec=ts*w*wd;
>> dec(1:10,:)

ans =

apple 
apple 
apple 
apple 
apple 
banana
banana
apple 
apple 
apple 

The confusion matrix compares the ground-truth labels, stored in the test dataset ts, to the
decisions dec:

>> confmat(getlab(ts),dec)

  True   | Estimated Labels
  Labels | apple  banana| Totals
 --------|--------------|-------
  apple  |   20      5  |   25
  banana |    4     21  |   25
 --------|--------------|-------
  Totals |   24     26  |   50
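For readers who want the counting rule spelled out, here is a small Python sketch of how such a table is built (rows: true labels, columns: estimated labels, as in the confmat printout). It is an illustration only; the function name is hypothetical and the toy labels are made up.

```python
def confusion_matrix(true_labels, decisions, classes):
    """counts[i][j] = number of samples of true class i decided as class j."""
    pos = {c: k for k, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for t, d in zip(true_labels, decisions):
        counts[pos[t]][pos[d]] += 1
    return counts


true_labels = ["apple", "apple", "banana", "apple", "banana"]
decisions = ["apple", "banana", "banana", "banana", "apple"]
cm = confusion_matrix(true_labels, decisions, ["apple", "banana"])
print(cm)  # [[1, 2], [1, 1]]
```

Row sums give the per-class totals in the right-hand column of the printout; column sums give the totals in the bottom row.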

We would now like to find out which 5 apple examples are misclassified as banana by our classifier.
We use the sdconfmatind function, providing it with the ground-truth labels, the decisions, and the true and estimated
classes defining the confusion matrix cell (here ‘apple’ and ‘banana’):

>> ind=sdconfmatind(getlab(ts),dec,'apple','banana')

ind =

     6
     7
    12
    13
    25

>> dec(ind,:)

ans =

banana
banana
banana
banana
banana

15.05.08

kb10: Can I perform multi-class ROC analysis?

Yes. PRSD Toolbox supports multi-class ROC analysis, with decisions based on weighting the per-class soft outputs of the classifier. A separate example presents multi-class ROC analysis in detail.

15.05.08

kb9: Can I access the source code?

No. PRSD Toolbox and libPRSD are closed-source products. 

15.05.08

kb8: Can I call the classifier in libPRSD directly as a C function?

No. LibPRSD does not expose individual classifiers as C functions. Instead, the pattern recognition algorithm is exposed as a black box: the user connects its input and output to the application buffers and then only invokes the execution function.

15.05.08

kb7: Do I need PRTools license in deployment?

No. LibPRSD is not dependent on Matlab and so also not on PRTools.

15.05.08

kb6: Is Matlab required/used in deployment?

No. LibPRSD does not depend on the Matlab environment or libraries. LibPRSD is implemented from the ground up in ANSI C, without external dependencies.