Algorithms

An algorithm is a Matlab object providing an easy-to-create abstraction of a specific recognition system. The training and execution logic of an algorithm is defined in a single Matlab function. All the parameters needed to perform algorithm training and deliver decisions on new data are stored inside the algorithm object.

Introduction

PRSD Toolbox algorithms are Matlab objects of class sdalg encapsulating the training and execution of a pattern recognition system. The algorithm consists of two distinct parts, namely the algorithm definition function and parameters. While the algorithm definition function describes the steps taken in order to train and execute the algorithm, the parameters define the content used during the training and execution. The result of an algorithm execution on new data are decisions.

The algorithm object exposes its parameters to the user similarly to a Matlab structure. During its life-cycle, the algorithm object undergoes three distinct processes:

  • Initialization. All the parameters needed to commence the training are fixed. These training parameters are stored under the trpar field. The initialized algorithm becomes untrained.
  • Training.Algorithm training consists of estimation of various parameters from the labeled training data. This may include feature selection or extraction, classifier training, and definition of a decision operating point. Eventually, the algorithm becomes trained which means it is capable of returning decisions on new, unlabeled, data. The parameters needed for execution are stored in the expar field.
  • Execution Presented with data, the algorithm uses the parameters in expar structure to derive decisions.

Using algorithms

Algorithm object is created by calling its definition function without parameters. The sda_featex algorithm below is a two-stage system composed of a feature extractor (by default PCA) followed by a classifier (by default quadratic classifier assuming Gaussian densities):

>> alg=sda_simple
untrained sdalg 'sda_simple'
trpar:
 featex          0x0       2386  mapping  PCA ret. 90.0% var, untrained  mapping   --> pca
 clf             0x0       2506  mapping  Bayes-Normal-2, untrained  mapping   --> qdc

The featex and clf fields contain untrained PRTools mappings. Similarly to the Matlab structure, the algorithm fields may be accessed by the dot notation. We can, for example, change the feature extractor to the PCA mapping projecting to the 3D subspace:

>> alg.trpar.featex=pca([],3)
untrained sdalg 'sda_simple'
trpar:
 featex          0x0       2368  mapping  PCA to 3D, untrained  mapping   --> pca
 clf             0x0       2506  mapping  Bayes-Normal-2, untrained  mapping   --> qdc

Similarly to a PRTools mapping, algorithm training is performed by multiplying a labeled dataset with an untrained algorithm. Here we train the algorithm alg on the artificially generated two-class dataset with 5 features:

>> data=gendatd(200,5); data=setlablist(data,strvcat({'apple','pear'}))
Difficult Dataset, 200 by 5 dataset with 2 classes: [108   92]
>> tralg=data*alg
trained sdalg 'sda_simple'
trpar:...
expar:
 w               5x3       2882  mapping  PCA to 3D, 5 to 3 trained  mapping   --> affine
 clf             3x2       3060  mapping  Bayes-Normal-2, 3 to 2 trained  mapping   --> normal_map
 decision        1x1          8  double   1
 wd              2x1       4050  mapping  Weight-based decision (2 cls), 2 to 1 trained  mapping   --> sddecide

The algorithm training filled the expar field with the information necessary for algorithm execution. The w field contains a trained PCA feature extractor projecting the 5D input data to 3D subspace. The trained quadratic classifier clf provides output for each of the two classes. The decision field is a simple flag specifying whether the trained algorithm applied to new data returns decisions (default behaviour) or the raw classifier outputs (when decision==0). Finally, the wd field contains the decision mapping performing the decisions at the default operating point.

The trained algorithm may be applied to a new data matrix or PRTools dataset:

>> newdata=rand(2,5) % two 5D data samples
newdata =
    0.7332    0.5700    0.1999    0.5388    0.5262
    0.3013    0.5468    0.6825    0.5460    0.3379
>> newdata*tralg 
ans =
apple
apple

The decisions were performed at the default operating point assuming equal importance of both classes. PRSD Toolbox provides a rich set of tools for the definition and evaluation of user-defined operating points. For details, see the Performing decisions article.

Creating algorithms from PRTools mappings

Any PRTools mapping may be converted into the sdalg algorithm by calling the sdalg command. Both the untrained and trained mappings may be turned into algorithms:

>> alg=sdalg(nmc) % converting an untrained mapping
untrained sdalg 'sda_basic'
trpar:
 w               0x0       2306  mapping  Nearest Mean, untrained  mapping   --> nmc
>> tralg=data*alg
trained sdalg 'sda_basic'
trpar:...
expar:
 w               5x2       2836  mapping  Nearest Mean, 5 to 2 trained classifier --> affine
 decision        1x1          8  double   1
 wd              2x1       4050  mapping  Weight-based decision (2 cls), 2 to 1 trained  mapping   --> sddecide

>> w=fisherc(data) % trained PRTools mapping (Fisher linear discriminant)
Fisher, 5 to 2 trained classifier --> affine
>> tralg=sdalg(w)
trained sdalg 'sda_basic'
trpar:...
expar:
 w               5x2       2824  mapping  Fisher, 5 to 2 trained classifier --> affine
 decision        1x1          8  double   1
 wd              2x1       4050  mapping  Weight-based decision (2 cls), 2 to 1 trained  mapping   --> sddecide

Creating custom algorithms

In order to define a custom algorithm, the user needs to write the algorithm definition function specifying the initialization, training, and execution steps. The example of a sda_simple algorithm is shown below.

The algorithm definition functions adopt a unified interface. All the parameters needed for training and execution are defined and stored within the algorithm object.

%SDA_SIMPLE feature extractor followed by a classifier
%
% DESCRIPTION
% SDA_SIMPLE is a two-stage algorithm consisting of a feature extractor
% followed by a classifier

function out=sda_simple(alg,data)
    
    if nargin==0 % initialize the algorithm ===============
        alg=sdalg(mfilename);

        % set up all the parameters needed for training
        alg.trpar.featex=pca([],0.9); % feature extraction mapping
        alg.trpar.clf=qdc; % classifier

        out=setstate(alg,'untrained');
        return
    end
    
    if ~istrained(alg) % training =========================
    % derive all parameters needed for execution (expar)

        % train the feature extractor
        alg.expar.w=data*alg.trpar.featex;
        % project the data
        data=data*alg.expar.w;
        % train the classifier
        alg.expar.clf=data*alg.trpar.clf;

        % set flag: by default return decisions
        alg.expar.decision=1;
        % define the decision at the default operating point
        alg.expar.wd=sddecide(alg.expar.clf);
        
        out=setstate(alg,'trained');
    else % execution ======================================
    % use execution parameters to provide output on new data
        
        % extract features
        data=data*alg.expar.w;
        % execute the classifier
        out=data*alg.expar.clf;

        % decision ------------------------------
        if isfield(alg.expar,'decision') & alg.expar.decision==1
            out=out*alg.expar.wd;
        end
    end
    
    return

Next step: The basics: Performing decision