Algorithms
An algorithm is a Matlab object providing an easy-to-create abstraction of a specific recognition system. The training and execution logic of an algorithm is defined in a single Matlab function. All the parameters needed to perform algorithm training and deliver decisions on new data are stored inside the algorithm object.
Introduction
PRSD Toolbox algorithms are Matlab objects of class sdalg encapsulating
the training and execution of a pattern recognition system. The algorithm consists
of two distinct parts, namely the algorithm definition function and parameters.
While the algorithm definition function describes the steps taken in order to train
and execute the algorithm, the parameters define the content used during the training and execution.
The result of an algorithm execution on new data are decisions.
The algorithm object exposes its parameters to the user similarly to a Matlab structure. During its life-cycle, the algorithm object undergoes three distinct processes:
- Initialization. All the parameters needed to commence the training are fixed. These training parameters are stored under the
trparfield. The initialized algorithm becomes untrained. - Training.Algorithm training consists of estimation of various parameters from the labeled
training data. This may include feature selection or extraction, classifier training, and definition of a
decision operating point. Eventually, the algorithm becomes trained which means it is capable of returning
decisions on new, unlabeled, data. The parameters needed for execution are stored in the
exparfield. - Execution Presented with data, the algorithm uses the parameters in
exparstructure to derive decisions.
Using algorithms
Algorithm object is created by calling its definition function without parameters.
The sda_featex algorithm below is a two-stage system composed of a
feature extractor (by default PCA) followed by a classifier (by default quadratic classifier
assuming Gaussian densities):
>> alg=sda_simple untrained sdalg 'sda_simple' trpar: featex 0x0 2386 mapping PCA ret. 90.0% var, untrained mapping --> pca clf 0x0 2506 mapping Bayes-Normal-2, untrained mapping --> qdc
The featex and clf fields contain untrained PRTools mappings.
Similarly to the Matlab structure, the algorithm fields may be accessed by the dot notation.
We can, for example, change the feature extractor to the PCA mapping projecting to the 3D subspace:
>> alg.trpar.featex=pca([],3) untrained sdalg 'sda_simple' trpar: featex 0x0 2368 mapping PCA to 3D, untrained mapping --> pca clf 0x0 2506 mapping Bayes-Normal-2, untrained mapping --> qdc
Similarly to a PRTools mapping, algorithm training is performed by multiplying a labeled dataset with an untrained algorithm.
Here we train the algorithm alg on the artificially generated two-class dataset with 5 features:
>> data=gendatd(200,5); data=setlablist(data,strvcat({'apple','pear'}))
Difficult Dataset, 200 by 5 dataset with 2 classes: [108 92]
>> tralg=data*alg
trained sdalg 'sda_simple'
trpar:...
expar:
w 5x3 2882 mapping PCA to 3D, 5 to 3 trained mapping --> affine
clf 3x2 3060 mapping Bayes-Normal-2, 3 to 2 trained mapping --> normal_map
decision 1x1 8 double 1
wd 2x1 4050 mapping Weight-based decision (2 cls), 2 to 1 trained mapping --> sddecide
The algorithm training filled the expar field with the information necessary for algorithm execution.
The w field contains a trained PCA feature extractor projecting the 5D input data to 3D subspace.
The trained quadratic classifier clf provides output for each of the two classes.
The decision field is a simple flag specifying whether the trained algorithm applied to new data
returns decisions (default behaviour) or the raw classifier outputs (when decision==0).
Finally, the wd field contains the decision mapping performing the decisions at the default
operating point.
The trained algorithm may be applied to a new data matrix or PRTools dataset:
>> newdata=rand(2,5) % two 5D data samples
newdata =
0.7332 0.5700 0.1999 0.5388 0.5262
0.3013 0.5468 0.6825 0.5460 0.3379
>> newdata*tralg
ans =
apple
apple
The decisions were performed at the default operating point assuming equal importance of both classes. PRSD Toolbox provides a rich set of tools for the definition and evaluation of user-defined operating points. For details, see the Performing decisions article.
Creating algorithms from PRTools mappings
Any PRTools mapping may be converted into the sdalg algorithm by calling the sdalg
command. Both the untrained and trained mappings may be turned into algorithms:
>> alg=sdalg(nmc) % converting an untrained mapping untrained sdalg 'sda_basic' trpar: w 0x0 2306 mapping Nearest Mean, untrained mapping --> nmc >> tralg=data*alg trained sdalg 'sda_basic' trpar:... expar: w 5x2 2836 mapping Nearest Mean, 5 to 2 trained classifier --> affine decision 1x1 8 double 1 wd 2x1 4050 mapping Weight-based decision (2 cls), 2 to 1 trained mapping --> sddecide >> w=fisherc(data) % trained PRTools mapping (Fisher linear discriminant) Fisher, 5 to 2 trained classifier --> affine >> tralg=sdalg(w) trained sdalg 'sda_basic' trpar:... expar: w 5x2 2824 mapping Fisher, 5 to 2 trained classifier --> affine decision 1x1 8 double 1 wd 2x1 4050 mapping Weight-based decision (2 cls), 2 to 1 trained mapping --> sddecide
Creating custom algorithms
In order to define a custom algorithm, the user needs to write the algorithm definition function
specifying the initialization, training, and execution steps. The example of a sda_simple algorithm
is shown below.
The algorithm definition functions adopt a unified interface. All the parameters needed for training and execution are defined and stored within the algorithm object.
%
% DESCRIPTION
% SDA_SIMPLE is a two-stage algorithm consisting of a feature extractor
% followed by a classifier
function out=sda_simple(alg,data)
alg=sdalg(mfilename);
% set up all the parameters needed for training
alg.trpar.featex=pca([],0.9); % feature extraction mapping
alg.trpar.clf=qdc; % classifier
out=setstate(alg,'untrained');
return
end
% derive all parameters needed for execution (expar)
% train the feature extractor
alg.expar.w=data*alg.trpar.featex;
% project the data
data=data*alg.expar.w;
% train the classifier
alg.expar.clf=data*alg.trpar.clf;
% set flag: by default return decisions
alg.expar.decision=1;
% define the decision at the default operating point
alg.expar.wd=sddecide(alg.expar.clf);
out=setstate(alg,'trained');
% use execution parameters to provide output on new data
% extract features
data=data*alg.expar.w;
% execute the classifier
out=data*alg.expar.clf;
% decision ------------------------------
if isfield(alg.expar,'decision') & alg.expar.decision==1
out=out*alg.expar.wd;
end
end
return
Next step: The basics: Performing decision
