We now have a good feel for how OpenCV's ML library works. It is designed so that new algorithms and techniques can be implemented and embedded into the library easily, and more algorithms are expected to appear over time. This section looks briefly at four machine learning routines that have recently been added to OpenCV. Each implements a well-known learning technique, by which we mean that a substantial body of literature exists on each of these methods in books, published papers, and on the Internet. For more detailed information, consult that literature and also refer to the …/opencv/docs/ref/opencvref_ml.htm manual.
Expectation maximization (EM) is another popular clustering technique. OpenCV supports EM only with Gaussian mixtures, but the technique itself is much more general: it iterates between taking the most likely (average or "expected") assignment given the current model and then adjusting that model to maximize the chances of those assignments being right. In OpenCV, the EM algorithm is implemented in the CvEM class and simply involves fitting a mixture of Gaussians to the data. Because the user provides the number of Gaussians to fit, the algorithm is similar in spirit to K-means.
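To make this concrete, here is a minimal sketch of fitting a Gaussian mixture with CvEM. The cluster count, covariance model, and termination criteria are illustrative assumptions; the one firm requirement is that the samples form a 32-bit floating-point matrix with one data point per row.

    #include "ml.h"  // OpenCV machine learning library (CvEM, CvEMParams)

    // samples: CV_32FC1 matrix, one data point per row (assumed layout)
    void cluster_with_em( CvMat* samples ) {
        CvEMParams params;
        params.nclusters    = 4;                      // number of Gaussians, chosen by the user
        params.cov_mat_type = CvEM::COV_MAT_DIAGONAL; // diagonal covariance for each Gaussian
        params.start_step   = CvEM::START_AUTO_STEP;  // let the algorithm initialize itself
        params.term_crit    = cvTermCriteria( CV_TERMCRIT_ITER | CV_TERMCRIT_EPS, 100, 0.01 );

        CvMat* labels = cvCreateMat( samples->rows, 1, CV_32SC1 );
        CvEM em;
        em.train( samples, 0, params, labels );  // fit the mixture; labels receives cluster indices

        // For a new point (a 1 x samples->cols CV_32FC1 row vector), predict()
        // returns the index of the most probable Gaussian:
        //     int cluster = cvRound( em.predict( new_point, 0 ) );

        cvReleaseMat( &labels );
    }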
One of the simplest classification techniques is K-nearest neighbors (KNN), which merely stores all the training data points. When you want to classify a new point, you look up its K nearest points (for some integer K) and then label the new point according to the class that holds the majority among those K neighbors. This algorithm is implemented in the CvKNearest class in OpenCV. The KNN classification technique can be very effective, but it requires that you store the entire training set; hence it can use a lot of memory and become quite slow. People often cluster the training set to reduce its size before using this method. Readers interested in how dynamically adaptive nearest neighbor techniques might be used in the brain (and in machine learning) should see Grossberg [Grossberg87] or, for a more recent summary of advances, Carpenter and Grossberg [Carpenter03].
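A minimal sketch of CvKNearest follows; the value of K and the matrix layouts are illustrative assumptions. Note how the constructor does little more than store the training set, which is exactly the memory cost mentioned above.

    #include "ml.h"  // OpenCV machine learning library (CvKNearest)

    // train_data: CV_32FC1, one training point per row (assumed layout)
    // responses:  CV_32FC1, one class label per row
    float classify_with_knn( CvMat* train_data, CvMat* responses, CvMat* new_point ) {
        const int K = 10;  // illustrative choice; must not exceed max_k below
        CvKNearest knn( train_data, responses, 0, false /* classification */, K /* max_k */ );

        // new_point: 1 x train_data->cols CV_32FC1 row vector.
        // find_nearest() returns the majority label among the K nearest neighbors.
        return knn.find_nearest( new_point, K );
    }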
The multilayer perceptron (MLP, often referred to by its training algorithm, back-propagation) is a neural network that still ranks among the top-performing classifiers, especially for text recognition. It can be rather slow to train because it uses gradient descent to minimize error by adjusting the weighted connections between the numerical nodes within the layers. In test mode, however, it is quite fast: just a series of dot products followed by a squashing function. In OpenCV it is implemented in the CvANN_MLP class, and its use is demonstrated in the …/opencv/samples/c/letter_recog.cpp file. Interested readers will find details on using MLP effectively for text and object recognition in LeCun, Bottou, Bengio, and Haffner [LeCun98a]. Implementation and tuning details are given in LeCun, Bottou, and Muller [LeCun98b]. Newer work on brainlike hierarchical networks that propagate probabilities can be found in Hinton, Osindero, and Teh [Hinton06].
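Here is a minimal sketch of setting up and training CvANN_MLP; the hidden-layer size and choice of training method are illustrative assumptions (letter_recog.cpp, cited above, shows a complete worked example). For classification, each output row is typically a one-of-N encoding of the class.

    #include "ml.h"  // OpenCV machine learning library (CvANN_MLP)

    // inputs:  CV_32FC1, one feature vector per row (assumed layout)
    // outputs: CV_32FC1, one target vector per row (e.g., one-of-N class encoding)
    void train_mlp( CvMat* inputs, CvMat* outputs ) {
        // Topology: input layer, one hidden layer of 20 units, output layer.
        int sizes[] = { inputs->cols, 20, outputs->cols };
        CvMat layer_sizes = cvMat( 1, 3, CV_32SC1, sizes );

        CvANN_MLP mlp;
        mlp.create( &layer_sizes, CvANN_MLP::SIGMOID_SYM );  // symmetric sigmoid squashing function

        CvANN_MLP_TrainParams params;
        params.train_method = CvANN_MLP_TrainParams::BACKPROP;  // gradient descent on the weights
        mlp.train( inputs, outputs, 0, 0, params );

        // Prediction is just dot products plus the squashing function at each layer:
        //     mlp.predict( sample, result );  // result: 1 x outputs->cols, CV_32FC1
    }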
With lots of data, boosting or random trees are usually the best-performing classifiers. But when your data set is limited, the support vector machine (SVM) often works best. This N-class algorithm works by projecting the data into a higher-dimensional space (creating new dimensions out of combinations of the features) and then finding the optimal linear separator between the classes. Viewed back in the original space of the raw input data, this high-dimensional linear separator can correspond to a quite nonlinear boundary. Hence we can use linear classification techniques based on maximal between-class separation to produce nonlinear classifiers that, in some sense, optimally separate the classes in the data. With enough additional dimensions, you can almost always perfectly separate the data classes. This technique is implemented in the CvSVM class in OpenCV's ML library.
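The sketch below trains CvSVM with a radial basis function (RBF) kernel, which performs the implicit projection into a higher-dimensional space described above. The gamma and C values are illustrative assumptions that would normally be tuned for your data (depending on your version of the library, CvSVM::train_auto() can perform a cross-validated parameter search).

    #include "ml.h"  // OpenCV machine learning library (CvSVM, CvSVMParams)

    // train_data: CV_32FC1, one sample per row (assumed layout)
    // responses:  CV_32FC1, one class label per row
    void train_svm( CvMat* train_data, CvMat* responses ) {
        CvSVMParams params;
        params.svm_type    = CvSVM::C_SVC;  // N-class classification with penalty parameter C
        params.kernel_type = CvSVM::RBF;    // kernel performs the high-dimensional projection
        params.gamma       = 0.5;           // illustrative value; tune for your data
        params.C           = 1.0;           // illustrative value; tune for your data
        params.term_crit   = cvTermCriteria( CV_TERMCRIT_ITER, 1000, 1e-6 );

        CvSVM svm;
        svm.train( train_data, responses, 0, 0, params );  // find the maximal-margin separator

        // Classify a new sample (1 x train_data->cols, CV_32FC1):
        //     float label = svm.predict( sample );
    }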
These tools are closely tied to many computer vision algorithms, ranging from finding feature points via trained classification, to tracking, to segmenting scenes, as well as the more straightforward tasks of classifying objects and clustering image data.