Everything we've discussed so far was reasonably basic. Each of the functions provided for a relatively obvious need. Collectively, they form a good foundation for much of what you might want to do with histograms in the context of computer vision (and probably in other contexts as well). At this point we want to look at some more complicated routines available within OpenCV that are extremely useful in certain applications. These routines include a more sophisticated method of comparing two histograms as well as tools for computing and/or visualizing which portions of an image contribute to a given portion of a histogram.
Lighting changes can cause shifts in color values (see Figure 7-5). Such shifts tend not to change the shape of the histogram of color values so much as to shift its location, yet that shift is enough to make the histogram-matching schemes we've learned about fail. The difficulty with those match measures is that they can return a large difference when two histograms have similar shapes but are merely displaced relative to one another. It is often desirable to have a distance measure that behaves like a match measure yet is less sensitive to such displacements. Earth mover's distance (EMD) [Rubner00] is such a metric; it essentially measures how much work it would take to "shovel" one histogram shape into another, including moving part (or all) of the histogram to a new location. It works in any number of dimensions.
Returning again to Figure 7-4, we see the earth-shoveling nature of EMD's distance measure in the rightmost column. An exact match is a distance of 0. A half match costs half a "shovelful," the amount of work it would take to spread half of the left histogram into the next slot. Finally, moving the entire histogram one slot to the right (i.e., changing the model histogram into the "totally mismatched" histogram) requires a full unit of distance.
The EMD algorithm itself is quite general; it allows users to set their own distance metric or their own cost-of-moving matrix. One can record where the histogram "material" flowed from one histogram to another, and one can employ nonlinear distance metrics derived from prior information about the data. The EMD function in OpenCV is cvCalcEMD2():
float cvCalcEMD2(
    const CvArr*        signature1,
    const CvArr*        signature2,
    int                 distance_type,
    CvDistanceFunction  distance_func = NULL,
    const CvArr*        cost_matrix   = NULL,
    CvArr*              flow          = NULL,
    float*              lower_bound   = NULL,
    void*               userdata      = NULL
);
The cvCalcEMD2() function has enough parameters to make one dizzy. This may seem rather complex for such an intuitive function, but the complexity stems from all the subtle configurable dimensions of the algorithm. [97] Fortunately, the function can be used in its more basic and intuitive form without most of the arguments (note all the "=NULL" defaults in the preceding declaration). Example 7-2 shows the simplified version.
Example 7-2. Simple EMD interface
float cvCalcEMD2(
    const CvArr*  signature1,
    const CvArr*  signature2,
    int           distance_type
);
The parameter distance_type for the simpler version of cvCalcEMD2() is either Manhattan distance (CV_DIST_L1) or Euclidean distance (CV_DIST_L2). Although we're applying the EMD to histograms, the interface prefers that we talk to it in terms of signatures for the first two array parameters. These signature arrays are always of type float and consist of rows containing the histogram bin count followed by its coordinates. For the one-dimensional histogram of Figure 7-4, the signatures (listed as array rows) for the left-hand column of histograms (skipping the model) would be as follows: top, [1, 0; 0, 1]; middle, [0.5, 0; 0.5, 1]; bottom, [0, 0; 1, 1]. If a bin in a three-dimensional histogram had a count of 537 at (x, y, z) index (7, 43, 11), then the signature row for that bin would be [537, 7, 43, 11]. This is how we perform the necessary step of converting histograms into signatures.
As an example, suppose we have two histograms, hist1 and hist2, that we want to convert to two signatures, sig1 and sig2. Just to make things more difficult, let's suppose that these are two-dimensional histograms (as in the preceding code examples) of dimension h_bins by s_bins. Example 7-3 shows how to convert these two histograms into two signatures.
Example 7-3. Creating signatures from histograms for EMD
//Convert histograms into signatures for EMD matching.
//Assume we already have 2D histograms hist1 and hist2 that are both
//of dimension h_bins by s_bins (though for EMD, histograms don't have
//to match in size).
//
CvMat *sig1, *sig2;
int numrows = h_bins*s_bins;

//Create matrices to store the signatures in.
//
sig1 = cvCreateMat(numrows, 3, CV_32FC1);  //1 count + 2 coords = 3
sig2 = cvCreateMat(numrows, 3, CV_32FC1);  //sigs are of type float

//Fill the signatures for the two histograms.
//
for( int h = 0; h < h_bins; h++ ) {
    for( int s = 0; s < s_bins; s++ ) {
        float bin_val = cvQueryHistValue_2D( hist1, h, s );
        cvSet2D(sig1, h*s_bins + s, 0, cvScalar(bin_val));  //bin value
        cvSet2D(sig1, h*s_bins + s, 1, cvScalar(h));         //Coord 1
        cvSet2D(sig1, h*s_bins + s, 2, cvScalar(s));         //Coord 2

        bin_val = cvQueryHistValue_2D( hist2, h, s );
        cvSet2D(sig2, h*s_bins + s, 0, cvScalar(bin_val));  //bin value
        cvSet2D(sig2, h*s_bins + s, 1, cvScalar(h));         //Coord 1
        cvSet2D(sig2, h*s_bins + s, 2, cvScalar(s));         //Coord 2
    }
}
Notice in this example [98] that the function cvSet2D() takes a CvScalar (a four-element value) to set its entry even though each entry in this particular matrix is a single float. We use the inline convenience function cvScalar() to accomplish this task. Once we have our histograms converted into signatures, we are ready to get the distance measure. Choosing to measure by Euclidean distance, we now add the code of Example 7-4.
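In essence, that code amounts to a single call to cvCalcEMD2(). The following is a minimal sketch (not the verbatim example) using the sig1 and sig2 matrices built in Example 7-3:

// Minimal sketch: compute the earth mover's distance between the two
// signatures from Example 7-3, using Euclidean (L2) ground distance.
//
float emd = cvCalcEMD2( sig1, sig2, CV_DIST_L2 );
printf( "EMD: %f\n", emd );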
Back projection is a way of recording how well the pixels (for cvCalcBackProject()) or patches of pixels (for cvCalcBackProjectPatch()) fit the distribution of pixels in a histogram model. For example, if we have a histogram of flesh color, then we can use back projection to find flesh-colored areas in an image. The function call for doing this kind of lookup is:
void cvCalcBackProject(
    IplImage**          image,
    CvArr*              back_project,
    const CvHistogram*  hist
);
We have already seen the array of single-channel images IplImage** image in the function cvCalcHist() (see the section "Basic Manipulations with Histograms"). The number of images in this array is exactly the same (and in the same order) as was used to construct the histogram model hist. Example 7-1 showed how to convert an image into single-channel planes and then make an array of them. The image or array back_project is a single-channel 8-bit or floating-point image of the same size as the input images in the array. The values in back_project are set to the values in the associated bin in hist. If the histogram is normalized, then this value can be associated with a conditional probability value (i.e., the probability that a pixel in image is a member of the type characterized by the histogram in hist). [99] In Figure 7-6, we use a flesh-color histogram to derive a probability-of-flesh image.
Figure 7-6. Back projection of histogram values onto each pixel based on its color: the HSV flesh-color histogram (upper left) is used to convert the hand image (upper right) into the flesh-color probability image (lower right); the lower left panel is the histogram of the hand image
When back_project is a byte image rather than a float image, you should either not normalize the histogram or else scale it up before use. The reason is that the highest possible value in a normalized histogram is 1, so anything less than that will be rounded down to 0 in the 8-bit image. You might also need to scale back_project in order to see the values with your eyes, depending on how high the values are in your histogram.
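To make the pixel-level case concrete, here is a rough sketch (not code from the book) of back-projecting a hue-saturation histogram onto a loaded BGR image; the variable names src and hist, and the choice of an 8-bit result, are assumptions for illustration (the histogram is assumed to have been built from H-S planes as in Example 7-1):

// Rough sketch: back-project a hue-saturation CvHistogram 'hist' onto
// the H and S planes of the BGR image 'src'.
//
IplImage* hsv     = cvCreateImage( cvGetSize(src), IPL_DEPTH_8U, 3 );
IplImage* h_plane = cvCreateImage( cvGetSize(src), IPL_DEPTH_8U, 1 );
IplImage* s_plane = cvCreateImage( cvGetSize(src), IPL_DEPTH_8U, 1 );
IplImage* v_plane = cvCreateImage( cvGetSize(src), IPL_DEPTH_8U, 1 );
IplImage* planes[] = { h_plane, s_plane };

cvCvtColor( src, hsv, CV_BGR2HSV );
cvSplit( hsv, h_plane, s_plane, v_plane, 0 );

// Back-project the histogram onto the image planes.
IplImage* back_project = cvCreateImage( cvGetSize(src), IPL_DEPTH_8U, 1 );
cvCalcBackProject( planes, back_project, hist );

// Stretch the result to the full 0-255 range so it is easy to inspect.
cvNormalize( back_project, back_project, 255, 0, CV_MINMAX );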
We can use the basic back-projection method to model whether or not a particular pixel is likely to be a member of a particular object type (when that object type was modeled by a histogram). This is not exactly the same as computing the probability of the presence of a particular object. An alternative method would be to consider subregions of an image and the feature (e.g., color) histogram of that subregion and to ask whether the histogram of features for the subregion matches the model histogram; we could then associate with each such subregion a probability that the modeled object is, in fact, present in that subregion.
Thus, just as cvCalcBackProject() allows us to compute whether a pixel might be part of a known object, cvCalcBackProjectPatch() allows us to compute whether a patch might contain a known object. The cvCalcBackProjectPatch() function uses a sliding window over the entire input image, as shown in Figure 7-7. At each location in the input array of images, all the pixels in the patch are used to set one pixel in the destination image corresponding to the center of the patch. This is important because many properties of images, such as textures, cannot be determined at the level of individual pixels but instead arise from groups of pixels.
For simplicity in these examples, we've been sampling color to create our histogram models. Thus, in Figure 7-6 the whole hand "lights up" because its pixels match the flesh-color histogram model well. Using patches, we can detect statistical properties that occur over local regions, from the variations in local intensity that make up a texture all the way up to the configuration of properties that make up a whole object. There are two ways one might consider applying cvCalcBackProjectPatch(): as a region detector, when the sampling window is smaller than the object, and as an object detector, when the sampling window is roughly the size of the object. Figure 7-8 shows the use of cvCalcBackProjectPatch() as a region detector. We start with a histogram model of palm-flesh color; a small window is then moved over the image so that each pixel in the back projection image records the probability of palm flesh at that pixel, given all the pixels in the surrounding window in the original image. In Figure 7-8 the hand is much larger than the scanning window, and the palm region is preferentially detected. Figure 7-9 starts with a histogram model collected from blue mugs. In contrast to Figure 7-8, where regions were detected, Figure 7-9 shows how cvCalcBackProjectPatch() can be used as an object detector. When the window size is roughly the same as the size of the objects we hope to find in an image, the whole object "lights up" in the back projection image. Finding peaks in the back projection image then corresponds to finding the locations of the objects (in Figure 7-9, a mug) that we are looking for.
Figure 7-7. Back projection: a sliding patch over the input image planes is used to set the corresponding pixel (at the center of the patch) in the destination image; for normalized histogram models, the resulting image can be interpreted as a probability map indicating the possible presence of the object (this figure is taken from the OpenCV reference manual)
The function provided by OpenCV for back projection by patches is:
void cvCalcBackProjectPatch(
    IplImage**    images,
    CvArr*        dst,
    CvSize        patch_size,
    CvHistogram*  hist,
    int           method,
    float         factor
);
Here we have the same array of single-channel images that was used to create the histogram with cvCalcHist(). However, the destination image dst is different: it must be a single-channel, floating-point image of size (images[0]->width - patch_size.width + 1, images[0]->height - patch_size.height + 1). The explanation for this size (see Figure 7-7) is that the center pixel of the patch is used to set the corresponding location in dst, so we lose half a patch dimension along each edge of the image. The parameter patch_size is exactly what you would expect (the size of the patch) and may be set using the convenience function cvSize(width, height). We are already familiar with the histogram parameter; as with cvCalcBackProject(), this is the model histogram to which individual windows will be compared. The parameter for the comparison method takes exactly the same method types as used in cvCompareHist() (see the "Comparing Two Histograms" section).[100] The final parameter, factor, is the normalization level; this parameter is the same as discussed previously in connection with cvNormalizeHist(). You can set it to 1 or, as a visualization aid, to some larger number. Because of this flexibility, you are always free to normalize your hist model before using cvCalcBackProjectPatch().
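A rough sketch of a patch-based call follows, under the same kind of assumptions as the earlier pixel-level sketch (an existing planes array and model histogram hist); the 21 × 21 patch size and the CV_COMP_CORREL method are arbitrary choices for illustration:

// Rough sketch: patch-based back projection.
// 'planes' and 'hist' are assumed to exist, as in the pixel-level sketch.
//
CvSize patch_size = cvSize( 21, 21 );   // example patch size
int dst_w = planes[0]->width  - patch_size.width  + 1;
int dst_h = planes[0]->height - patch_size.height + 1;
IplImage* dst = cvCreateImage( cvSize(dst_w, dst_h), IPL_DEPTH_32F, 1 );

cvCalcBackProjectPatch(
    planes,           // same plane array used to build the histogram
    dst,              // single-channel float image, shrunk by the patch size
    patch_size,
    hist,             // model histogram
    CV_COMP_CORREL,   // any method accepted by cvCompareHist()
    1.0f              // normalization factor
);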
Figure 7-8. Back projection used for histogram object model of flesh tone where the window (small white box in upper right frame) is much smaller than the hand; here, the histogram model was of palm-color distribution and the peak locations tend to be at the center of the hand
A final question arises: Once we have a probability-of-object image, how do we use that image to find the object we are searching for? For search, we can use the cvMinMaxLoc() function discussed in Chapter 3. The maximum location (assuming you smooth the image a bit first) is the most likely location of the object in the image. This leads us to a slight digression: template matching.
Figure 7-9. Using cvCalcBackProjectPatch() to locate an object (here, a coffee cup) whose size approximately matches the patch size (white box in upper right panel): the sought object is modeled by a hue-saturation histogram (upper left), which can be compared with an HS histogram for the image as a whole (lower left); the result of cvCalcBackProjectPatch() (lower right) is that the object is easily picked out from the scene by virtue of its color
Template matching via cvMatchTemplate() is not based on histograms; rather, the function matches an actual image patch against an input image by "sliding" the patch over the input image using one of the matching methods described in this section. If, as in Figure 7-10, we have an image patch containing a face, then we can slide that face over an input image looking for strong matches that would indicate another face is present. The function call is similar to that of cvCalcBackProjectPatch():
void cvMatchTemplate(
    const CvArr*  image,
    const CvArr*  templ,
    CvArr*        result,
    int           method
);
Instead of the array of input image planes that we saw in cvCalcBackProjectPatch(), here we have a single 8-bit or floating-point grayscale or color image as input. The matching model in templ is just a patch from a similar image containing the object for which you are searching. The output is placed in the result image, which is a single-channel byte or floating-point image of size (image->width - templ->width + 1, image->height - templ->height + 1), as we saw previously with cvCalcBackProjectPatch(). The matching method is somewhat more complex, as we now explain. We use I to denote the input image, T the template, and R the result.
Figure 7-10. cvMatchTemplate() sweeps a template image patch across another image looking for matches
The square-difference matching methods (method = CV_TM_SQDIFF) match the squared difference between the template and the image, so a perfect match will be 0 and bad matches will be large:
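(The formula below follows the OpenCV documentation; the sum runs over the template coordinates x', y'.)

R_{\text{sq\_diff}}(x,y) \;=\; \sum_{x',y'} \bigl[\, T(x',y') - I(x+x',\,y+y') \,\bigr]^2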
The correlation matching methods (method = CV_TM_CCORR) multiplicatively match the template against the image, so a perfect match will be large and bad matches will be small or 0.
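(Again following the OpenCV documentation, the cross-correlation measure is:)

R_{\text{ccorr}}(x,y) \;=\; \sum_{x',y'} T(x',y') \cdot I(x+x',\,y+y')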
The correlation-coefficient matching methods (method = CV_TM_CCOEFF) match the template relative to its mean against the image relative to its mean, so a perfect match will be 1 and a perfect mismatch will be -1; a value of 0 simply means that there is no correlation (random alignments).
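In the OpenCV documentation this is written in terms of the mean-subtracted template and image patch, where w and h are the template width and height:

T'(x',y') \;=\; T(x',y') - \frac{1}{wh}\sum_{x'',y''} T(x'',y''), \qquad
I'(x+x',y+y') \;=\; I(x+x',y+y') - \frac{1}{wh}\sum_{x'',y''} I(x+x'',y+y'')

R_{\text{ccoeff}}(x,y) \;=\; \sum_{x',y'} T'(x',y') \cdot I'(x+x',\,y+y')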
For each of the three methods just described, there are also normalized versions first developed by Galton [Galton] as described by Rodgers [Rodgers88]. The normalized methods are useful because, as mentioned previously, they can help reduce the effects of lighting differences between the template and the image. In each case, the normalization coefficient is the same:
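(Per the OpenCV documentation:)

Z(x,y) \;=\; \sqrt{\;\sum_{x',y'} T(x',y')^2 \;\cdot\; \sum_{x',y'} I(x+x',\,y+y')^2\;}

Each normalized method then returns R_{\text{normed}}(x,y) = R(x,y)/Z(x,y); in the correlation-coefficient case the documentation applies the same formula to the mean-subtracted T' and I'.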
The values for method that give the normalized computations are listed in Table 7-2 (the normalized variants are selected with CV_TM_SQDIFF_NORMED, CV_TM_CCORR_NORMED, and CV_TM_CCOEFF_NORMED).
As usual, we obtain more accurate matches (at the cost of more computations) as we move from simpler measures (square difference) to the more sophisticated ones (correlation coefficient). It's best to do some test trials of all these settings and then choose the one that best trades off accuracy for speed in your application.
Again, be careful when interpreting your results. The square-difference methods show best matches with a minimum, whereas the correlation and correlation-coefficient methods show best matches at maximum points.
As in the case of cvCalcBackProjectPatch(), once we use cvMatchTemplate() to obtain a matching result image, we can then use cvMinMaxLoc() to find the location of the best match. Again, we want to ensure that there's an area of good match around that point, in order to avoid random template alignments that just happen to work well. A good match should have good matches nearby, because slight misalignments of the template shouldn't vary the results too much for real matches. Looking for the best matching "hill" can be done by slightly smoothing the result image before seeking its maximum (for the correlation or correlation-coefficient methods) or minimum (for the square-difference methods). The morphological operators can also be helpful in this context.
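A short sketch of that peak-finding step follows, assuming result is the floating-point output of cvMatchTemplate() with a correlation-type method (so the best match is the maximum); the 5 × 5 Gaussian kernel is an arbitrary choice:

// Rough sketch: smooth the match result a little, then locate the peak.
//
cvSmooth( result, result, CV_GAUSSIAN, 5, 5, 0, 0 );

double  min_val, max_val;
CvPoint min_loc, max_loc;
cvMinMaxLoc( result, &min_val, &max_val, &min_loc, &max_loc, NULL );

// max_loc is the upper-left corner of the best-matching window;
// with a square-difference method you would use min_loc instead.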
Example 7-5 should give you a good idea of how the different template matching techniques behave. This program first reads in a template and image to be matched and then performs the matching via the methods we've discussed here.
Example 7-5. Template matching
// Template matching.
// Usage: matchTemplate image template
//
#include <cv.h>
#include <cxcore.h>
#include <highgui.h>
#include <stdio.h>

int main( int argc, char** argv ) {
    IplImage *src, *templ, *ftmp[6];   // ftmp will hold the results
    int i;

    if( argc == 3 ) {
        // Read in the source image to be searched:
        if( (src = cvLoadImage(argv[1], 1)) == 0 ) {
            printf("Error on reading src image %s\n", argv[1]);
            return(-1);
        }
        // Read in the template to be used for matching:
        if( (templ = cvLoadImage(argv[2], 1)) == 0 ) {
            printf("Error on reading template %s\n", argv[2]);
            return(-1);
        }

        // ALLOCATE OUTPUT IMAGES:
        int iwidth  = src->width  - templ->width  + 1;
        int iheight = src->height - templ->height + 1;
        for( i = 0; i < 6; ++i ) {
            ftmp[i] = cvCreateImage( cvSize(iwidth, iheight), IPL_DEPTH_32F, 1 );
        }

        // DO THE MATCHING OF THE TEMPLATE WITH THE IMAGE:
        for( i = 0; i < 6; ++i ) {
            cvMatchTemplate( src, templ, ftmp[i], i );
            cvNormalize( ftmp[i], ftmp[i], 1, 0, CV_MINMAX );  // [101]
        }

        // DISPLAY
        cvNamedWindow( "Template", 0 );
        cvShowImage( "Template", templ );
        cvNamedWindow( "Image", 0 );
        cvShowImage( "Image", src );
        cvNamedWindow( "SQDIFF", 0 );
        cvShowImage( "SQDIFF", ftmp[0] );
        cvNamedWindow( "SQDIFF_NORMED", 0 );
        cvShowImage( "SQDIFF_NORMED", ftmp[1] );
        cvNamedWindow( "CCORR", 0 );
        cvShowImage( "CCORR", ftmp[2] );
        cvNamedWindow( "CCORR_NORMED", 0 );
        cvShowImage( "CCORR_NORMED", ftmp[3] );
        cvNamedWindow( "CCOEFF", 0 );
        cvShowImage( "CCOEFF", ftmp[4] );
        cvNamedWindow( "CCOEFF_NORMED", 0 );
        cvShowImage( "CCOEFF_NORMED", ftmp[5] );

        // LET USER VIEW RESULTS:
        cvWaitKey(0);
    } else {
        printf( "Call should be: matchTemplate image template\n" );
    }
    return 0;
}
Note the use of cvNormalize() in this code, which allows us to display the results in a consistent way. Recall that some of the matching methods can return negative-valued results. We use the CV_MINMAX flag when normalizing; this tells the function to shift and scale the floating-point images so that all returned values are between 0 and 1. Figure 7-11 shows the results of sweeping the face template over the source image (shown in Figure 7-10) using each of cvMatchTemplate()'s available matching methods. In outdoor imagery especially, it's almost always better to use one of the normalized methods. Among those, the correlation-coefficient method gives the most clearly delineated match but, as expected, at a greater computational cost. For a specific application, such as automatic parts inspection or tracking features in a video, you should try all the methods and find the speed and accuracy trade-off that best serves your needs.
Figure 7-11. Match results of six matching methods for the template search depicted in Figure 7-10: the best match for square difference is 0 and for the other methods it's the maximum point; thus, matches are indicated by dark areas in the left column and by bright spots in the other two columns
[97] If you want all of the gory details, we recommend that you read the 1989 paper by S. Peleg, M. Werman, and H. Rom, "A Unified Approach to the Change of Resolution: Space and Gray-Level," and then take a look at the relevant entries in the OpenCV user manual that are included in the release …\opencv\docs\ref\opencvref_cv.htm.
[98] Using cvSetReal2D() or cvmSet() would have been more compact and efficient here, but the example is clearer this way, and the extra overhead is small compared to the actual distance calculation in EMD.
[99] Specifically, in the case of our flesh-tone H-S histogram, if C is the color of the pixel and F is the event that the pixel is flesh, then this probability map gives us p(C|F), the probability of observing that color given that the pixel actually is flesh. This is not quite the same as p(F|C), the probability that the pixel is flesh given its color. However, these two probabilities are related by Bayes' theorem [Bayes1763], and so, if we know the overall probability of encountering a flesh-colored object in a scene as well as the total probability of encountering the range of flesh colors, then we can compute p(F|C) from p(C|F). Specifically, Bayes' theorem establishes the following relation:
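p(F \mid C) \;=\; \frac{p(C \mid F)\, p(F)}{p(C)}

where p(F) is the overall probability of encountering flesh and p(C) is the total probability of encountering that color.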
[100] You must be careful when choosing a method, because some indicate best match with a return value of 1 and others with a value of 0.
[101] You can often get more pronounced match results by raising the matches to a power (e.g., cvPow(ftmp[i], ftmp[i], 5);). If the result is normalized between 0.0 and 1.0, you can immediately see that a good match of 0.99 raised to the fifth power is not much reduced (0.99⁵ ≈ 0.95), while a poorer score of 0.50 is reduced substantially (0.50⁵ ≈ 0.03).