Everything we've discussed so far was reasonably basic. Each of the functions provided for a relatively obvious need. Collectively, they form a good foundation for much of what you might want to do with histograms in the context of computer vision (and probably in other contexts as well). At this point we want to look at some more complicated routines available within OpenCV that are extremely useful in certain applications. These routines include a more sophisticated method of comparing two histograms as well as tools for computing and/or visualizing which portions of an image contribute to a given portion of a histogram.
Lighting changes can cause shifts in color values (see Figure 7-5). Such shifts tend not to change the shape of the histogram of color values so much as to shift its location, yet that shift is enough to make the histogram-matching schemes we've learned about fail. The difficulty with those match measures is that they can return a large difference when two histograms have similar shapes but are merely displaced relative to one another. It is often desirable to have a distance measure that behaves like a match measure yet is less sensitive to such displacements. Earth mover's distance (EMD) [Rubner00] is such a metric; it essentially measures how much work it would take to "shovel" one histogram shape into another, including moving part (or all) of the histogram to a new location. It works in any number of dimensions.
Returning again to Figure 7-4, we see the earth-shoveling nature of EMD's distance measure in the rightmost column. An exact match is a distance of 0. A half match costs half a "shovelful," the amount of work it would take to spread half of the left histogram into the next slot. Finally, moving the entire histogram one slot to the right (i.e., changing the model histogram into the "totally mismatched" histogram) requires a full unit of distance.
The EMD algorithm itself is quite general; it allows users to set their own distance metric or their own cost-of-moving matrix. One can record where the histogram "material" flowed from one histogram to another, and one can employ nonlinear distance metrics derived from prior information about the data. The EMD function in OpenCV is cvCalcEMD2():
float cvCalcEMD2(
    const CvArr*        signature1,
    const CvArr*        signature2,
    int                 distance_type,
    CvDistanceFunction  distance_func = NULL,
    const CvArr*        cost_matrix   = NULL,
    CvArr*              flow          = NULL,
    float*              lower_bound   = NULL,
    void*               userdata      = NULL
);
The cvCalcEMD2() function has enough parameters to make one dizzy. This may seem rather complex for such an intuitive function, but the complexity stems from all the subtle configurable dimensions of the algorithm. [97] Fortunately, the function can be used in its more basic and intuitive form without most of the arguments (note all the "=NULL" defaults in the preceding declaration). Example 7-2 shows the simplified version.
Example 7-2. Simple EMD interface
float cvCalcEMD2(
    const CvArr*  signature1,
    const CvArr*  signature2,
    int           distance_type
);
The parameter distance_type for the simpler version of cvCalcEMD2() is either Manhattan distance (CV_DIST_L1) or Euclidean distance (CV_DIST_L2). Although we're applying the EMD to histograms, the interface prefers that we talk to it in terms of signatures for the first two array parameters. These signature arrays are always of type float and consist of rows containing the histogram bin count followed by its coordinates. For the one-dimensional histogram of Figure 7-4, the signatures (listed as array rows) for the left-hand column of histograms (skipping the model) would be as follows: top, [1, 0; 0, 1]; middle, [0.5, 0; 0.5, 1]; bottom, [0, 0; 1, 1]. If a bin in a three-dimensional histogram had a count of 537 at (x, y, z) index (7, 43, 11), then the signature row for that bin would be [537, 7, 43, 11]. This is how we perform the necessary step of converting histograms into signatures.
As an example, suppose we have two histograms, hist1 and hist2, that we want to convert to two signatures, sig1 and sig2. Just to make things more difficult, let's suppose that these are two-dimensional histograms (as in the preceding code examples) of dimension h_bins by s_bins. Example 7-3 shows how to convert these two histograms into two signatures.
Example 7-3. Creating signatures from histograms for EMD
//Convert histograms into signatures for EMD matching.
//Assume we already have 2D histograms hist1 and hist2 that are both
//of dimension h_bins by s_bins (though for EMD, histograms don't have
//to match in size).
//
CvMat *sig1, *sig2;
int numrows = h_bins*s_bins;

//Create matrices to store the signatures in.
//
sig1 = cvCreateMat(numrows, 3, CV_32FC1);  //1 count + 2 coords = 3
sig2 = cvCreateMat(numrows, 3, CV_32FC1);  //sigs are of type float

//Fill the signatures for the two histograms.
//
for( int h = 0; h < h_bins; h++ ) {
    for( int s = 0; s < s_bins; s++ ) {
        float bin_val = cvQueryHistValue_2D( hist1, h, s );
        cvSet2D(sig1, h*s_bins + s, 0, cvScalar(bin_val));  //bin value
        cvSet2D(sig1, h*s_bins + s, 1, cvScalar(h));         //Coord 1
        cvSet2D(sig1, h*s_bins + s, 2, cvScalar(s));         //Coord 2

        bin_val = cvQueryHistValue_2D( hist2, h, s );
        cvSet2D(sig2, h*s_bins + s, 0, cvScalar(bin_val));  //bin value
        cvSet2D(sig2, h*s_bins + s, 1, cvScalar(h));         //Coord 1
        cvSet2D(sig2, h*s_bins + s, 2, cvScalar(s));         //Coord 2
    }
}
Notice in this example [98] that the function cvSet2D() takes a CvScalar (a four-element value) to set its entry even though each entry in this particular matrix is a single float. We use the inline convenience function cvScalar() to accomplish this task. Once we have our histograms converted into signatures, we are ready to get the distance measure. Choosing to measure by Euclidean distance, we now add the code of Example 7-4.
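In essence, that code amounts to a single call to cvCalcEMD2(). The following is a minimal sketch (not the verbatim example) using the sig1 and sig2 matrices built in Example 7-3:

// Minimal sketch: compute the earth mover's distance between the two
// signatures from Example 7-3, using Euclidean (L2) ground distance.
//
float emd = cvCalcEMD2( sig1, sig2, CV_DIST_L2 );
printf( "EMD: %f\n", emd );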
Back projection is a way of recording how well the pixels (for cvCalcBackProject()) or patches of pixels (for cvCalcBackProjectPatch()) fit the distribution of pixels in a histogram model. For example, if we have a histogram of flesh color, then we can use back projection to find flesh-colored areas in an image. The function call for doing this kind of lookup is:
void cvCalcBackProject(
    IplImage**          image,
    CvArr*              back_project,
    const CvHistogram*  hist
);
We have already seen the array of single-channel images IplImage** image in the function cvCalcHist() (see the section "Basic Manipulations with Histograms"). The number of images in this array is exactly the same (and in the same order) as was used to construct the histogram model hist. Example 7-1 showed how to convert an image into single-channel planes and then make an array of them. The image or array back_project is a single-channel 8-bit or floating-point image of the same size as the input images in the array. The values in back_project are set to the values in the associated bin in hist. If the histogram is normalized, then this value can be associated with a conditional probability value (i.e., the probability that a pixel in image is a member of the type characterized by the histogram in hist). [99] In Figure 7-6, we use a flesh-color histogram to derive a probability-of-flesh image.
Figure 7-6. Back projection of histogram values onto each pixel based on its color: the HSV flesh-color histogram (upper left) is used to convert the hand image (upper right) into the flesh-color probability image (lower right); the lower left panel is the histogram of the hand image
When back_project is a byte image rather than a float image, you should either not normalize the histogram or else scale it up before use. The reason is that the highest possible value in a normalized histogram is 1, so anything less than that will be rounded down to 0 in the 8-bit image. You might also need to scale back_project in order to see the values with your eyes, depending on how high the values are in your histogram.
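To make the pixel-level case concrete, here is a rough sketch (not code from the book) of back-projecting a hue-saturation histogram onto a loaded BGR image; the variable names src and hist, and the choice of an 8-bit result, are assumptions for illustration (the histogram is assumed to have been built from H-S planes as in Example 7-1):

// Rough sketch: back-project a hue-saturation CvHistogram 'hist' onto
// the H and S planes of the BGR image 'src'.
//
IplImage* hsv     = cvCreateImage( cvGetSize(src), IPL_DEPTH_8U, 3 );
IplImage* h_plane = cvCreateImage( cvGetSize(src), IPL_DEPTH_8U, 1 );
IplImage* s_plane = cvCreateImage( cvGetSize(src), IPL_DEPTH_8U, 1 );
IplImage* v_plane = cvCreateImage( cvGetSize(src), IPL_DEPTH_8U, 1 );
IplImage* planes[] = { h_plane, s_plane };

cvCvtColor( src, hsv, CV_BGR2HSV );
cvSplit( hsv, h_plane, s_plane, v_plane, 0 );

// Back-project the histogram onto the image planes.
IplImage* back_project = cvCreateImage( cvGetSize(src), IPL_DEPTH_8U, 1 );
cvCalcBackProject( planes, back_project, hist );

// Stretch the result to the full 0-255 range so it is easy to inspect.
cvNormalize( back_project, back_project, 255, 0, CV_MINMAX );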
We can use the basic back-projection method to model whether or not a particular pixel is likely to be a member of a particular object type (when that object type was modeled by a histogram). This is not exactly the same as computing the probability of the presence of a particular object. An alternative method would be to consider subregions of an image and the feature (e.g., color) histogram of that subregion and to ask whether the histogram of features for the subregion matches the model histogram; we could then associate with each such subregion a probability that the modeled object is, in fact, present in that subregion.
Thus, just as cvCalcBackProject() allows us to compute whether a pixel might be part of a known object, cvCalcBackProjectPatch() allows us to compute whether a patch might contain a known object. The cvCalcBackProjectPatch() function uses a sliding window over the entire input image, as shown in Figure 7-7. At each location in the input array of images, all the pixels in the patch are used to set one pixel in the destination image corresponding to the center of the patch. This is important because many properties of images, such as textures, cannot be determined at the level of individual pixels but instead arise from groups of pixels.
For simplicity in these examples, we've been sampling color to create our histogram models. Thus, in Figure 7-6 the whole hand "lights up" because its pixels match the flesh-color histogram model well. Using patches, we can detect statistical properties that occur over local regions, from the variations in local intensity that make up a texture all the way up to the configuration of properties that make up a whole object. There are two ways one might consider applying cvCalcBackProjectPatch(): as a region detector, when the sampling window is smaller than the object, and as an object detector, when the sampling window is roughly the size of the object. Figure 7-8 shows the use of cvCalcBackProjectPatch() as a region detector. We start with a histogram model of palm-flesh color; a small window is then moved over the image so that each pixel in the back projection image records the probability of palm flesh at that pixel, given all the pixels in the surrounding window in the original image. In Figure 7-8 the hand is much larger than the scanning window, and the palm region is preferentially detected. Figure 7-9 starts with a histogram model collected from blue mugs. In contrast to Figure 7-8, where regions were detected, Figure 7-9 shows how cvCalcBackProjectPatch() can be used as an object detector. When the window size is roughly the same as the size of the objects we hope to find in an image, the whole object "lights up" in the back projection image. Finding peaks in the back projection image then corresponds to finding the locations of the objects (in Figure 7-9, a mug) that we are looking for.
Figure 7-7. Back projection: a sliding patch over the input image planes is used to set the corresponding pixel (at the center of the patch) in the destination image; for normalized histogram models, the resulting image can be interpreted as a probability map indicating the possible presence of the object (this figure is taken from the OpenCV reference manual)
The function provided by OpenCV for back projection by patches is:
void cvCalcBackProjectPatch(
    IplImage**    images,
    CvArr*        dst,
    CvSize        patch_size,
    CvHistogram*  hist,
    int           method,
    float         factor
);
Here we have the same array of single-channel images that was used to create the histogram with cvCalcHist(). However, the destination image dst is different: it must be a single-channel, floating-point image of size (images[0]->width - patch_size.width + 1, images[0]->height - patch_size.height + 1). The explanation for this size (see Figure 7-7) is that the center pixel of the patch is used to set the corresponding location in dst, so we lose half a patch dimension along each edge of the image. The parameter patch_size is exactly what you would expect (the size of the patch) and may be set using the convenience function cvSize(width, height). We are already familiar with the histogram parameter; as with cvCalcBackProject(), this is the model histogram to which individual windows will be compared. The parameter for the comparison method takes exactly the same method types as used in cvCompareHist() (see the "Comparing Two Histograms" section).[100] The final parameter, factor, is the normalization level; this parameter is the same as discussed previously in connection with cvNormalizeHist(). You can set it to 1 or, as a visualization aid, to some larger number. Because of this flexibility, you are always free to normalize your hist model before using cvCalcBackProjectPatch().
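A rough sketch of a patch-based call follows, under the same kind of assumptions as the earlier pixel-level sketch (an existing planes array and model histogram hist); the 21 × 21 patch size and the CV_COMP_CORREL method are arbitrary choices for illustration:

// Rough sketch: patch-based back projection.
// 'planes' and 'hist' are assumed to exist, as in the pixel-level sketch.
//
CvSize patch_size = cvSize( 21, 21 );   // example patch size
int dst_w = planes[0]->width  - patch_size.width  + 1;
int dst_h = planes[0]->height - patch_size.height + 1;
IplImage* dst = cvCreateImage( cvSize(dst_w, dst_h), IPL_DEPTH_32F, 1 );

cvCalcBackProjectPatch(
    planes,           // same plane array used to build the histogram
    dst,              // single-channel float image, shrunk by the patch size
    patch_size,
    hist,             // model histogram
    CV_COMP_CORREL,   // any method accepted by cvCompareHist()
    1.0f              // normalization factor
);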
Figure 7-8. Back projection used for histogram object model of flesh tone where the window (small white box in upper right frame) is much smaller than the hand; here, the histogram model was of palm-color distribution and the peak locations tend to be at the center of the hand
A final question arises: Once we have a probability-of-object image, how do we use that image to find the object we are searching for? For search, we can use the cvMinMaxLoc() function discussed in Chapter 3. The maximum location (assuming you smooth the image a bit first) is the most likely location of the object in the image. This leads us to a slight digression: template matching.
Figure 7-9. Using cvCalcBackProjectPatch() to locate an object (here, a coffee cup) whose size approximately matches the patch size (white box in upper right panel): the sought object is modeled by a hue-saturation histogram (upper left), which can be compared with an HS histogram for the image as a whole (lower left); the result of cvCalcBackProjectPatch() (lower right) is that the object is easily picked out from the scene by virtue of its color
Template matching via cvMatchTemplate() is not based on histograms; rather, the function matches an actual image patch against an input image by "sliding" the patch over the input image using one of the matching methods described in this section. If, as in Figure 7-10, we have an image patch containing a face, then we can slide that face over an input image looking for strong matches that would indicate another face is present. The function call is similar to that of cvCalcBackProjectPatch():
void cvMatchTemplate(
    const CvArr*  image,
    const CvArr*  templ,
    CvArr*        result,
    int           method
);
Instead of the array of input image planes that we saw in cvCalcBackProjectPatch(), here we have a single 8-bit or floating-point grayscale or color image as input. The matching model in templ is just a patch from a similar image containing the object for which you are searching. The output is placed in the result image, which is a single-channel byte or floating-point image of size (image->width - templ->width + 1, image->height - templ->height + 1), as we saw previously with cvCalcBackProjectPatch(). The matching method is somewhat more complex, as we now explain. We use I to denote the input image, T the template, and R the result.
Figure 7-10. cvMatchTemplate() sweeps a template image patch across another image looking for matches
The square-difference matching methods (method = CV_TM_SQDIFF) match the squared difference between the template and the image, so a perfect match will be 0 and bad matches will be large:
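(The formula below follows the OpenCV documentation; the sum runs over the template coordinates x', y'.)

R_{\text{sq\_diff}}(x,y) \;=\; \sum_{x',y'} \bigl[\, T(x',y') - I(x+x',\,y+y') \,\bigr]^2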
The correlation matching methods (method = CV_TM_CCORR) multiplicatively match the template against the image, so a perfect match will be large and bad matches will be small or 0.
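(Again following the OpenCV documentation, the cross-correlation measure is:)

R_{\text{ccorr}}(x,y) \;=\; \sum_{x',y'} T(x',y') \cdot I(x+x',\,y+y')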
The correlation-coefficient matching methods (method = CV_TM_CCOEFF) match the template relative to its mean against the image relative to its mean, so a perfect match will be 1 and a perfect mismatch will be -1; a value of 0 simply means that there is no correlation (random alignments).
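In the OpenCV documentation this is written in terms of the mean-subtracted template and image patch, where w and h are the template width and height:

T'(x',y') \;=\; T(x',y') - \frac{1}{wh}\sum_{x'',y''} T(x'',y''), \qquad
I'(x+x',y+y') \;=\; I(x+x',y+y') - \frac{1}{wh}\sum_{x'',y''} I(x+x'',y+y'')

R_{\text{ccoeff}}(x,y) \;=\; \sum_{x',y'} T'(x',y') \cdot I'(x+x',\,y+y')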
For each of the three methods just described, there are also normalized versions first developed by Galton [Galton] as described by Rodgers [Rodgers88]. The normalized methods are useful because, as mentioned previously, they can help reduce the effects of lighting differences between the template and the image. In each case, the normalization coefficient is the same:
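(Per the OpenCV documentation:)

Z(x,y) \;=\; \sqrt{\;\sum_{x',y'} T(x',y')^2 \;\cdot\; \sum_{x',y'} I(x+x',\,y+y')^2\;}

Each normalized method then returns R_{\text{normed}}(x,y) = R(x,y)/Z(x,y); in the correlation-coefficient case the documentation applies the same formula to the mean-subtracted T' and I'.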
The values for method that give the normalized computations are listed in Table 7-2 (the normalized variants are selected with CV_TM_SQDIFF_NORMED, CV_TM_CCORR_NORMED, and CV_TM_CCOEFF_NORMED).
As usual, we obtain more accurate matches (at the cost of more computations) as we move from simpler measures (square difference) to the more sophisticated ones (correlation coefficient). It's best to do some test trials of all these settings and then choose the one that best trades off accuracy for speed in your application.
Again, be careful when interpreting your results. The square-difference methods show best matches with a minimum, whereas the correlation and correlation-coefficient methods show best matches at maximum points.
As in the case of cvCalcBackProjectPatch(), once we use cvMatchTemplate() to obtain a matching result image, we can then use cvMinMaxLoc() to find the location of the best match. Again, we want to ensure that there's an area of good match around that point, in order to avoid random template alignments that just happen to work well. A good match should have good matches nearby, because slight misalignments of the template shouldn't vary the results too much for real matches. Looking for the best matching "hill" can be done by slightly smoothing the result image before seeking its maximum (for the correlation or correlation-coefficient methods) or minimum (for the square-difference methods). The morphological operators can also be helpful in this context.
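A short sketch of that peak-finding step follows, assuming result is the floating-point output of cvMatchTemplate() with a correlation-type method (so the best match is the maximum); the 5 × 5 Gaussian kernel is an arbitrary choice:

// Rough sketch: smooth the match result a little, then locate the peak.
//
cvSmooth( result, result, CV_GAUSSIAN, 5, 5, 0, 0 );

double  min_val, max_val;
CvPoint min_loc, max_loc;
cvMinMaxLoc( result, &min_val, &max_val, &min_loc, &max_loc, NULL );

// max_loc is the upper-left corner of the best-matching window;
// with a square-difference method you would use min_loc instead.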
Example 7-5 should give you a good idea of how the different template matching techniques behave. This program first reads in a template and image to be matched and then performs the matching via the methods we've discussed here.
Example 7-5. Template matching
// Template matching.
// Usage: matchTemplate image template
//
#include <cv.h>
#include <cxcore.h>
#include <highgui.h>
#include <stdio.h>

int main( int argc, char** argv ) {
    IplImage *src, *templ, *ftmp[6];   // ftmp will hold the results
    int i;

    if( argc == 3 ) {
        // Read in the source image to be searched:
        if( (src = cvLoadImage(argv[1], 1)) == 0 ) {
            printf("Error on reading src image %s\n", argv[1]);
            return(-1);
        }
        // Read in the template to be used for matching:
        if( (templ = cvLoadImage(argv[2], 1)) == 0 ) {
            printf("Error on reading template %s\n", argv[2]);
            return(-1);
        }

        // ALLOCATE OUTPUT IMAGES:
        int iwidth  = src->width  - templ->width  + 1;
        int iheight = src->height - templ->height + 1;
        for( i = 0; i < 6; ++i ) {
            ftmp[i] = cvCreateImage( cvSize(iwidth, iheight), IPL_DEPTH_32F, 1 );
        }

        // DO THE MATCHING OF THE TEMPLATE WITH THE IMAGE:
        for( i = 0; i < 6; ++i ) {
            cvMatchTemplate( src, templ, ftmp[i], i );
            cvNormalize( ftmp[i], ftmp[i], 1, 0, CV_MINMAX );  // [101]
        }

        // DISPLAY
        cvNamedWindow( "Template", 0 );
        cvShowImage( "Template", templ );
        cvNamedWindow( "Image", 0 );
        cvShowImage( "Image", src );
        cvNamedWindow( "SQDIFF", 0 );
        cvShowImage( "SQDIFF", ftmp[0] );
        cvNamedWindow( "SQDIFF_NORMED", 0 );
        cvShowImage( "SQDIFF_NORMED", ftmp[1] );
        cvNamedWindow( "CCORR", 0 );
        cvShowImage( "CCORR", ftmp[2] );
        cvNamedWindow( "CCORR_NORMED", 0 );
        cvShowImage( "CCORR_NORMED", ftmp[3] );
        cvNamedWindow( "CCOEFF", 0 );
        cvShowImage( "CCOEFF", ftmp[4] );
        cvNamedWindow( "CCOEFF_NORMED", 0 );
        cvShowImage( "CCOEFF_NORMED", ftmp[5] );

        // LET USER VIEW RESULTS:
        cvWaitKey(0);
    } else {
        printf( "Call should be: matchTemplate image template\n" );
    }
    return 0;
}
Note the use of cvNormalize() in this code, which allows us to display the results in a consistent way. Recall that some of the matching methods can return negative-valued results. We use the CV_MINMAX flag when normalizing; this tells the function to shift and scale the floating-point images so that all returned values are between 0 and 1. Figure 7-11 shows the results of sweeping the face template over the source image (shown in Figure 7-10) using each of cvMatchTemplate()'s available matching methods. In outdoor imagery especially, it's almost always better to use one of the normalized methods. Among those, the correlation-coefficient method gives the most clearly delineated match but, as expected, at a greater computational cost. For a specific application, such as automatic parts inspection or tracking features in a video, you should try all the methods and find the speed and accuracy trade-off that best serves your needs.
Figure 7-11. Match results of six matching methods for the template search depicted in Figure 7-10: the best match for square difference is 0 and for the other methods it's the maximum point; thus, matches are indicated by dark areas in the left column and by bright spots in the other two columns
[97] If you want all of the gory details, we recommend that you read the 1989 paper by S. Peleg, M. Werman, and H. Rom, "A Unified Approach to the Change of Resolution: Space and Gray-Level," and then take a look at the relevant entries in the OpenCV user manual that are included in the release …\opencv\docs\ref\opencvref_cv.htm.
[98] Using cvSetReal2D() or cvmSet() would have been more compact and efficient here, but the example is clearer this way, and the extra overhead is small compared to the actual distance calculation in EMD.
[99] Specifically, in the case of our flesh-tone H-S histogram, if C is the color of the pixel and F is the event that the pixel is flesh, then this probability map gives us p(C|F), the probability of observing that color given that the pixel actually is flesh. This is not quite the same as p(F|C), the probability that the pixel is flesh given its color. However, these two probabilities are related by Bayes' theorem [Bayes1763], and so, if we know the overall probability of encountering a flesh-colored object in a scene as well as the total probability of encountering the range of flesh colors, then we can compute p(F|C) from p(C|F). Specifically, Bayes' theorem establishes the following relation:
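p(F \mid C) \;=\; \frac{p(C \mid F)\, p(F)}{p(C)}

where p(F) is the overall probability of encountering flesh and p(C) is the total probability of encountering that color.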
[100] You must be careful when choosing a method, because some indicate best match with a return value of 1 and others with a value of 0.
[101] You can often get more pronounced match results by raising the matches to a power (e.g., cvPow(ftmp[i], ftmp[i], 5);). If the result is normalized between 0.0 and 1.0, you can immediately see that a good match of 0.99 raised to the fifth power is not much reduced (0.99⁵ ≈ 0.95), while a poorer score of 0.50 is reduced substantially (0.50⁵ ≈ 0.03).