Chapter 7. Histograms and Matching

In the course of analyzing images, objects, and video information, we frequently want to represent what we are looking at as a histogram. Histograms can be used to represent such diverse things as the color distribution of an object, an edge gradient template of an object [Freeman95], and the distribution of probabilities representing our current hypothesis about an object's location. Figure 7-1 shows the use of histograms for rapid gesture recognition. Edge gradients were collected from "up", "right", "left", "stop" and "OK" hand gestures. A webcam was then set up to watch a person who used these gestures to control web videos. In each frame, color interest regions were detected from the incoming video; then edge gradient directions were computed around these interest regions, and these directions were collected into orientation bins within a histogram. The histograms were then matched against the gesture models to recognize the gesture. The vertical bars in Figure 7-1 show the match levels of the different gestures. The gray horizontal line represents the threshold for acceptance of the "winning" vertical bar corresponding to a gesture model.

Histograms find uses in many computer vision applications. Histograms are used to detect scene transitions in videos by marking when the edge and color statistics markedly change from frame to frame. They are used to identify interest points in images by assigning each interest point a "tag" consisting of histograms of nearby features. Histograms of edges, colors, corners, and so on form a general feature type that is passed to classifiers for object recognition. Sequences of color or edge histograms are used to identify whether videos have been copied on the web, and the list goes on. Histograms are one of the classic tools of computer vision.

Histograms are simply collected counts of the underlying data organized into a set of predefined bins. They can be populated by counts of features computed from the data, such as gradient magnitudes and directions, color, or just about any other characteristic. In any case, they are used to obtain a statistical picture of the underlying distribution of data. The histogram usually has fewer dimensions than the source data. Figure 7-2 depicts a typical situation. The figure shows a two-dimensional distribution of points (upper left); we impose a grid (upper right) and count the data points in each grid cell, yielding a one-dimensional histogram (lower right). Because the raw data points can represent just about anything, the histogram is a handy way of representing whatever it is that you have learned from your image.
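The counting step described above can be sketched in a few lines of plain C. This is only an illustration of the idea, not OpenCV code: the `SamplePoint` type and the `count_points_by_x()` function are hypothetical names introduced here, and the sketch assumes the grid consists of equal-width vertical strips along x.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 2-D sample point; only x is used for binning in this sketch. */
typedef struct { float x, y; } SamplePoint;

/*
 * Count points into nbins equal-width vertical strips covering [lo, hi).
 * counts must have room for nbins entries. Points whose x lies outside
 * the range are skipped. Returns the number of points actually counted.
 */
size_t count_points_by_x(const SamplePoint *pts, size_t n,
                         float lo, float hi,
                         int *counts, int nbins)
{
    assert(nbins > 0 && hi > lo);
    for (int b = 0; b < nbins; ++b)
        counts[b] = 0;

    size_t used = 0;
    float width = (hi - lo) / (float)nbins;
    for (size_t i = 0; i < n; ++i) {
        float x = pts[i].x;
        if (x < lo || x >= hi)
            continue;
        int b = (int)((x - lo) / width);
        if (b >= nbins)              /* guard against floating-point rounding */
            b = nbins - 1;
        counts[b]++;
        used++;
    }
    return used;
}
```

The two-dimensional cloud goes in and a one-dimensional array of counts comes out, which is the reduction in dimensionality the text describes.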

Figure 7-1. Local histograms of gradient orientations are used to find the hand and its gesture; here the "winning" gesture (longest vertical bar) is a correct recognition of "L" (move left)

Histograms that represent continuous distributions do so by implicitly averaging the number of points in each grid cell.^[90] This is where problems can arise, as shown in Figure 7-3. If the grid is too wide (upper left), then there is too much averaging and we lose the structure of the distribution. If the grid is too narrow (upper right), then there is not enough averaging to represent the distribution accurately and we get small, "spiky" cells.
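The effect of bin width is easy to reproduce with a small experiment. The sketch below is plain C rather than OpenCV code; `fill_uniform_hist()` and `max_bin_count()` are hypothetical helpers introduced for illustration, and the sample values used in the note that follows are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Fill counts[0..nbins-1] with the number of values that fall in each
 * equal-width bin over [lo, hi). Values outside the range are ignored. */
void fill_uniform_hist(const float *values, size_t n,
                       float lo, float hi,
                       int *counts, int nbins)
{
    assert(nbins > 0 && hi > lo);
    for (int b = 0; b < nbins; ++b)
        counts[b] = 0;

    float width = (hi - lo) / (float)nbins;
    for (size_t i = 0; i < n; ++i) {
        if (values[i] < lo || values[i] >= hi)
            continue;
        int b = (int)((values[i] - lo) / width);
        if (b >= nbins)              /* guard against floating-point rounding */
            b = nbins - 1;
        counts[b]++;
    }
}

/* Largest count found in any single bin. */
int max_bin_count(const int *counts, int nbins)
{
    int best = 0;
    for (int b = 0; b < nbins; ++b)
        if (counts[b] > best)
            best = counts[b];
    return best;
}
```

Take the eight samples 0.5, 1.0, 1.5, 2.0, 5.5, 6.0, 6.5, and 7.0 over the range [0, 8). A single bin reports one count of 8 and hides the two clusters entirely; four bins give counts of 3, 1, 1, 3 and make the two clusters visible; 32 bins leave every occupied bin holding exactly one sample, which is the "spiky" case.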

OpenCV has a data type for representing histograms. The histogram data structure is capable of representing histograms in one or many dimensions, and it contains all the data necessary to track bins of both uniform and nonuniform sizes. And, as you might expect, it comes equipped with a variety of useful functions that let us easily perform common operations on our histograms.

Figure 7-2. Typical histogram example: starting with a cloud of points (upper left), a counting grid is imposed (upper right) that yields a one-dimensional histogram of point counts (lower right)

Basic Histogram Data Structure

Let's start out by looking directly at the CvHistogram data structure.

typedef struct CvHistogram
{
    int     type;
    CvArr*  bins;
    float   thresh[CV_MAX_DIM][2]; // for uniform histograms
    float** thresh2;                // for nonuniform histograms
    CvMatND mat;                    // embedded matrix header
                                    // for array histograms
}
CvHistogram;

This definition is deceptively simple, because much of the internal data of the histogram is stored inside of the CvMatND structure. We create new histograms with the following routine:

CvHistogram* cvCreateHist(
    int     dims,
    int*    sizes,
    int     type,
    float** ranges = NULL,
    int     uniform = 1
);

Figure 7-3. A histogram's accuracy depends on its grid size: a grid that is too wide yields too much spatial averaging in the histogram counts (left); a grid that is too small yields "spiky" and singleton results from too little averaging (right)

The argument dims indicates how many dimensions we want the histogram to have. The sizes argument must be an array of integers whose length is equal to dims. Each integer in this array indicates how many bins are to be assigned to the corresponding dimension. The type can be either CV_HIST_ARRAY, which is used for multidimensional histograms to be stored using the dense multidimensional matrix structure (i.e., CvMatND), or CV_HIST_SPARSE^[91] if the data is to be stored using the sparse matrix representation (CvSparseMat).

The argument ranges can have one of two forms. For a uniform histogram, ranges is an array of floating-point value pairs,^[92] where the number of value pairs is equal to the number of dimensions. For a nonuniform histogram, the pairs used by the uniform histogram are replaced by arrays containing the values by which the nonuniform bins are separated. If a dimension has N bins, then its subarray has N + 1 entries. Each array of values starts with the bottom edge of the lowest bin and ends with the top edge of the highest bin.^[93]

The Boolean argument uniform indicates whether the histogram is to have uniform bins and thus how the ranges value is interpreted;^[94] if set to a nonzero value, the bins are uniform. It is possible to set ranges to NULL, in which case the ranges are simply "unknown" (they can be set later using the specialized function cvSetHistBinRanges()). Clearly, you had better set the value of ranges before you start using the histogram.

void cvSetHistBinRanges(
    CvHistogram* hist,
    float**      ranges,
    int          uniform = 1
);
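To make the two forms of ranges concrete, the following plain-C sketch maps a value to a bin index in each case. It is not OpenCV's implementation: `uniform_bin()` and `nonuniform_bin()` are hypothetical names, and assigning a value that sits exactly on the top edge to the last bin is an assumption taken from the intervals given in this section's footnotes, not a verified statement about the library's boundary behavior.

```c
#include <assert.h>

/*
 * Uniform case: a single {lower, upper} pair is split into nbins
 * equal-width bins. Returns the bin index, or -1 if v is out of range.
 * A value equal to the upper edge is placed in the last bin (assumption).
 */
int uniform_bin(float v, const float range[2], int nbins)
{
    assert(nbins > 0 && range[1] > range[0]);
    if (v < range[0] || v > range[1])
        return -1;
    int b = (int)((v - range[0]) * (float)nbins / (range[1] - range[0]));
    return b < nbins ? b : nbins - 1;
}

/*
 * Nonuniform case: edges holds nbins + 1 increasing values, from the
 * bottom edge of the lowest bin to the top edge of the highest bin.
 * Returns the bin index, or -1 if v is out of range.
 */
int nonuniform_bin(float v, const float *edges, int nbins)
{
    assert(nbins > 0);
    if (v < edges[0] || v > edges[nbins])
        return -1;
    for (int b = 0; b < nbins; ++b)
        if (v < edges[b + 1])
            return b;
    return nbins - 1;    /* v equals the top edge: last bin (assumption) */
}
```

With the pair {0, 10} and two bins, `uniform_bin()` sends 4.9 to bin 0 and 5 to bin 1. With the edges {0, 2, 4, 9, 10} and four bins, `nonuniform_bin()` sends 8.5 to bin 2 and 9 to bin 3.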

The arguments to cvSetHistBinRanges() are exactly the same as the corresponding arguments for cvCreateHist(). Once you are done with a histogram, you can clear it (i.e., reset all of the bins to 0) if you plan to reuse it or you can de-allocate it with the usual release-type function.

void cvClearHist(
  CvHistogram* hist
);
void cvReleaseHist(
  CvHistogram** hist
);

As usual, the release function is called with a pointer to the histogram pointer you obtained from the create function. The histogram pointer is set to NULL once the histogram is de-allocated.

Another useful function helps create a histogram from data we already have lying around:

CvHistogram*  cvMakeHistHeaderForArray(
    int          dims,
    int*         sizes,
    CvHistogram* hist,
    float*       data,
    float**      ranges = NULL,
    int          uniform = 1
);

In this case, hist is a pointer to a CvHistogram data structure and data is a pointer to an area of size sizes[0]*sizes[1]*…*sizes[dims-1] for storing the histogram bins. Notice that data is a pointer to float because the internal data representation for the histogram is always of type float. The return value is just the same as the hist value we passed in. Unlike the cvCreateHist() routine, there is no type argument. All histograms created by cvMakeHistHeaderForArray() are dense histograms. One last point before we move on: since you (presumably) allocated the data storage area for the histogram bins yourself, there is no reason to call cvReleaseHist() on your CvHistogram structure. You will have to clean up the header structure (if you did not allocate it on the stack) and, of course, clean up your data as well; but since these are "your" variables, you are assumed to be taking care of this in your own way.
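When you supply the bin storage yourself, you need to know how large it must be and where a given bin lives inside it. The sketch below computes both in plain C. `total_bins()` and `flat_bin_index()` are hypothetical helpers, and the row-major layout (last index varying fastest) is an assumption about how a dense array is arranged; check it against your OpenCV version before relying on it.

```c
#include <assert.h>
#include <stddef.h>

/* Number of float entries the data area needs:
 * sizes[0] * sizes[1] * ... * sizes[dims-1]. */
size_t total_bins(int dims, const int *sizes)
{
    assert(dims > 0);
    size_t total = 1;
    for (int d = 0; d < dims; ++d) {
        assert(sizes[d] > 0);
        total *= (size_t)sizes[d];
    }
    return total;
}

/* Offset of bin idx[0..dims-1] in the flat data area, assuming
 * row-major order (the last index varies fastest). */
size_t flat_bin_index(int dims, const int *sizes, const int *idx)
{
    size_t offset = 0;
    for (int d = 0; d < dims; ++d) {
        assert(idx[d] >= 0 && idx[d] < sizes[d]);
        offset = offset * (size_t)sizes[d] + (size_t)idx[d];
    }
    return offset;
}
```

For a two-dimensional histogram with 30 by 32 bins, `total_bins()` returns 960, so the data area must hold 960 floats; under the row-major assumption, bin (1, 0) sits at offset 32.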

^[90] This is also true of histograms representing information that falls naturally into discrete groups when the histogram uses fewer bins than the natural description would suggest or require. An example of this is representing 8-bit intensity values in a 10-bin histogram: each bin would then combine the points associated with approximately 25 different intensities, (erroneously) treating them all as equivalent.

^[91] For you old timers, the value CV_HIST_TREE is still supported, but it is identical to CV_HIST_SPARSE.

^[92] These "pairs" are just C-arrays with only two entries.

^[93] To clarify: in the case of a uniform histogram, if the lower and upper ranges are set to 0 and 10, respectively, and if there are two bins, then the bins will be assigned to the respective intervals [0, 5) and [5, 10]. In the case of a nonuniform histogram, if the size of dimension i is 4 and if the corresponding ranges are set to (0, 2, 4, 9, 10), then the resulting bins will be assigned to the following (nonuniform) intervals: [0, 2), [2, 4), [4, 9), and [9, 10].

^[94] Have no fear that this argument is type int, because the only meaningful distinction is between zero and nonzero.