In the course of analyzing images, objects, and video information, we frequently want to represent what we are looking at as a histogram. Histograms can be used to represent such diverse things as the color distribution of an object, an edge gradient template of an object [Freeman95], and the distribution of probabilities representing our current hypothesis about an object's location. Figure 7-1 shows the use of histograms for rapid gesture recognition. Edge gradients were collected from "up", "right", "left", "stop" and "OK" hand gestures. A webcam was then set up to watch a person who used these gestures to control web videos. In each frame, color interest regions were detected from the incoming video; then edge gradient directions were computed around these interest regions, and these directions were collected into orientation bins within a histogram. The histograms were then matched against the gesture models to recognize the gesture. The vertical bars in Figure 7-1 show the match levels of the different gestures. The gray horizontal line represents the threshold for acceptance of the "winning" vertical bar corresponding to a gesture model.
Histograms find uses in many computer vision applications. Histograms are used to detect scene transitions in videos by marking when the edge and color statistics markedly change from frame to frame. They are used to identify interest points in images by assigning each interest point a "tag" consisting of histograms of nearby features. Histograms of edges, colors, corners, and so on form a general feature type that is passed to classifiers for object recognition. Sequences of color or edge histograms are used to identify whether videos have been copied on the web, and the list goes on. Histograms are one of the classic tools of computer vision.
Histograms are simply collected counts of the underlying data organized into a set of predefined bins. They can be populated by counts of features computed from the data, such as gradient magnitudes and directions, color, or just about any other characteristic. In any case, they are used to obtain a statistical picture of the underlying distribution of data. The histogram usually has fewer dimensions than the source data. Figure 7-2 depicts a typical situation. The figure shows a two-dimensional distribution of points (upper left); we impose a grid (upper right) and count the data points in each grid cell, yielding a one-dimensional histogram (lower right). Because the raw data points can represent just about anything, the histogram is a handy way of representing whatever it is that you have learned from your image.
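The counting step is small enough to sketch in plain C. This is a minimal illustration, not OpenCV code. The point type `Point2f` and the function `grid_histogram()` are hypothetical names, and the grid is assumed to be uniform. Each point is dropped into a grid cell, and the cell counts are laid out as one flat array of bins.

```c
#include <stddef.h>

/* Hypothetical point type, used only for this illustration. */
typedef struct { float x, y; } Point2f;

/* Count points into a cols-by-rows uniform grid covering [x0, x1) x [y0, y1).
 * Cell (c, r) is stored at counts[r * cols + c], so the grid becomes one
 * flat array of bins. Points outside the grid are skipped.
 * Returns how many points were counted. Assumes x0 < x1 and y0 < y1. */
size_t grid_histogram(const Point2f *pts, size_t n,
                      float x0, float x1, float y0, float y1,
                      int cols, int rows, int *counts)
{
    size_t counted = 0;
    for (int i = 0; i < cols * rows; ++i)
        counts[i] = 0;                          /* start every bin at zero */
    for (size_t i = 0; i < n; ++i) {
        float x = pts[i].x, y = pts[i].y;
        if (x < x0 || x >= x1 || y < y0 || y >= y1)
            continue;                           /* outside the grid */
        int c = (int)((x - x0) / (x1 - x0) * cols);
        int r = (int)((y - y0) / (y1 - y0) * rows);
        if (c >= cols) c = cols - 1;            /* guard against float rounding */
        if (r >= rows) r = rows - 1;
        counts[r * cols + c]++;
        ++counted;
    }
    return counted;
}
```

Take four points, (0.1, 0.1), (0.2, 0.3), (0.9, 0.9), and (1.5, 0.5), and a 2-by-2 grid over the unit square. The call returns 3 because the last point falls outside the grid, and the bins hold {2, 0, 0, 1}.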
Figure 7-1. Local histograms of gradient orientations are used to find the hand and its gesture; here the "winning" gesture (longest vertical bar) is a correct recognition of "L" (move left)
Histograms that represent continuous distributions do so by implicitly averaging the number of points in each grid cell. [90] This is where problems can arise, as shown in Figure 7-3. If the grid is too wide (upper left), then there is too much averaging and we lose the structure of the distribution. If the grid is too narrow (upper right), then there is not enough averaging to represent the distribution accurately and we get small, "spiky" cells.
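The grid-size trade-off shows up even on a toy data set. The sketch below uses a hypothetical function, `bin_stats()`, and eight made-up integer samples. Integers keep the bin arithmetic exact. The same samples are binned with 1, 10, and 100 equal-width bins over [0, 100), and for each run the function reports how many bins are non-empty and how large the fullest bin is.

```c
/* Bin integer samples in [lo, hi) into nbins equal-width bins using exact
 * integer arithmetic. Reports the number of non-empty bins and the largest
 * bin count. counts must have room for nbins ints. Assumes lo < hi. */
void bin_stats(const int *v, int n, int lo, int hi, int nbins,
               int *counts, int *nonempty, int *maxcount)
{
    *nonempty = 0;
    *maxcount = 0;
    for (int b = 0; b < nbins; ++b)
        counts[b] = 0;
    for (int i = 0; i < n; ++i) {
        if (v[i] < lo || v[i] >= hi)
            continue;                               /* out of range: ignored */
        int b = (v[i] - lo) * nbins / (hi - lo);    /* always 0 .. nbins-1 */
        counts[b]++;
    }
    for (int b = 0; b < nbins; ++b) {
        if (counts[b] > 0)
            ++*nonempty;
        if (counts[b] > *maxcount)
            *maxcount = counts[b];
    }
}
```

The samples are {10, 11, 12, 13, 60, 61, 62, 63}.

- With 1 bin, all 8 values land in a single bin and the two clusters disappear.
- With 10 bins, there are two bins of 4, which shows both clusters.
- With 100 bins, there are eight singleton bins. This is the "spiky" case.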
OpenCV has a data type for representing histograms. The histogram data structure is capable of representing histograms in one or many dimensions, and it contains all the data necessary to track bins of both uniform and nonuniform sizes. And, as you might expect, it comes equipped with a variety of useful functions which will allow us to easily perform common operations on our histograms.
Figure 7-2. Typical histogram example: starting with a cloud of points (upper left), a counting grid is imposed (upper right) that yields a one-dimensional histogram of point counts (lower right)
Let's start out by looking directly at the CvHistogram data structure.
typedef struct CvHistogram {
    int      type;
    CvArr*   bins;
    float    thresh[CV_MAX_DIM][2];  // for uniform histograms
    float**  thresh2;                // for nonuniform histograms
    CvMatND  mat;                    // embedded matrix header
                                     // for array histograms
} CvHistogram;
This definition is deceptively simple, because much of the internal data of the histogram is stored inside of the CvMatND structure. We create new histograms with the following routine:
CvHistogram* cvCreateHist(
    int     dims,
    int*    sizes,
    int     type,
    float** ranges  = NULL,
    int     uniform = 1
);
Figure 7-3. A histogram's accuracy depends on its grid size: a grid that is too wide yields too much spatial averaging in the histogram counts (left); a grid that is too small yields "spiky" and singleton results from too little averaging (right)
The argument dims indicates how many dimensions we want the histogram to have. The sizes argument must be an array of integers whose length is equal to dims. Each integer in this array indicates how many bins are to be assigned to the corresponding dimension. The type argument can be either CV_HIST_ARRAY, which is used for multidimensional histograms to be stored using the dense multidimensional matrix structure (i.e., CvMatND), or CV_HIST_SPARSE[91] if the data is to be stored using the sparse matrix representation (CvSparseMat).

The argument ranges can have one of two forms. For a uniform histogram, ranges is an array of floating-point value pairs,[92] where the number of value pairs is equal to the number of dimensions. For a nonuniform histogram, the pairs used by the uniform histogram are replaced by arrays containing the values by which the nonuniform bins are separated. If a dimension has N bins, then its subarray will have N + 1 entries. Each array of values starts with the bottom edge of the lowest bin and ends with the top edge of the highest bin.[93]

The Boolean argument uniform indicates whether the histogram is to have uniform bins and thus how the ranges value is interpreted;[94] if it is set to a nonzero value, the bins are uniform. It is possible to set ranges to NULL, in which case the ranges are simply "unknown." They can be set later using the specialized function cvSetHistBinRanges(). Clearly, you had better set the value of ranges before you start using the histogram.
void cvSetHistBinRanges(
    CvHistogram* hist,
    float**      ranges,
    int          uniform = 1
);
The arguments to cvSetHistBinRanges() are exactly the same as the corresponding arguments for cvCreateHist().
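The plain-C sketch below makes the two forms of ranges concrete. It maps a value to a bin index under each interpretation. It illustrates the semantics described above and in footnote [93], and it is not OpenCV's internal lookup code. The function names are hypothetical. Following the intervals given in that footnote, the top edge is treated as part of the last bin.

```c
/* Uniform bins: [lo, hi] is split into n equal-width bins.
 * Returns the bin index, or -1 if v is out of range. Assumes lo < hi.
 * The top edge hi is counted in the last bin, as in footnote [93]. */
int uniform_bin(float v, float lo, float hi, int n)
{
    if (v < lo || v > hi)
        return -1;
    int b = (int)((v - lo) / (hi - lo) * n);
    return b < n ? b : n - 1;                   /* v == hi lands in the last bin */
}

/* Nonuniform bins: edges holds n + 1 increasing values, bin i covers
 * [edges[i], edges[i+1]), and the last bin also includes edges[n]. */
int nonuniform_bin(float v, const float *edges, int n)
{
    if (v < edges[0] || v > edges[n])
        return -1;
    for (int i = 0; i < n - 1; ++i)
        if (v < edges[i + 1])
            return i;
    return n - 1;
}

/* For comparison, the same nonuniform edges in the form the text describes
 * for cvCreateHist() (not compiled here; requires OpenCV's C API):
 *   int    sizes[]  = { 4 };
 *   float  edges[]  = { 0, 2, 4, 9, 10 };    // N + 1 = 5 entries for N = 4 bins
 *   float* ranges[] = { edges };
 *   CvHistogram* h  = cvCreateHist( 1, sizes, CV_HIST_ARRAY, ranges, 0 );
 */
```

Take lo = 0, hi = 10, and two bins. In that case uniform_bin() returns 0 for 4.9 and returns 1 for both 5 and 10. With the edges {0, 2, 4, 9, 10}, nonuniform_bin() returns 1 for 3, 2 for 5, and 3 for 9, which matches the intervals in footnote [93].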
If you plan to reuse a histogram, you can clear it (i.e., reset all of its bins to 0). Once you are done with it, you can de-allocate it with the usual release-type function.
void cvClearHist( CvHistogram* hist );
void cvReleaseHist( CvHistogram** hist );
As usual, the release function is called with a pointer to the histogram pointer you obtained from the create function. The histogram pointer is set to NULL once the histogram is de-allocated.
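A tiny stand-in type can mimic the create, set-ranges, clear, and release pattern. `MiniHist` and its functions are hypothetical and far simpler than CvHistogram: they handle one dimension and uniform bins only. The example is about the calling convention. Ranges may start out unknown, clearing keeps the storage for reuse, and release takes the address of the pointer so it can set the caller's copy to NULL.

```c
#include <stdlib.h>

/* Hypothetical, stripped-down 1-D histogram (not an OpenCV type). */
typedef struct {
    int    nbins;
    float *bins;
    float  lo, hi;
    int    ranges_set;   /* 0 while the ranges are still "unknown" */
} MiniHist;

/* range may be NULL, in which case it must be set later. */
MiniHist *minihist_create(int nbins, const float *range)
{
    MiniHist *h = malloc(sizeof *h);
    if (!h)
        return NULL;
    h->bins = calloc((size_t)nbins, sizeof *h->bins);   /* bins start at 0 */
    if (!h->bins) {
        free(h);
        return NULL;
    }
    h->nbins = nbins;
    h->lo = h->hi = 0.0f;
    h->ranges_set = 0;
    if (range) {
        h->lo = range[0];
        h->hi = range[1];
        h->ranges_set = 1;
    }
    return h;
}

/* Supply (or replace) the ranges after creation. */
void minihist_set_ranges(MiniHist *h, const float *range)
{
    h->lo = range[0];
    h->hi = range[1];
    h->ranges_set = 1;
}

/* Reset every bin to 0 but keep the storage for reuse. */
void minihist_clear(MiniHist *h)
{
    for (int i = 0; i < h->nbins; ++i)
        h->bins[i] = 0.0f;
}

/* Free everything and set the caller's pointer to NULL. */
void minihist_release(MiniHist **ph)
{
    if (!ph || !*ph)
        return;
    free((*ph)->bins);
    free(*ph);
    *ph = NULL;
}
```

Because minihist_release() receives `&h`, a later accidental use of `h` sees NULL rather than a dangling pointer. This is the same reason the release function in the text takes a pointer to the histogram pointer.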
Another useful function helps create a histogram from data we already have lying around:
CvHistogram* cvMakeHistHeaderForArray(
    int           dims,
    int*          sizes,
    CvHistogram*  hist,
    float*        data,
    float**       ranges  = NULL,
    int           uniform = 1
);
In this case, hist is a pointer to a CvHistogram data structure, and data is a pointer to an area large enough to hold sizes[0]*sizes[1]*…*sizes[dims-1] floats, which will store the histogram bins. Notice that data is a pointer to float because the internal data representation for the histogram is always of type float. The return value is just the same as the hist value we passed in. Unlike the cvCreateHist() routine, there is no type argument: all histograms created by cvMakeHistHeaderForArray() are dense histograms.

One last point before we move on. Since you (presumably) allocated the data storage area for the histogram bins yourself, there is no reason to call cvReleaseHist() on your CvHistogram structure. You will have to clean up the header structure (if you did not allocate it on the stack) and, of course, clean up your data as well. Since these are "your" variables, you are assumed to be taking care of this in your own way.
[90] This is also true of histograms representing information that falls naturally into discrete groups when the histogram uses fewer bins than the natural description would suggest or require. An example of this is representing 8-bit intensity values in a 10-bin histogram: each bin would then combine the points associated with approximately 25 different intensities, (erroneously) treating them all as equivalent.
[91] For you old timers, the value CV_HIST_TREE is still supported, but it is identical to CV_HIST_SPARSE.
[92] These "pairs" are just C-arrays with only two entries.
[93] To clarify: in the case of a uniform histogram, if the lower and upper ranges are set to 0 and 10, respectively, and if there are two bins, then the bins will be assigned to the respective intervals [0, 5) and [5, 10]. In the case of a nonuniform histogram, if the size of dimension i is 4 and the corresponding ranges are set to (0, 2, 4, 9, 10), then the resulting bins will be assigned to the following (nonuniform) intervals: [0, 2), [2, 4), [4, 9), and [9, 10].
[94] Have no fear that this argument is of type int, because the only meaningful distinction is between zero and nonzero.