Hough Transforms

The Hough transform[68] is a method for finding lines, circles, or other simple forms in an image. The original Hough transform was a line transform, which is a relatively fast way of searching a binary image for straight lines. The transform can be further generalized to cases other than just simple lines.

The basic theory of the Hough line transform is that any point in a binary image could be part of some set of possible lines. If we parameterize each line by, for example, a slope a and an intercept b, then a point in the original image is transformed to a locus of points in the (a, b) plane corresponding to all of the lines passing through that point (see Figure 6-9). If we convert every nonzero pixel in the input image into such a set of points in the output image and sum over all such contributions, then lines that appear in the input (i.e., (x, y) plane) image will appear as local maxima in the output (i.e., (a, b) plane) image. Because we are summing the contributions from each point, the (a, b) plane is commonly called the accumulator plane.

It might occur to you that the slope-intercept form is not really the best way to represent all of the lines passing through a point (because of the considerably different density of lines as a function of the slope, and the related fact that the interval of possible slopes goes from –∞ to +∞). It is for this reason that the actual parameterization of the transform image used in numerical computation is somewhat different. The preferred parameterization represents each line as a point in polar coordinates (ρ, θ), with the implied line being the line passing through the indicated point but perpendicular to the radial from the origin to that point (see Figure 6-10). The equation for such a line is:

image with no caption

The OpenCV Hough transform algorithm does not make this computation explicit to the user. Instead, it simply returns the local maxima in the (ρ, θ) plane. However, you will need to understand this process in order to understand the arguments to the OpenCV Hough line transform function.

OpenCV supports two different kinds of Hough line transform: the standard Hough transform (SHT) [Duda72] and the progressive probabilistic Hough transform (PPHT). [69] The SHT is the algorithm we just looked at. The PPHT is a variation of this algorithm that, among other things, computes an extent for individual lines in addition to the orientation (as shown in Figure 6-11). It is "probabilistic" because, rather than accumulating every possible point in the accumulator plane, it accumulates only a fraction of them. The idea is that if the peak is going to be high enough anyhow, then hitting it only a fraction of the time will be enough to find it; the result of this conjecture can be a substantial reduction in computation time. Both of these algorithms are accessed with the same OpenCV function, though the meanings of some of the arguments depend on which method is being used.

CvSeq* cvHoughLines2(
  CvArr* image,
  void*  line_storage,
  int    method,
  double rho,
  double theta,
  int    threshold,
  double param1      = 0,
  double param2      = 0
);

The first argument is the input image. It must be an 8-bit image, but the input is treated as binary information (i.e., all nonzero pixels are considered to be equivalent). The second argument is a pointer to a place where the results can be stored, which can be either a memory storage (see CvMemoryStorage in Chapter 8) or a plain N-by-1 matrix array (the number of rows, N, will serve to limit the maximum number of lines returned). The next argument, method, can be CV_HOUGH_STANDARD, CV_HOUGH_PROBABILISTIC, or CV_HOUGH_MULTI_SCALE for (respectively) SHT, PPHT, or a multiscale variant of SHT.

The next two arguments, rho and theta, set the resolution desired for the lines (i.e., the resolution of the accumulator plane). The units of rho are pixels and the units of theta are radians; thus, the accumulator plane can be thought of as a two-dimensional histogram with cells of dimension rho pixels by theta radians. The threshold value is the value in the accumulator plane that must be reached for the routine to report a line. This last argument is a bit tricky in practice; it is not normalized, so you should expect to scale it up with the image size for SHT. Remember that this argument is, in effect, indicating the number of points (in the edge image) that must support the line for the line to be returned.

The param1 and param2 arguments are not used by the SHT. For the PPHT, param1 sets the minimum length of a line segment that will be returned, and param2 sets the separation between collinear segments required for the algorithm not to join them into a single longer segment. For the multiscale HT, the two parameters are used to indicate higher resolutions to which the parameters for the lines should be computed. The multiscale HT first computes the locations of the lines to the accuracy given by the rho and theta parameters and then goes on to refine those results by a factor of param1 and param2, respectively (i.e., the final resolution in rho is rho divided by param1 and the final resolution in theta is theta divided by param2).

What the function returns depends on how it was called. If the line_storage value was a matrix array, then the actual return value will be NULL. In this case, the matrix should be of type CV_32FC2 if the SHT or multi-scale HT is being used and should be CV_32SC4 if the PPHT is being used. In the first two cases, the ρ- and θ-values for each line will be placed in the two channels of the array. In the case of the PPHT, the four channels will hold the x- and y-values of the start and endpoints of the returned segments. In all of these cases, the number of rows in the array will be updated by cvHoughLines2() to correctly reflect the number of lines returned.

If the line_storage value was a pointer to a memory store, [70] then the return value will be a pointer to a CvSeq sequence structure. In that case, you can get each line or line segment from the sequence with a command like

float* line = (float*) cvGetSeqElem( lines , i );

where lines is the return value from cvHoughLines2() and i is index of the line of interest. In this case, line will be a pointer to the data for that line, with line[0] and line[1] being the floating-point values ρ and θ (for SHT and MSHT) or CvPoint structures for the endpoints of the segments (for PPHT).

The Hough circle transform [Kimme75] (see Figure 6-12) works in a manner roughly analogous to the Hough line transforms just described. The reason it is only "roughly" is that—if one were to try doing the exactly analogous thing—the accumulator plane would have to be replaced with an accumulator volume with three dimensions: one for x, one for y, and another for the circle radius r. This would mean far greater memory requirements and much slower speed. The implementation of the circle transform in OpenCV avoids this problem by using a somewhat more tricky method called the Hough gradient method.

The Hough gradient method works as follows. First the image is passed through an edge detection phase (in this case, cvCanny()). Next, for every nonzero point in the edge image, the local gradient is considered (the gradient is computed by first computing the first-order Sobel x- and y-derivatives via cvSobel()). Using this gradient, every point along the line indicated by this slope—from a specified minimum to a specified maximum distance—is incremented in the accumulator. At the same time, the location of every one of these nonzero pixels in the edge image is noted. The candidate centers are then selected from those points in this (two-dimensional) accumulator that are both above some given threshold and larger than all of their immediate neighbors. These candidate centers are sorted in descending order of their accumulator values, so that the centers with the most supporting pixels appear first. Next, for each center, all of the nonzero pixels (recall that this list was built earlier) are considered. These pixels are sorted according to their distance from the center. Working out from the smallest distances to the maximum radius, a single radius is selected that is best supported by the nonzero pixels. A center is kept if it has sufficient support from the nonzero pixels in the edge image and if it is a sufficient distance from any previously selected center.

This implementation enables the algorithm to run much faster and, perhaps more importantly, helps overcome the problem of the otherwise sparse population of a three-dimensional accumulator, which would lead to a lot of noise and render the results unstable. On the other hand, this algorithm has several shortcomings that you should be aware of.

First, the use of the Sobel derivatives to compute the local gradient—and the attendant assumption that this can be considered equivalent to a local tangent—is not a numerically stable proposition. It might be true "most of the time," but you should expect this to generate some noise in the output.

Second, the entire set of nonzero pixels in the edge image is considered for every candidate center; hence, if you make the accumulator threshold too low, the algorithm will take a long time to run. Third, because only one circle is selected for every center, if there are concentric circles then you will get only one of them.

Finally, because centers are considered in ascending order of their associated accumulator value and because new centers are not kept if they are too close to previously accepted centers, there is a bias toward keeping the larger circles when multiple circles are concentric or approximately concentric. (It is only a "bias" because of the noise arising from the Sobel derivatives; in a smooth image at infinite resolution, it would be a certainty.)

With all of that in mind, let's move on to the OpenCV routine that does all this for us:

CvSeq* cv HoughCircles(
  CvArr* image,
  void*  circle_storage,
  int    method,
  double dp,
  double min_dist,
  double param1     = 100,
  double param2     = 300,
  int    min_radius = 0,
  int    max_radius = 0
);

The Hough circle transform function cvHoughCircles() has similar arguments to the line transform. The input image is again an 8-bit image. One significant difference between cvHoughCircles() and cvHoughLines2() is that the latter requires a binary image. The cvHoughCircles() function will internally (automatically) call cvSobel() [71] for you, so you can provide a more general grayscale image.

The circle_storage can be either an array or memory storage, depending on how you would like the results returned. If an array is used, it should be a single column of type CV_32FC3; the three channels will be used to encode the location of the circle and its radius. If memory storage is used, then the circles will be made into an OpenCV sequence and a pointer to that sequence will be returned by cvHoughCircles(). (Given an array pointer value for circle_storage, the return value of cvHoughCircles() is NULL.) The method argument must always be set to CV_HOUGH_GRADIENT.

The parameter dp is the resolution of the accumulator image used. This parameter allows us to create an accumulator of a lower resolution than the input image. (It makes sense to do this because there is no reason to expect the circles that exist in the image to fall naturally into the same number of bins as the width or height of the image itself.) If dp is set to 1 then the resolutions will be the same; if set to a larger number (e.g., 2), then the accumulator resolution will be smaller by that factor (in this case, half). The value of dp cannot be less than 1.

The parameter min_dist is the minimum distance that must exist between two circles in order for the algorithm to consider them distinct circles.

For the (currently required) case of the method being set to CV_HOUGH_GRADIENT, the next two arguments, param1 and param2, are the edge (Canny) threshold and the accumulator threshold, respectively. You may recall that the Canny edge detector actually takes two different thresholds itself. When cvCanny() is called internally, the first (higher) threshold is set to the value of param1 passed into cvHoughCircles(), and the second (lower) threshold is set to exactly half that value. The parameter param2 is the one used to threshold the accumulator and is exactly analogous to the threshold argument of cvHoughLines().

The final two parameters are the minimum and maximum radius of circles that can be found. This means that these are the radii of circles for which the accumulator has a representation. Example 6-1 shows an example program using cvHoughCircles().

It is worth reflecting momentarily on the fact that, no matter what tricks we employ, there is no getting around the requirement that circles be described by three degrees of freedom (x, y, and r), in contrast to only two degrees of freedom (ρ and θ) for lines. The result will invariably be that any circle-finding algorithm requires more memory and computation time than the line-finding algorithms we looked at previously. With this in mind, it's a good idea to bound the radius parameter as tightly as circumstances allow in order to keep these costs under control. [72] The Hough transform was extended to arbitrary shapes by Ballard in 1981 [Ballard81] basically by considering objects as collections of gradient edges.



[68] Hough developed the transform for use in physics experiments [Hough59]; its use in vision was introduced by Duda and Hart [Duda72].

[69] The "probablistic Hough transform" (PHT) was introduced by Kiryati, Eldar, and Bruckshtein in 1991 [Kiryati91]; the PPHT was introduced by Matas, Galambosy, and Kittler in 1999 [Matas00].

[70] We have not yet introduced the concept of a memory store or a sequence, but Chapter 8 is devoted to this topic.

[71] The function cvSobel(), not cvCanny(), is called internally. The reason is that cvHoughCircles() needs to estimate the orientation of a gradient at each pixel, and this is difficult to do with binary edge map.

[72] Although cvHoughCircles() catches centers of the circles quite well, it sometimes fails to find the correct radius. Therefore, in an application where only a center must be found (or where some different technique can be used to find the actual radius), the radius returned by cvHoughCircles() can be ignored.