Mean-Shift Segmentation

In Chapter 5 we introduced the function cvPyrSegmentation(). Pyramid segmentation uses a color merge (over a scale that depends on the similarity of the colors to one another) in order to segment images. This approach is based on minimizing the total energy in the image; here energy is defined by a link strength, which is further defined by color similarity. In this section we introduce cvPyrMeanShiftFiltering(), a similar algorithm that is based on mean-shift clustering over color [Comaniciu99]. We'll see the details of the mean-shift algorithm cvMeanShift() in Chapter 10, when we discuss tracking and motion. For now, what we need to know is that mean shift finds the peak of a color-spatial (or other feature) distribution over time. Here, mean-shift segmentation finds the peaks of color distributions over space. The common theme is that both the motion tracking and the color segmentation algorithms rely on the ability of mean shift to find the modes (peaks) of a distribution.

Given a set of multidimensional data points whose dimensions are (x, y, blue, green, red), mean shift can find the highest density "clumps" of data in this space by scanning a window over the space. Notice, however, that the spatial variables (x, y) can have very different ranges from the color magnitude ranges (blue, green, red). Therefore, mean shift needs to allow for different window radii in different dimensions. In this case we should have one radius for the spatial variables (spatialRadius) and one radius for the color magnitudes (colorRadius). As mean-shift windows move, all the points traversed by the windows that converge at a peak in the data become connected or "owned" by that peak. This ownership, radiating out from the densest peaks, forms the segmentation of the image. The segmentation is actually done over a scale pyramid (cvPyrUp(), cvPyrDown()), as described in Chapter 5, so that color clusters at a high level in the pyramid (shrunken image) have their boundaries refined at lower pyramid levels in the pyramid. The function call for cvPyrMeanShiftFiltering() looks like this:

void cvPyrMeanShiftFiltering(
  const CvArr*   src,
  CvArr*         dst,
  double         spatialRadius,
  double         colorRadius,
  int            max_level    = 1,
  CvTermCriteria termcrit     = cvTermCriteria(
      CV_TERMCRIT_ITER | CV_TERMCRIT_EPS,
      5,
      1
   )
  );

In cvPyrMeanShiftFiltering() we have an input image src and an output image dst. Both must be 8-bit, three-channel color images of the same width and height. The spatialRadius and colorRadius define how the mean-shift algorithm averages color and space together to form a segmentation. For a 640-by-480 color image, it works well to set spatialRadius equal to 2 and colorRadius equal to 40. The next parameter of this algorithm is max_level, which describes how many levels of scale pyramid you want used for segmentation. A max_level of 2 or 3 works well for a 640-by-480 color image.

The final parameter is CvTermCriteria, which we saw in Chapter 8. CvTermCriteria is used for all iterative algorithms in OpenCV. The mean-shift segmentation function comes with good defaults if you just want to leave this parameter blank. Otherwise, cvTermCriteria has the following constructor:

cvTermCriteria(
    int    type; // CV_TERMCRIT_ITER, CV_TERMCRIT_EPS,
    int    max_iter,
    double epsilon
);

Typically we use the cvTermCriteria() function to generate the CvTermCriteria structure that we need. The first argument is either CV_TERMCRIT_ITER or CV_TERMCRIT_EPS, which tells the algorithm that we want to terminate either after some fixed number of iterations or when the convergence metric reaches some small value (respectively). The next two arguments set the values at which one, the other, or both of these criteria should terminate the algorithm. The reason we have both options is because we can set the type to CV_TERMCRIT_ITER | CV_TERMCRIT_EPS to stop when either limit is reached. The parameter max_iter limits the number of iterations if CV_TERMCRIT_ITER is set, whereas epsilon sets the error limit if CV_TERMCRIT_EPS is set. Of course the exact meaning of epsilon depends on the algorithm.

Figure 9-11 shows an example of mean-shift segmentation using the following values:

cvPyrMeanShiftFiltering( src, dst, 20, 40, 2);
Mean-shift segmentation over scale using cvPyrMeanShiftFiltering() with parameters max_level=2, spatialRadius=20, and colorRadius=40; similar areas now have similar values and so can be treated as super pixels, which can speed up subsequent processing significantly

Figure 9-11. Mean-shift segmentation over scale using cvPyrMeanShiftFiltering() with parameters max_level=2, spatialRadius=20, and colorRadius=40; similar areas now have similar values and so can be treated as super pixels, which can speed up subsequent processing significantly