In many practical contexts, we would like to segment an image but do not have the benefit of a separate background image. One technique that is often effective in this context is the watershed algorithm [Meyer92]. This algorithm converts lines in an image into "mountains" and uniform regions into "valleys" that can be used to help segment objects. The watershed algorithm first takes the gradient of the intensity image; this forms valleys or basins (the low points) where there is no texture and mountains or ranges (high ridges corresponding to edges) where there are dominant lines in the image. It then successively floods the basins, starting from user-specified (or algorithm-specified) marker points, until the flooded regions meet. As the image "fills up", basins that merge into the flood spreading from a given marker are segmented as belonging together. In this way, the basins connected to a marker point become "owned" by that marker. We then segment the image into the corresponding marked regions.
More specifically, the watershed algorithm allows a user (or another algorithm!) to mark parts of the image that are known to belong to an object or to the background. The user or algorithm can draw a simple line that effectively tells the watershed algorithm to "group points like these together". The watershed algorithm then segments the image by allowing the marked regions to "own" the edge-defined valleys in the gradient image that are connected to the marks. Figure 9-8 clarifies this process.
The function specification of the watershed segmentation algorithm is:
void cvWatershed( const CvArr* image, CvArr* markers );
Figure 9-7. With the averaging method (top row), the connected-components cleanup knocks out the fingers (upper right); the codebook method (bottom row) does much better at segmentation and creates a clean connected-component mask (lower right)
Figure 9-8. Watershed algorithm: after a user has marked objects that belong together (left panel), the algorithm then merges the marked area into segments (right panel)
Here, image is an 8-bit color (three-channel) image and markers is a single-channel integer (IPL_DEPTH_32S) image of the same (x, y) dimensions; the value of markers is 0 except where the user (or an algorithm) has indicated by using positive numbers that some regions belong together. For example, in the left panel of Figure 9-8, the orange might have been marked with a "1", the lemon with a "2", the lime with "3", the upper background with "4" and so on. This produces the segmentation you see in the same figure on the right.