Convolution is the basis of many of the transformations that we discuss in this chapter. In the abstract, this term means something we do to every part of an image. In this sense, many of the operations we looked at in Chapter 5 can also be understood as special cases of the more general process of convolution. What a particular convolution "does" is determined by the form of the convolution kernel being used. This kernel is essentially just a fixed-size array of numerical coefficients along with an anchor point in that array, which is typically located at the center. The size of the array [62] is called the support of the kernel.
Figure 6-1 depicts a 3-by-3 convolution kernel with the anchor located at the center of the array. The value of the convolution at a particular point is computed by first placing the kernel anchor on top of a pixel in the image, with the rest of the kernel overlaying the corresponding local pixels in the image. For each kernel point, we now have a value for the kernel at that point and a value for the image at the corresponding image point. We multiply each such pair together and sum over the whole kernel; this sum is then placed in the result image at the location corresponding to the location of the anchor in the input image. This process is repeated for every point in the image by scanning the kernel over the entire image.
Figure 6-1. A 3-by-3 kernel for a Sobel derivative; note that the anchor point is in the center of the kernel
We can, of course, express this procedure in the form of an equation. If we define the image to be I(x, y) and the kernel to be G(i, j) (where 0 ≤ i < Mi and 0 ≤ j < Mj, with Mi and Mj being the extents of the kernel in the x- and y-dimensions, respectively), and the anchor point to be located at (ai, aj) in the coordinates of the kernel, then the convolution H(x, y) is defined by the following expression:
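H(x, y) = Σ_{i=0..Mi−1} Σ_{j=0..Mj−1} I(x + i − ai, y + j − aj) · G(i, j)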
Observe that the number of operations, at least at first glance, seems to be the number of pixels in the image multiplied by the number of pixels in the kernel. [63] This can be a lot of computation and so is not something you want to do with some "for" loop and a lot of pointer de-referencing. In situations like this, it is better to let OpenCV do the work for you and take advantage of the optimizations already programmed into OpenCV. The OpenCV way to do this is with cvFilter2D():
void cvFilter2D(
    const CvArr*  src,
    CvArr*        dst,
    const CvMat*  kernel,
    CvPoint       anchor = cvPoint(-1,-1)
);
Here we create a matrix of the appropriate size, fill it with the coefficients, and then pass it together with the source and destination images into cvFilter2D(). We can also optionally pass in a CvPoint to indicate the location of the anchor of the kernel, but the default value (equal to cvPoint(-1,-1)) is interpreted as indicating the center of the kernel. The kernel can be of even size if its anchor point is defined; otherwise, it should be of odd size.
The src and dst images should be the same size. One might think that the src image should be larger than the dst image in order to allow for the extra width and length of the convolution kernel. But the sizes of src and dst can be the same in OpenCV because, by default, prior to convolution OpenCV creates virtual pixels via replication past the border of the src image so that the border pixels in dst can be filled in. The replication is done as input(–dx, y) = input(0, y), input(w + dx, y) = input(w – 1, y), and so forth. There are some alternatives to this default behavior; we will discuss them in the next section.
We remark that the coefficients of the convolution kernel should always be floating-point numbers. This means that you should use CV_32FC1 when allocating that matrix.
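To make the calling sequence concrete, here is a minimal sketch of the whole procedure, sharpening an image with a 3-by-3 kernel; the file names and the particular kernel values are illustrative choices, not prescribed by the text:

#include <cv.h>
#include <highgui.h>

int main( int argc, char** argv ) {
    // Load the source image; "input.jpg" is just a placeholder name.
    IplImage* src = cvLoadImage( "input.jpg" );
    IplImage* dst = cvCloneImage( src );          // same size as src

    // The kernel matrix must hold floating-point coefficients (CV_32FC1).
    float vals[9] = {  0, -1,  0,
                      -1,  5, -1,
                       0, -1,  0 };               // simple sharpening kernel
    CvMat kernel = cvMat( 3, 3, CV_32FC1, vals );

    // Convolve; cvPoint(-1,-1) means "use the center of the kernel as anchor".
    cvFilter2D( src, dst, &kernel, cvPoint(-1,-1) );

    cvSaveImage( "output.jpg", dst );
    cvReleaseImage( &src );
    cvReleaseImage( &dst );
    return 0;
}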
One problem that naturally arises with convolutions is how to handle the boundaries. For example, when using the convolution kernel just described, what happens when the point being convolved is at the edge of the image? Most of OpenCV's built-in functions that make use of cvFilter2D() must handle this in one way or another. Similarly, when doing your own convolutions, you will need to know how to deal with this efficiently.
The solution comes in the form of the cvCopyMakeBorder() function, which copies a given image onto another slightly larger image and then automatically pads the boundary in one way or another:
void cvCopyMakeBorder(
    const CvArr*  src,
    CvArr*        dst,
    CvPoint       offset,
    int           bordertype,
    CvScalar      value = cvScalarAll(0)
);
The offset argument tells cvCopyMakeBorder() where to place the copy of the original image within the destination image. Typically, if the kernel is N-by-N (for odd N) then you will want a boundary that is (N – 1)/2 wide on all sides or, equivalently, an image that is N – 1 wider and taller than the original. In this case you would set the offset to cvPoint((N-1)/2,(N-1)/2) so that the boundary would be even on all sides. [64]
The bordertype can be either IPL_BORDER_CONSTANT or IPL_BORDER_REPLICATE (see Figure 6-2). In the first case, the value argument will be interpreted as the value to which all pixels in the boundary should be set. In the second case, the row or column at the very edge of the original is replicated out to the edge of the larger image. Note that the border of the test-pattern image is somewhat subtle (examine the upper-left image in Figure 6-2); in the test-pattern image there is a one-pixel-wide dark border, except where the circle patterns come near the border, where it turns white. There are two other border types defined, IPL_BORDER_REFLECT and IPL_BORDER_WRAP, which are not implemented at this time in OpenCV but may be supported in the future.
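As a concrete illustration of both border types, the following sketch pads an image for a 7-by-7 kernel; the file name is a placeholder:

#include <cv.h>
#include <highgui.h>

int main( int argc, char** argv ) {
    IplImage* src = cvLoadImage( "input.jpg" );   // placeholder file name

    // For a 7-by-7 kernel we want a (7-1)/2 = 3 pixel border on every side,
    // so the destination is 6 pixels wider and 6 pixels taller than src.
    int border = 3;
    CvSize sz  = cvSize( src->width + 2*border, src->height + 2*border );
    IplImage* padded = cvCreateImage( sz, src->depth, src->nChannels );

    // Constant border: every new pixel is set to the given value (here, black).
    cvCopyMakeBorder( src, padded, cvPoint(border,border),
                      IPL_BORDER_CONSTANT, cvScalarAll(0) );

    // Replicated border: edge rows and columns are copied outward instead
    // (the value argument is ignored in this case).
    cvCopyMakeBorder( src, padded, cvPoint(border,border),
                      IPL_BORDER_REPLICATE, cvScalarAll(0) );

    cvReleaseImage( &padded );
    cvReleaseImage( &src );
    return 0;
}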
Figure 6-2. Expanding the image border. The left column shows IPL_BORDER_CONSTANT where a zero value is used to fill out the borders. The right column shows IPL_BORDER_REPLICATE where the border pixels are replicated in the horizontal and vertical directions
We mentioned previously that, when you make calls to OpenCV library functions that employ convolution, those library functions call cvCopyMakeBorder() to get their work done. In most cases the border type used is IPL_BORDER_REPLICATE, but sometimes you will not want it to be done that way. This is another occasion where you might want to use cvCopyMakeBorder() yourself. You can create a slightly larger image with the border you want, call whatever routine on that image, and then clip back out the part you were originally interested in. This way, OpenCV's automatic bordering will not affect the pixels you care about.
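A minimal sketch of this pad-process-clip pattern follows; the border width, the use of cvSmooth() as the convolution-based routine, and the file name are all illustrative choices:

#include <cv.h>
#include <highgui.h>

int main( int argc, char** argv ) {
    IplImage* src = cvLoadImage( "input.jpg" );   // placeholder file name
    int b = 2;                                    // wide enough for a 5-by-5 kernel

    // Step 1: pad with a constant (here, white) border of our own choosing.
    IplImage* padded = cvCreateImage(
        cvSize( src->width + 2*b, src->height + 2*b ),
        src->depth, src->nChannels );
    cvCopyMakeBorder( src, padded, cvPoint(b,b),
                      IPL_BORDER_CONSTANT, cvScalarAll(255) );

    // Step 2: run the convolution-based routine on the padded image.
    cvSmooth( padded, padded, CV_GAUSSIAN, 5, 5, 0, 0 );

    // Step 3: clip the original region back out; only these pixels matter.
    cvSetImageROI( padded, cvRect( b, b, src->width, src->height ) );
    cvCopy( padded, src, NULL );
    cvResetImageROI( padded );

    cvReleaseImage( &padded );
    cvReleaseImage( &src );
    return 0;
}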
[62] For technical purists, the support of the kernel actually consists of only the nonzero portion of the kernel array.
[63] We say "at first glance" because it is also possible to perform convolutions in the frequency domain. In this case, for an N-by-N image and an M-by-M kernel with N > M, the computational time will be proportional to N² log(N) and not to the N²M² that is expected for computations in the spatial domain. Because the frequency domain computation is independent of the size of the kernel, it is more efficient for large kernels. OpenCV automatically decides whether to do the convolution in the frequency domain based on the size of the kernel.
[64] Of course, the case of N-by-N with N odd and the anchor located at the center is the simplest case. In general, if the kernel is N-by-M and the anchor is located at (ax, ay), then the destination image will have to be N – 1 pixels wider and M – 1 pixels taller than the source image. The offset will simply be (ax, ay).