Convolution is the basis of many of the transformations that we discuss in this chapter. In the abstract, this term means something we do to every part of an image. In this sense, many of the operations we looked at in Chapter 5 can also be understood as special cases of the more general process of convolution. What a particular convolution "does" is determined by the form of the convolution kernel being used. This kernel is essentially just a fixed-size array of numerical coefficients along with an anchor point in that array, which is typically located at the center. The size of the array [62] is called the support of the kernel.
Figure 6-1 depicts a 3-by-3 convolution kernel with the anchor located at the center of the array. The value of the convolution at a particular point is computed by first placing the kernel anchor on top of a pixel in the image, with the rest of the kernel overlaying the corresponding local pixels in the image. For each kernel point, we now have a value for the kernel at that point and a value for the image at the corresponding image point. We multiply each such pair together and sum over the whole kernel; this sum is then placed in the result image at the location corresponding to the location of the anchor in the input image. This process is repeated for every point in the image by scanning the kernel over the entire image.
Figure 6-1. A 3-by-3 kernel for a Sobel derivative; note that the anchor point is in the center of the kernel
We can, of course, express this procedure in the form of an equation. If we define the image to be I(x, y) and the kernel to be G(i, j) (where 0 ≤ i < Mi and 0 ≤ j < Mj, with Mi and Mj being the extents of the kernel in the x- and y-dimensions, respectively), and the anchor point to be located at (ai, aj) in the coordinates of the kernel, then the convolution H(x, y) is defined by the following expression:
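H(x, y) = Σ_{i=0..Mi−1} Σ_{j=0..Mj−1} I(x + i − ai, y + j − aj) · G(i, j)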
Observe that the number of operations, at least at first glance, seems to be the number of pixels in the image multiplied by the number of pixels in the kernel. [63] This can be a lot of computation and so is not something you want to do with some "for" loop and a lot of pointer de-referencing. In situations like this, it is better to let OpenCV do the work for you and take advantage of the optimizations already programmed into OpenCV. The OpenCV way to do this is with cvFilter2D():
void cvFilter2D(
    const CvArr*  src,
    CvArr*        dst,
    const CvMat*  kernel,
    CvPoint       anchor = cvPoint(-1,-1)
);
Here we create a matrix of the appropriate size, fill it with the coefficients, and then pass it together with the source and destination images into cvFilter2D(). We can also optionally pass in a CvPoint to indicate the location of the anchor of the kernel, but the default value (equal to cvPoint(-1,-1)) is interpreted as indicating the center of the kernel. The kernel can be of even size if its anchor point is defined; otherwise, it should be of odd size.
The src and dst images should be the same size. One might think that the src image should be larger than the dst image in order to allow for the extra width and length of the convolution kernel. But the sizes of src and dst can be the same in OpenCV because, by default, prior to convolution OpenCV creates virtual pixels via replication past the border of the src image so that the border pixels in dst can be filled in. The replication is done as input(–dx, y) = input(0, y), input(w + dx, y) = input(w – 1, y), and so forth. There are some alternatives to this default behavior; we will discuss them in the next section.
We remark that the coefficients of the convolution kernel should always be floating-point numbers. This means that you should use CV_32FC1 when allocating that matrix.
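To make the calling sequence concrete, here is a minimal sketch of the whole procedure, sharpening an image with a 3-by-3 kernel; the file names and the particular kernel values are illustrative choices, not prescribed by the text:

#include <cv.h>
#include <highgui.h>

int main( int argc, char** argv ) {
    // Load the source image; "input.jpg" is just a placeholder name.
    IplImage* src = cvLoadImage( "input.jpg" );
    IplImage* dst = cvCloneImage( src );          // same size as src

    // The kernel matrix must hold floating-point coefficients (CV_32FC1).
    float vals[9] = {  0, -1,  0,
                      -1,  5, -1,
                       0, -1,  0 };               // simple sharpening kernel
    CvMat kernel = cvMat( 3, 3, CV_32FC1, vals );

    // Convolve; cvPoint(-1,-1) means "use the center of the kernel as anchor".
    cvFilter2D( src, dst, &kernel, cvPoint(-1,-1) );

    cvSaveImage( "output.jpg", dst );
    cvReleaseImage( &src );
    cvReleaseImage( &dst );
    return 0;
}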
One problem that naturally arises with convolutions is how to handle the boundaries. For example, when using the convolution kernel just described, what happens when the point being convolved is at the edge of the image? Most of OpenCV's built-in functions that make use of cvFilter2D() must handle this in one way or another. Similarly, when doing your own convolutions, you will need to know how to deal with this efficiently.
The solution comes in the form of the cvCopyMakeBorder() function, which copies a given image onto another slightly larger image and then automatically pads the boundary in one way or another:
void cvCopyMakeBorder(
    const CvArr*  src,
    CvArr*        dst,
    CvPoint       offset,
    int           bordertype,
    CvScalar      value = cvScalarAll(0)
);
The offset argument tells cvCopyMakeBorder() where to place the copy of the original image within the destination image. Typically, if the kernel is N-by-N (for odd N) then you will want a boundary that is (N – 1)/2 wide on all sides or, equivalently, an image that is N – 1 wider and taller than the original. In this case you would set the offset to cvPoint((N-1)/2,(N-1)/2) so that the boundary would be even on all sides. [64]
The bordertype can be either IPL_BORDER_CONSTANT or IPL_BORDER_REPLICATE (see Figure 6-2). In the first case, the value argument will be interpreted as the value to which all pixels in the boundary should be set. In the second case, the row or column at the very edge of the original is replicated out to the edge of the larger image. Note that the border of the test-pattern image is somewhat subtle (examine the upper-left image in Figure 6-2); in the test-pattern image there is a one-pixel-wide dark border, except where the circle patterns come near the border, where it turns white. There are two other border types defined, IPL_BORDER_REFLECT and IPL_BORDER_WRAP, which are not implemented at this time in OpenCV but may be supported in the future.
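As a concrete illustration of both border types, the following sketch pads an image for a 7-by-7 kernel; the file name is a placeholder:

#include <cv.h>
#include <highgui.h>

int main( int argc, char** argv ) {
    IplImage* src = cvLoadImage( "input.jpg" );   // placeholder file name

    // For a 7-by-7 kernel we want a (7-1)/2 = 3 pixel border on every side,
    // so the destination is 6 pixels wider and 6 pixels taller than src.
    int border = 3;
    CvSize sz  = cvSize( src->width + 2*border, src->height + 2*border );
    IplImage* padded = cvCreateImage( sz, src->depth, src->nChannels );

    // Constant border: every new pixel is set to the given value (here, black).
    cvCopyMakeBorder( src, padded, cvPoint(border,border),
                      IPL_BORDER_CONSTANT, cvScalarAll(0) );

    // Replicated border: edge rows and columns are copied outward instead
    // (the value argument is ignored in this case).
    cvCopyMakeBorder( src, padded, cvPoint(border,border),
                      IPL_BORDER_REPLICATE, cvScalarAll(0) );

    cvReleaseImage( &padded );
    cvReleaseImage( &src );
    return 0;
}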
Figure 6-2. Expanding the image border. The left column shows IPL_BORDER_CONSTANT where a zero value is used to fill out the borders. The right column shows IPL_BORDER_REPLICATE where the border pixels are replicated in the horizontal and vertical directions
We mentioned previously that, when you make calls to OpenCV library functions that employ convolution, those library functions call cvCopyMakeBorder() to get their work done. In most cases the border type used is IPL_BORDER_REPLICATE, but sometimes you will not want it to be done that way. This is another occasion where you might want to use cvCopyMakeBorder() yourself. You can create a slightly larger image with the border you want, call whatever routine on that image, and then clip back out the part you were originally interested in. This way, OpenCV's automatic bordering will not affect the pixels you care about.
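A minimal sketch of this pad-process-clip pattern follows; the border width, the use of cvSmooth() as the convolution-based routine, and the file name are all illustrative choices:

#include <cv.h>
#include <highgui.h>

int main( int argc, char** argv ) {
    IplImage* src = cvLoadImage( "input.jpg" );   // placeholder file name
    int b = 2;                                    // wide enough for a 5-by-5 kernel

    // Step 1: pad with a constant (here, white) border of our own choosing.
    IplImage* padded = cvCreateImage(
        cvSize( src->width + 2*b, src->height + 2*b ),
        src->depth, src->nChannels );
    cvCopyMakeBorder( src, padded, cvPoint(b,b),
                      IPL_BORDER_CONSTANT, cvScalarAll(255) );

    // Step 2: run the convolution-based routine on the padded image.
    cvSmooth( padded, padded, CV_GAUSSIAN, 5, 5, 0, 0 );

    // Step 3: clip the original region back out; only these pixels matter.
    cvSetImageROI( padded, cvRect( b, b, src->width, src->height ) );
    cvCopy( padded, src, NULL );
    cvResetImageROI( padded );

    cvReleaseImage( &padded );
    cvReleaseImage( &src );
    return 0;
}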
[62] For technical purists, the support of the kernel actually consists of only the nonzero portion of the kernel array.
[63] We say "at first glance" because it is also possible to perform convolutions in the frequency domain. In this case, for an N-by-N image and an M-by-M kernel with N > M, the computational time will be proportional to N² log(N) and not to the N²M² that is expected for computations in the spatial domain. Because the frequency domain computation is independent of the size of the kernel, it is more efficient for large kernels. OpenCV automatically decides whether to do the convolution in the frequency domain based on the size of the kernel.
[64] Of course, the case of N-by-N with N odd and the anchor located at the center is the simplest case. In general, if the kernel is N-by-M and the anchor is located at (ax, ay), then the destination image will have to be N – 1 pixels wider and M – 1 pixels taller than the source image. The offset will simply be (ax, ay).