Running an FFT on a full-resolution video feed would be slow. Also, the resulting frequencies would reflect localized phenomena at each captured pixel, such that the motion map (the result of filtering the frequencies and then applying the IFFT) might appear noisy and overly sharpened. To address these problems, we want a cheap, blurry downsampling technique. However, we also want the option to enhance edges, which are important to our perception of motion.
Our need for a blurry downsampling technique is fulfilled by a Gaussian image pyramid. A Gaussian filter blurs an image by making each output pixel a weighted average of many input pixels in the neighborhood. An image pyramid is a series in which each image is half the width and height of the previous image. The halving of image dimensions is achieved by decimation, meaning that every other pixel is simply omitted. A Gaussian image pyramid is constructed by applying a Gaussian filter before each decimation operation.
Our need to enhance edges in downsampled images is fulfilled by a Laplacian image pyramid, which is constructed in the following manner. Suppose we have already constructed a Gaussian image pyramid. We take the image at level i+1
in the Gaussian pyramid, upsample it by duplicating the pixels, and apply a Gaussian filter to it again. We then subtract the result from the image at level i
in the Gaussian pyramid to produce the corresponding image at level i
of the Laplacian pyramid. Thus, the Laplacian image is the difference between a blurry, downsampled image and an even blurrier image that was downsampled, downsampled again, and upsampled.
You might wonder how such an algorithm is a form of edge finding. Consider that edges are areas of local contrast, while non-edges are areas of local uniformity. If we blur a uniform area, it is still uniform—zero difference. If we blur a contrasting area, it becomes more uniform—nonzero difference. Thus, the difference can be used to find edges.
The Gaussian and Laplacian image pyramids are described in detail in the journal article downloadable from http://web.mit.edu/persci/people/adelson/pub_pdfs/RCA84.pdf.
This article is written by E. H. Adelson, C. H. Anderson, J. R. Bergen, P. J. Burt, and J. M. Ogden on "Pyramid methods in image processing, RCA Engineer, vol. 29, no. 6, November/December 1984.
Besides using image pyramids to downsample the FFT's input, we also use them to upsample the most recent frame of the IFFT's output. This upsampling step is necessary to create an overlay that matches the size of the original camera image so that we can composite the two. Like in the construction of the Laplacian pyramid, upsampling consists of duplicating pixels and applying a Gaussian filter.
OpenCV implements the relevant downsizing and upsizing functions as cv2.pyrDown
and cv2.pyrUp
. These functions are useful in compositing two images in general (whether or not signal processing is involved) because they enable us to soften differences while preserving edges. The OpenCV documentation includes a good tutorial on this topic at http://docs.opencv.org/trunk/doc/py_tutorials/py_imgproc/py_pyramids/py_pyramids.html.
Now, we are armed with the knowledge to implement Lazy Eyes!