Understanding optical flow

Optical flow is the pattern of apparent motion between two consecutive frames of video. We will select feature points in the first frame and try to determine where those features have gone in the second frame. This search is subject to a few caveats, which depend on the optical flow algorithm we choose.

OpenCV's calcOpticalFlowPyrLK function implements the Lucas-Kanade method of computing optical flow. Lucas-Kanade relies on a 3 x 3 neighborhood (that is, 9 pixels) around each feature. Taking each feature's neighborhood from the first frame, we will try to find the best-matching neighborhood in the second frame, based on the least squared error. OpenCV's implementation of Lucas-Kanade uses an image pyramid, meaning that it performs the search at various scales. Thus, it supports large and small motions alike. PyrLK in the function name stands for pyramidal Lucas-Kanade. The following figure is a visualization of an image pyramid: a progression from low-resolution (or low-magnification) images to high-resolution (or high-magnification) images:

(Figure: a visualization of an image pyramid)

Note

For more details on optical flow and the Lucas-Kanade method, see the official OpenCV documentation at http://docs.opencv.org/trunk/doc/py_tutorials/py_video/py_lucas_kanade/py_lucas_kanade.html.
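As an illustration (not this project's actual code), the following is a minimal sketch of how cv2.calcOpticalFlowPyrLK might be called on two consecutive grayscale frames. The file names and parameter values are placeholders:

import cv2

# Two consecutive frames, loaded in grayscale (file names are placeholders).
prev_gray = cv2.imread('frame0.png', cv2.IMREAD_GRAYSCALE)
next_gray = cv2.imread('frame1.png', cv2.IMREAD_GRAYSCALE)

# Select some sparse features in the first frame.
prev_points = cv2.goodFeaturesToTrack(
    prev_gray, maxCorners=20, qualityLevel=0.01, minDistance=10)

# Search for the same features in the second frame, using an image
# pyramid with maxLevel extra levels beyond the original resolution.
next_points, status, err = cv2.calcOpticalFlowPyrLK(
    prev_gray, next_gray, prev_points, None,
    winSize=(15, 15), maxLevel=2)

# status is 1 for each feature that was successfully tracked.
tracked_old = prev_points[status.ravel() == 1]
tracked_new = next_points[status.ravel() == 1]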

OpenCV offers implementations of other optical flow algorithms as well. For example, the calcOpticalFlowSF function implements the SimpleFlow algorithm, which makes optimizations for high-resolution video by assuming that pixels in a smooth (uniform) image region move in unison. The calcOpticalFlowFarneback function implements Gunnar Farneback's algorithm, which posits that a neighborhood remains identifiable, even during motion, by the coefficients of a polynomial relationship among its pixel values. Both of these algorithms are forms of dense optical flow, meaning that they analyze every pixel in the image instead of just the selected (sparse) features. All of OpenCV's optical flow functions are documented at http://docs.opencv.org/modules/video/doc/motion_analysis_and_object_tracking.html.
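For comparison, a dense optical flow computation with cv2.calcOpticalFlowFarneback might look like the following sketch. The parameter values here are commonly cited starting points, not values tuned for any particular video:

import cv2

prev_gray = cv2.imread('frame0.png', cv2.IMREAD_GRAYSCALE)
next_gray = cv2.imread('frame1.png', cv2.IMREAD_GRAYSCALE)

# flow has the same width and height as the frames and two channels:
# the x and y displacement of every pixel.
flow = cv2.calcOpticalFlowFarneback(
    prev_gray, next_gray, None,
    pyr_scale=0.5, levels=3, winsize=15,
    iterations=3, poly_n=5, poly_sigma=1.2, flags=0)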

Of the several options, why choose calcOpticalFlowPyrLK? "You see, it is a pyramid," as Imhotep said to the Pharaoh Djoser, "and it has open spaces inside it." Hence, a pyramidal, sparse technique is a good way for us to cheaply and robustly track a few features in a face, which can change scale as it moves nearer or farther.

For our purposes, it is useful to select features inside a detected object, specifically a detected face. We will choose an inner portion of the face (to avoid background regions) and then use an OpenCV function called goodFeaturesToTrack, which selects features based on the algorithm described in the following paper:

Jianbo Shi and Carlo Tomasi, "Good Features to Track", Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, pp. 593-600, June 1994.

As the name suggests, the Good Features to Track algorithm (also known as the Shi-Tomasi algorithm) attempts to select features that work well with tracking algorithms and tracking use cases. As the paper describes in detail, good features to track must have a stable appearance with respect to small changes in the camera's perspective. Examples of poor features to track are reflections (such as sunlight on a car's hood) and lines crossing at different depths (such as a tree's branches), since the apparent position of such features shifts unpredictably as the viewer or camera moves. The effects of a change in perspective can be simulated (albeit imperfectly) by warping a given image and moving its contents linearly. Based on such a simulation, the most stable features can be selected.
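To make this concrete, here is a rough sketch (not the project's exact code) of selecting Shi-Tomasi features inside the inner portion of a detected face. The face rectangle and margin values are hypothetical; in practice, the rectangle would come from a face detector such as cv2.CascadeClassifier:

import cv2

gray = cv2.imread('frame0.png', cv2.IMREAD_GRAYSCALE)

# A hypothetical face rectangle, as a face detector might report it.
x, y, w, h = 100, 80, 200, 200

# Use only an inner portion of the face, to avoid background regions.
margin_x, margin_y = w // 4, h // 4
roi = gray[y + margin_y : y + h - margin_y,
           x + margin_x : x + w - margin_x]

# Select up to 20 Shi-Tomasi features inside the region of interest.
corners = cv2.goodFeaturesToTrack(
    roi, maxCorners=20, qualityLevel=0.01, minDistance=10)

if corners is not None:
    # The coordinates are relative to the ROI, so offset them back
    # into the coordinate space of the whole frame.
    corners[:, 0, 0] += x + margin_x
    corners[:, 0, 1] += y + margin_y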

Note

OpenCV offers implementations of several more feature detection algorithms, besides Good Features to Track. Two of the other algorithms, called minimum eigenvalue corners and Harris corners, are precursors to Good Features to Track, which improves upon them. An official tutorial illustrates the relationship among them in a code sample at http://docs.opencv.org/doc/tutorials/features2d/trackingmotion/generic_corner_detector/generic_corner_detector.html.

Some of the more advanced feature detection algorithms in OpenCV are named FAST, ORB, SIFT, SURF, and FREAK. Compared to Good Features to Track, these more advanced alternatives evaluate a much larger set of potential features, at a much greater computational cost. They are overkill for a basic optical flow task such as ours. Once we have detected a face, we do not need many features in this region in order to distinguish between vertical motion (nodding) and horizontal motion (shaking). For our gesture recognition task, running at a fast frame rate is far more important than running with a large number of features.

On the other hand, some computer vision tasks require a large number of features. Image recognition is a good example. If we put red lipstick on a poster of the Mona Lisa, the resulting image is not the Mona Lisa (or at least not Leonardo's version of her). An image's details can be considered fundamental to its identity. However, a change in lighting or perspective does not change an image's identity, so the feature detection and matching system still needs to be robust with respect to some changes.
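For a sense of the difference in scale, here is a small example of one of the more advanced detectors, ORB. This assumes OpenCV 3 or later, where the factory function is cv2.ORB_create; the image file name is a placeholder. ORB typically yields hundreds of keypoints plus descriptors for matching, which is far more machinery than our nod-and-shake detector needs:

import cv2

gray = cv2.imread('poster.png', cv2.IMREAD_GRAYSCALE)

# ORB detects many keypoints and computes a binary descriptor for each,
# which is what an image recognition or matching system would compare.
orb = cv2.ORB_create(nfeatures=500)  # requires OpenCV 3 or later
keypoints, descriptors = orb.detectAndCompute(gray, None)
print(len(keypoints))  # often hundreds, versus the ~20 corners we track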

For a project in image recognition and tracking, refer to Chapter 4, Recognizing and Tracking Images, and Chapter 5, Combining Image Tracking with 3D Rendering, of my book, Android Application Programming with OpenCV, by Packt Publishing.

For benchmarks of several feature detectors and matchers in OpenCV, refer to the series of articles on Ievgen Khvedchenia's blog, including http://computer-vision-talks.com/articles/2011-07-13-comparison-of-the-opencv-feature-detection-algorithms/.

For tutorials on several algorithms and their OpenCV implementations, refer to the Feature Detection and Description section of the OpenCV-Python Tutorials by Alexander Mordvintsev and Abid Rahman K at http://opencv-python-tutroals.readthedocs.org/en/latest/py_tutorials/py_feature2d/py_table_of_contents_feature2d/py_table_of_contents_feature2d.html.