Imagine the following conversation:
Person A: I can't find my print of The Starry Night. Do you know where it is?
Person B: What does it look like?
For a computer, or for someone who is naive about Western art, Person B's question is quite reasonable. Before we can use our sense of sight (or other senses) to track something, we need to have sensed that thing before. (Failing that, we at least need a good description of what we will sense.) For computer vision, we must provide a reference image that will be compared with the live camera image or scene. If the target has complex geometry or moving parts, we might need to provide many reference images to account for different perspectives and poses. However, for our examples using famous paintings, we will assume that the target is rectangular and rigid.
For this chapter's purposes, let's say that the goal of tracking is to determine how our rectangular target is posed in 3D. With this information, we can draw an outline around our target. In the final 2D image, the outline will be a quadrilateral (not necessarily a rectangle), since the target could be skewed away from the camera.
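To see why the projected outline is a quadrilateral rather than a rectangle, consider applying a planar homography (a 3x3 matrix) to the target's four corners in homogeneous coordinates. The following sketch is plain Java with a homography matrix invented for illustration (it is not taken from any real tracking session); the nonzero entry in the bottom row models the target being skewed away from the camera:

```java
public class OutlineDemo {
    // Apply a 3x3 homography h to a 2D point (x, y) using homogeneous
    // coordinates: multiply, then divide by the resulting w component.
    static double[] project(double[][] h, double x, double y) {
        double w = h[2][0] * x + h[2][1] * y + h[2][2];
        return new double[] {
            (h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w
        };
    }

    public static void main(String[] args) {
        // Corners of a 400x300 rectangular target, in its own coordinates.
        double[][] corners = { {0, 0}, {400, 0}, {400, 300}, {0, 300} };

        // A hypothetical homography. The 0.0008 term makes w depend on x,
        // so the right edge shrinks relative to the left edge and the
        // projected shape is a quadrilateral, not a rectangle.
        double[][] h = {
            {1.0,    0.0, 50.0},
            {0.0,    1.0, 20.0},
            {0.0008, 0.0,  1.0}
        };

        for (double[] c : corners) {
            double[] p = project(h, c[0], c[1]);
            System.out.printf("(%.1f, %.1f) -> (%.1f, %.1f)%n",
                    c[0], c[1], p[0], p[1]);
        }
    }
}
```

With this matrix, the left edge of the target projects to a longer segment than the right edge, which is exactly the skewed quadrilateral described above. In OpenCV, the same per-corner computation is done by `Core.perspectiveTransform` once a homography has been estimated.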
There are four major steps in this type of tracking:
1. Detect features in the reference image and in the scene.
2. Extract a descriptor for each feature.
3. Match the scene's descriptors against the reference image's descriptors.
4. Estimate the target's pose from the good matches.
There are many different techniques for performing each of the first three steps. OpenCV provides relevant classes called FeatureDetector, DescriptorExtractor, and DescriptorMatcher, each supporting several techniques. We will use a combination of techniques that OpenCV calls FeatureDetector.STAR, DescriptorExtractor.FREAK, and DescriptorMatcher.BRUTEFORCE_HAMMING. This combination is relatively fast and robust. Unlike some alternatives, it is scale-invariant and rotation-invariant, meaning that the target can be tracked from various distances and perspectives. Also, unlike some alternatives, it is not patented, so it is free to use even in commercial applications.
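In OpenCV's Java bindings of that era (the features2d factory classes; later OpenCV releases replaced them with per-algorithm classes), this combination can be instantiated as sketched below. The class and method names shown are assumptions about how you might organize the code, not a fixed API contract; only the `create`, `detect`, `compute`, and `match` calls come from OpenCV itself:

```java
import org.opencv.core.Mat;
import org.opencv.core.MatOfDMatch;
import org.opencv.core.MatOfKeyPoint;
import org.opencv.features2d.DescriptorExtractor;
import org.opencv.features2d.DescriptorMatcher;
import org.opencv.features2d.FeatureDetector;

public class TrackingPipeline {
    // STAR finds keypoints, FREAK describes them as binary strings, and
    // brute-force Hamming matching compares those binary descriptors.
    private final FeatureDetector detector =
            FeatureDetector.create(FeatureDetector.STAR);
    private final DescriptorExtractor extractor =
            DescriptorExtractor.create(DescriptorExtractor.FREAK);
    private final DescriptorMatcher matcher =
            DescriptorMatcher.create(DescriptorMatcher.BRUTEFORCE_HAMMING);

    // Detect keypoints in a grayscale image and compute their FREAK
    // descriptors. The keypoints are returned via the output parameter.
    Mat describe(Mat grayImage, MatOfKeyPoint keypoints) {
        detector.detect(grayImage, keypoints);
        Mat descriptors = new Mat();
        extractor.compute(grayImage, keypoints, descriptors);
        return descriptors;
    }

    // Match scene descriptors (query) against reference descriptors
    // (train) by Hamming distance.
    MatOfDMatch match(Mat sceneDescriptors, Mat referenceDescriptors) {
        MatOfDMatch matches = new MatOfDMatch();
        matcher.match(sceneDescriptors, referenceDescriptors, matches);
        return matches;
    }
}
```

Note that this sketch assumes the OpenCV native library has already been loaded (for example, via the OpenCV manager on Android) and that the input images are single-channel grayscale; the fourth step, pose estimation from the matches, is not shown here.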
For a mathematical description of FREAK and its merits relative to other descriptor extractors, see the paper FREAK: Fast Retina Keypoint by Alahi, Ortiz, and Vandergheynst. An electronic version of the paper is available at http://infoscience.epfl.ch/record/175537/files/2069.pdf.