The Lucas-Kanade method is used for sparse optical flow tracking. By sparse, we mean that the number of feature points is relatively low. You can refer to the original paper by Lucas and Kanade here: http://cseweb.ucsd.edu/classes/sp02/cse252/lucaskanade81.pdf. We start the process by extracting the feature points. For each feature point, we create a 3 x 3 patch with the feature point at the center. The assumption here is that all the points within each patch will have similar motion. We can adjust the size of this window depending on the problem at hand.
For each feature point in the current frame, we take the surrounding 3 x 3 patch as our reference. For this patch, we look in its neighborhood in the previous frame to find the best match. This neighborhood is usually bigger than 3 x 3 because we want to find the patch that's closest to the one under consideration. Now, the displacement from the center pixel of the matched patch in the previous frame to the center pixel of the patch under consideration in the current frame becomes the motion vector. We do this for all the feature points and extract all the motion vectors.
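Before we build the full interactive tracker, here is a minimal standalone sketch of this idea using OpenCV's goodFeaturesToTrack and calcOpticalFlowPyrLK. The file names frame1.png and frame2.png and the parameter values are placeholders for two consecutive frames of a video, not part of the listing we develop later:

#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

using namespace cv;
using namespace std;

int main()
{
    // Load two consecutive frames in grayscale (placeholder file names)
    Mat prevGray = imread("frame1.png", IMREAD_GRAYSCALE);
    Mat curGray = imread("frame2.png", IMREAD_GRAYSCALE);
    if(prevGray.empty() || curGray.empty())
        return -1;

    // Extract feature points in the previous frame
    vector<Point2f> prevPts, curPts;
    goodFeaturesToTrack(prevGray, prevPts, 100, 0.01, 10);

    // Track each feature point into the current frame
    vector<uchar> status;
    vector<float> err;
    calcOpticalFlowPyrLK(prevGray, curGray, prevPts, curPts, status, err, Size(21,21), 3);

    // The motion vector of each successfully tracked point is curPts[i] - prevPts[i]
    for(size_t i = 0; i < prevPts.size(); i++)
    {
        if(status[i])
            cout << prevPts[i] << " -> " << curPts[i] << endl;
    }

    return 0;
}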
Let's consider the following frame:
We need to add some points that we want to track. Just go ahead and click on a bunch of points on this window with your mouse:
If I move to a different position, you will see that the points are still being tracked correctly, within a small margin of error:
Let's add a lot of points and see what happens:
As we can see, it will keep tracking those points. But you will notice that some of the points get dropped along the way, for example when a feature is not prominent enough or when it moves too quickly between frames. If you want to play around with it, you can just keep adding more points. You can also let the user select a region of interest in the input video, extract feature points from that region, and track the object by drawing a bounding box around it; a starting sketch for this is given at the end of this section. It will be a fun exercise!
Here is the code to do Lucas-Kanade-based tracking:
int main(int argc, char* argv[])
{
    // Variable declaration and initialization

    // Iterate until the user hits the Esc key
    while(true)
    {
        // Capture the current frame
        cap >> frame;

        // Check if the frame is empty
        if(frame.empty())
            break;

        // Resize the frame
        resize(frame, frame, Size(), scalingFactor, scalingFactor, INTER_AREA);

        // Copy the input frame
        frame.copyTo(image);

        // Convert the image to grayscale
        cvtColor(image, curGrayImage, COLOR_BGR2GRAY);

        // Check if there are points to track
        if(!trackingPoints[0].empty())
        {
            // Status vector to indicate whether the flow for the corresponding features has been found
            vector<uchar> statusVector;

            // Error vector to indicate the error for the corresponding feature
            vector<float> errorVector;

            // Check if previous image is empty
            if(prevGrayImage.empty())
            {
                curGrayImage.copyTo(prevGrayImage);
            }

            // Calculate the optical flow using Lucas-Kanade algorithm
            calcOpticalFlowPyrLK(prevGrayImage, curGrayImage, trackingPoints[0],
                    trackingPoints[1], statusVector, errorVector, windowSize, 3,
                    terminationCriteria, 0, 0.001);
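The snippet above omits the variable declarations for brevity. The following is a minimal sketch of what the "Variable declaration and initialization" part could look like; the names match how the variables are used in the loop, but the specific values (camera index, scaling factor, window size, maximum number of points, and so on) are assumptions for illustration, not the exact values used in the full listing:

// Assumed near the top of the file:
#include <opencv2/opencv.hpp>
using namespace cv;
using namespace std;

// File-scope variables shared with the mouse callback (sketched later in this section):
Point2f currentPoint;
bool pointTrackingFlag = false;

// Where the "Variable declaration and initialization" comment sits inside main():
VideoCapture cap(0);                       // open the default camera
if(!cap.isOpened())
    return -1;

string windowName = "Lucas-Kanade Tracker";
namedWindow(windowName, WINDOW_AUTOSIZE);
// setMouseCallback(windowName, onMouse);  // register the mouse callback sketched later

float scalingFactor = 0.75f;               // resize factor applied to every input frame
const size_t maxNumPoints = 75;            // upper bound on the number of tracked points

Mat frame, image, prevGrayImage, curGrayImage;
vector<Point2f> trackingPoints[2];         // [0] = points in the previous frame, [1] = points in the current frame

Size windowSize(25, 25);                   // patch size used by calcOpticalFlowPyrLK and cornerSubPix
TermCriteria terminationCriteria(TermCriteria::COUNT | TermCriteria::EPS, 10, 0.02);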
We use the current image and the previous image to compute the optical flow information. Needless to say, the quality of the output will depend on the parameters chosen. You can find more details about the parameters at http://docs.opencv.org/2.4/modules/video/doc/motion_analysis_and_object_tracking.html#calcopticalflowpyrlk. To increase quality and robustness, we need to filter out the points that are very close to each other because they're not adding new information. Let's go ahead and do that:
            int count = 0;

            // Minimum distance between any two tracking points
            int minDist = 7;

            for(int i = 0; i < trackingPoints[1].size(); i++)
            {
                if(pointTrackingFlag)
                {
                    // If the new point is within 'minDist' distance from an existing point, it will not be tracked
                    if(norm(currentPoint - trackingPoints[1][i]) <= minDist)
                    {
                        pointTrackingFlag = false;
                        continue;
                    }
                }

                // Check if the status vector is good
                if(!statusVector[i])
                    continue;

                trackingPoints[1][count++] = trackingPoints[1][i];

                // Draw a circle around each of the tracking points
                int radius = 8;
                int thickness = 2;
                int lineType = 8;
                circle(image, trackingPoints[1][i], radius, Scalar(0,255,0), thickness, lineType);
            }

            trackingPoints[1].resize(count);
        }
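Note that the code above relies on currentPoint and pointTrackingFlag, which are set whenever the user clicks on the window. The callback itself is not part of the listing shown here; a minimal sketch of such a callback (the name onMouse is an assumption) looks like this:

// Mouse callback: record the clicked position and flag it as a new point to track.
// 'currentPoint' and 'pointTrackingFlag' are the file-scope variables used in the main loop.
void onMouse(int event, int x, int y, int, void*)
{
    if(event == EVENT_LBUTTONDOWN)
    {
        currentPoint = Point2f((float)x, (float)y);
        pointTrackingFlag = true;
    }
}

It is registered once before the main loop with setMouseCallback(windowName, onMouse).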
We now have the tracking points. The next step is to refine the location of those points. What exactly does refine mean in this context? To keep the computation fast, feature locations are initially reported at whole-pixel positions; in layman's terms, you can think of it as rounding off. Now that we have the approximate location, we can refine the position of the point within a small window around it to get a more accurate, subpixel result. Let's go ahead and do that:
        // Refining the location of the feature points
        if(pointTrackingFlag && trackingPoints[1].size() < maxNumPoints)
        {
            vector<Point2f> tempPoints;
            tempPoints.push_back(currentPoint);

            // Function to refine the location of the corners to subpixel accuracy.
            // Here, 'pixel' refers to the image patch of size 'windowSize' and not the actual image pixel
            cornerSubPix(curGrayImage, tempPoints, windowSize, Size(-1,-1), terminationCriteria);

            trackingPoints[1].push_back(tempPoints[0]);
            pointTrackingFlag = false;
        }

        // Display the image with the tracking points
        imshow(windowName, image);

        // Check if the user pressed the Esc key
        char ch = waitKey(10);
        if(ch == 27)
            break;

        // Swap the 'points' vectors to update 'previous' to 'current'
        std::swap(trackingPoints[1], trackingPoints[0]);

        // Swap the images to update previous image to current image
        cv::swap(prevGrayImage, curGrayImage);
    }

    return 1;
}
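As a starting point for the region-of-interest exercise mentioned earlier, here is a minimal sketch that lets the user select a rectangle on the first frame, extracts feature points inside it, and tracks them while drawing a bounding box. The structure, the parameter values, and the use of selectROI (available in recent OpenCV versions) are assumptions for illustration, not part of the listing above:

#include <opencv2/opencv.hpp>
#include <vector>

using namespace cv;
using namespace std;

int main()
{
    VideoCapture cap(0);
    if(!cap.isOpened())
        return -1;

    string windowName = "ROI Tracker";
    namedWindow(windowName, WINDOW_AUTOSIZE);

    // Grab the first frame and let the user draw a rectangle on it
    Mat frame, prevGray, curGray;
    cap >> frame;
    if(frame.empty())
        return -1;
    cvtColor(frame, prevGray, COLOR_BGR2GRAY);
    Rect roi = selectROI(windowName, frame);
    if(roi.area() == 0)
        return -1;

    // Extract feature points inside the selected region and shift them to full-image coordinates
    vector<Point2f> prevPts, curPts;
    goodFeaturesToTrack(prevGray(roi), prevPts, 50, 0.01, 5);
    for(size_t i = 0; i < prevPts.size(); i++)
        prevPts[i] += Point2f((float)roi.x, (float)roi.y);

    while(true)
    {
        cap >> frame;
        if(frame.empty() || prevPts.empty())
            break;
        cvtColor(frame, curGray, COLOR_BGR2GRAY);

        // Track the region's feature points into the current frame
        vector<uchar> status;
        vector<float> err;
        calcOpticalFlowPyrLK(prevGray, curGray, prevPts, curPts, status, err, Size(21,21), 3);

        // Keep only the successfully tracked points
        vector<Point2f> trackedPts;
        for(size_t i = 0; i < curPts.size(); i++)
            if(status[i])
                trackedPts.push_back(curPts[i]);

        // Draw a bounding box around the tracked points
        if(!trackedPts.empty())
            rectangle(frame, boundingRect(trackedPts), Scalar(0, 0, 255), 2);

        imshow(windowName, frame);
        if((char)waitKey(10) == 27)
            break;

        // Current data becomes the previous data for the next iteration
        prevPts = trackedPts;
        cv::swap(prevGray, curGray);
    }

    return 0;
}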