Building an interactive object tracker

A colorspace-based tracker gives us the freedom to track a colored object, but it also constrains us to a predefined color. What if we want to pick an object at random? How do we build an object tracker that can learn the characteristics of the selected object and track it automatically? This is where the continuously adaptive meanshift (CAMShift) algorithm comes into the picture. It is essentially an improved version of the meanshift algorithm.

The concept of meanshift is actually nice and simple. Let's say we select a region of interest and we want our object tracker to track that object. In this region, we weight the points using the color histogram and compute the centroid of those spatial points. If the centroid lies at the center of this region, we know that the object hasn't moved. But if the centroid is not at the center of this region, then we know that the object is moving in some direction. The movement of the centroid indicates the direction in which the object is moving. So, we move the bounding box of the object to a new location so that the new centroid becomes the center of this bounding box. Hence, this algorithm is called meanshift, because the mean (the centroid) is shifting. This way, we keep ourselves updated with the current location of the object.

But the problem with meanshift is that the size of the bounding box is not allowed to change. When you move the object away from the camera, the object will appear smaller in the image, but meanshift will not take that into account. The size of the bounding box will remain the same throughout the tracking session. Hence, we need to use CAMShift. The advantage of CAMShift is that it can adapt the size of the bounding box to the size of the object. Along with that, it can also keep track of the orientation of the object.

Let's consider the following frame, in which the object is highlighted:

Now that we have selected the object, the algorithm computes the histogram backprojection and extracts all the information. What is histogram backprojection? It's a way of identifying how well the pixels of an image fit a histogram model. We compute the histogram model of a particular object and then use this model to find that object in an image. Let's move the object and see how it's getting tracked:

It looks like the object is getting tracked fairly well. Let's change the orientation and see whether the tracking is maintained:

As we can see, the bounding ellipse has changed its location as well as orientation. Let's change the perspective of the object and see whether it's still able to track it:

We're still good! The bounding ellipse has changed the aspect ratio to reflect the fact that the object looks skewed now (because of the perspective transformation). Let's look at the user interface functionality in the code:

Mat image; 
Point originPoint; 
Rect selectedRect; 
bool selectRegion = false; 
int trackingFlag = 0; 
 
// Function to track the mouse events 
void onMouse(int event, int x, int y, int, void*) 
{ 
    if(selectRegion) 
    { 
        selectedRect.x = MIN(x, originPoint.x); 
        selectedRect.y = MIN(y, originPoint.y); 
        selectedRect.width = std::abs(x - originPoint.x); 
        selectedRect.height = std::abs(y - originPoint.y); 
         
        selectedRect &= Rect(0, 0, image.cols, image.rows); 
    } 
     
    switch(event) 
    { 
        case EVENT_LBUTTONDOWN: 
            originPoint = Point(x,y); 
            selectedRect = Rect(x,y,0,0); 
            selectRegion = true; 
            break; 
             
        case EVENT_LBUTTONUP: 
            selectRegion = false; 
            if( selectedRect.width > 0 && selectedRect.height > 0 ) 
            { 
                trackingFlag = -1; 
            } 
            break; 
    } 
} 

This function captures the coordinates of the rectangle that was selected in the window; the user just needs to click and drag with the mouse. OpenCV provides a set of built-in functions (registered with setMouseCallback) that help us detect these different mouse events.

Here is the code for performing object tracking based on CAMShift:

int main(int argc, char* argv[]) 
{ 
    // Variable declaration and initialization 
    ....
    // Iterate until the user presses the Esc key 
    while(true) 
    { 
        // Capture the current frame 
        cap >> frame; 
     
        // Check if 'frame' is empty 
        if(frame.empty()) 
            break; 
         
        // Resize the frame 
        resize(frame, frame, Size(), scalingFactor, scalingFactor, INTER_AREA); 
     
        // Clone the input frame 
        frame.copyTo(image); 
     
        // Convert to HSV colorspace 
        cvtColor(image, hsvImage, COLOR_BGR2HSV);

We now have the HSV image waiting to be processed. Let's go ahead and see how we can use our thresholds to process this image:

        if(trackingFlag) 
        { 
            // Check for all the values in 'hsvImage' that are within the specified range 
            // and put the result in 'mask' 
            inRange(hsvImage, Scalar(0, minSaturation, minValue), Scalar(180, 256, maxValue), mask); 
             
            // Mix the specified channels 
            int channels[] = {0, 0}; 
            hueImage.create(hsvImage.size(), hsvImage.depth()); 
            mixChannels(&hsvImage, 1, &hueImage, 1, channels, 1); 
             
            if(trackingFlag < 0) 
            { 
                // Create images based on selected regions of interest 
                Mat roi(hueImage, selectedRect), maskroi(mask, selectedRect); 
                 
                // Compute the histogram and normalize it 
                calcHist(&roi, 1, 0, maskroi, hist, 1, &histSize, &histRanges); 
                normalize(hist, hist, 0, 255, NORM_MINMAX); 
                 
                trackingRect = selectedRect; 
                trackingFlag = 1; 
            } 

As we can see here, we use the HSV image to compute the histogram of the region. We use our thresholds to locate the required color in the HSV spectrum and then filter out the image based on that. Let's go ahead and see how we can compute the histogram backprojection:

            // Compute the histogram backprojection 
            calcBackProject(&hueImage, 1, 0, hist, backproj, &histRanges); 
            backproj &= mask; 
            RotatedRect rotatedTrackingRect = CamShift(backproj, trackingRect, TermCriteria(TermCriteria::EPS | TermCriteria::COUNT, 10, 1)); 
             
            // Check if the area of trackingRect is too small 
            if(trackingRect.area() <= 1) 
            { 
                // Use an offset value to make sure the trackingRect has a minimum size 
                int cols = backproj.cols, rows = backproj.rows; 
                int offset = MIN(rows, cols) + 1; 
                trackingRect = Rect(trackingRect.x - offset, trackingRect.y - offset, 2 * offset, 2 * offset) & Rect(0, 0, cols, rows); 
            } 

We are now ready to display the results. Using the rotated rectangle, let's draw an ellipse around our region of interest:

            // Draw the ellipse on top of the image 
            ellipse(image, rotatedTrackingRect, Scalar(0,255,0), 3, LINE_AA); 
        } 
         
        // Apply the 'negative' effect on the selected region of interest 
        if(selectRegion && selectedRect.width > 0 && selectedRect.height > 0) 
        { 
            Mat roi(image, selectedRect); 
            bitwise_not(roi, roi); 
        } 
         
        // Display the output image 
        imshow(windowName, image); 
         
        // Get the keyboard input and check if it's 'Esc' 
        // 27 -> ASCII value of 'Esc' key 
        ch = waitKey(30); 
        if (ch == 27) { 
            break; 
        } 
    } 
     
    return 0; 
}