Tracking objects of a specific color

In order to build a good object tracker, we need to understand what characteristics can be used to make our tracking robust and accurate. So, let's take a baby step in that direction and see whether we can use colorspace information to come up with a good visual tracker. One thing to keep in mind is that color information is sensitive to lighting conditions. In real-world applications, you will have to do some preprocessing to take care of that. But for now, let's assume that somebody else is doing that and we are getting clean color images.

There are many different colorspaces, and picking a good one depends on the application. While RGB is the native representation on a computer screen, it's not necessarily ideal for humans. We tend to name colors based on their hue, which is why hue saturation value (HSV) is probably one of the most informative colorspaces; it closely aligns with how we perceive colors. Hue refers to the position on the color spectrum, saturation refers to the intensity of a particular color, and value refers to the brightness of that pixel. The colorspace is usually represented as a cylinder. You can find a simple explanation at http://infohost.nmt.edu/tcc/help/pubs/colortheory/web/hsv.html. We can convert the pixels of an image to the HSV colorspace, and then measure distances and apply thresholds in that space to track a given object.
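To make the conversion concrete, here is a minimal sketch of how a single BGR pixel could be mapped to HSV by hand. It follows the convention OpenCV documents for 8-bit images, where hue is stored as degrees divided by two (0 to 179) and saturation and value span 0 to 255. The names Hsv8 and bgrToHsv8 are hypothetical, and the integer rounding is a simplification, so the results may differ slightly from what cvtColor produces:

```cpp
#include <algorithm>

// Hypothetical helper type: one HSV pixel in the 8-bit convention
// (h in 0..179, s and v in 0..255).
struct Hsv8 { int h; int s; int v; };

// Approximate conversion of one 8-bit BGR pixel to 8-bit HSV.
// This is an illustrative sketch, not OpenCV's implementation.
constexpr Hsv8 bgrToHsv8(int b, int g, int r)
{
    const int maxC  = std::max(b, std::max(g, r));
    const int minC  = std::min(b, std::min(g, r));
    const int delta = maxC - minC;

    // Value is the brightest channel; saturation is the spread
    // between the brightest and darkest channels, relative to value.
    const int v = maxC;
    const int s = (maxC == 0) ? 0 : (255 * delta + maxC / 2) / maxC;

    // Hue in degrees (0..359), chosen by which channel dominates.
    int hDeg = 0;
    if (delta != 0) {
        if (maxC == r)      hDeg = 60 * (g - b) / delta;
        else if (maxC == g) hDeg = 120 + 60 * (b - r) / delta;
        else                hDeg = 240 + 60 * (r - g) / delta;
        if (hDeg < 0) hDeg += 360;
    }

    // Halve the hue so that it fits into a single byte.
    return Hsv8{hDeg / 2, s, v};
}
```

With this convention, pure red lands at hue 0, pure green at hue 60, and pure blue at hue 120, which is worth keeping in mind when choosing threshold limits.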

Consider the following frame in the video:

If you run it through the colorspace filter and track the object, you will see something like this:

As we can see here, our tracker recognizes a particular object in the video based on the color characteristics. In order to use this tracker, we need to know the color distribution of our target object. Here is the code to track a colored object, which selects only pixels that have a certain given hue. The code is well-commented, so read the explanation about each term to see what's happening:

int main(int argc, char* argv[]) 
{ 
   // Variable declarations and initializations 
     
    // Iterate until the user presses the Esc key 
    while(true) 
    { 
        // Initialize the output image before each iteration 
        outputImage = Scalar(0,0,0); 
         
        // Capture the current frame 
        cap >> frame; 
     
        // Check if 'frame' is empty 
        if(frame.empty()) 
            break; 
         
        // Resize the frame 
        resize(frame, frame, Size(), scalingFactor, scalingFactor, INTER_AREA); 
     
        // Convert to HSV colorspace 
        cvtColor(frame, hsvImage, COLOR_BGR2HSV); 
         
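        // Define the range of "blue" color in HSV colorspace 
        // Note: for 8-bit images, OpenCV stores hue as degrees / 2 (0-179), 
        // so pure green is 60 and pure blue is 120. This range is broad and 
        // also admits green and cyan; raise the lower hue limit for a 
        // tighter blue filter. 
        Scalar lowerLimit = Scalar(60,100,100); 
        Scalar upperLimit = Scalar(180,255,255); 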
         
        // Threshold the HSV image to get only blue color 
        inRange(hsvImage, lowerLimit, upperLimit, mask); 
         
        // Compute bitwise-AND of input image with itself, keeping only 
        // the pixels where the mask is non-zero 
        bitwise_and(frame, frame, outputImage, mask); 
         
        // Run median filter on the output to smoothen it 
        medianBlur(outputImage, outputImage, 5); 
         
        // Display the input and output image 
        imshow("Input", frame); 
        imshow("Output", outputImage); 
         
        // Get the keyboard input and check if it's 'Esc' 
        // 30 -> wait for 30 ms 
        // 27 -> ASCII value of 'ESC' key 
        ch = waitKey(30); 
        if (ch == 27) { 
            break; 
        } 
    } 
     
    // Return 0 to signal a normal exit 
    return 0; 
}
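To see what the thresholding step does to each pixel, here is a small sketch in plain C++ that mimics the inclusive range check performed by inRange. It assumes the pixels are already in HSV; the names Pixel3, pixelInRange, and rangeMask are hypothetical helpers written for this illustration and are not part of OpenCV:

```cpp
#include <vector>

// Hypothetical helper type: one three-channel pixel (for example, H, S, V).
struct Pixel3 { int c0; int c1; int c2; };

// A pixel passes only if every channel lies within its bounds.
// Both bounds are inclusive, as they are for inRange.
constexpr bool pixelInRange(const Pixel3& p, const Pixel3& lower, const Pixel3& upper)
{
    return lower.c0 <= p.c0 && p.c0 <= upper.c0 &&
           lower.c1 <= p.c1 && p.c1 <= upper.c1 &&
           lower.c2 <= p.c2 && p.c2 <= upper.c2;
}

// Build a binary mask: 255 where the pixel passes, 0 elsewhere.
inline std::vector<unsigned char> rangeMask(const std::vector<Pixel3>& pixels,
                                            const Pixel3& lower,
                                            const Pixel3& upper)
{
    std::vector<unsigned char> mask;
    mask.reserve(pixels.size());
    for (const Pixel3& p : pixels)
        mask.push_back(pixelInRange(p, lower, upper) ? 255 : 0);
    return mask;
}
```

With the limits used in the listing, (60,100,100) and (180,255,255), a saturated blue pixel such as (120,255,255) passes the check, but so does a saturated green pixel at (60,255,255). A washed-out pixel such as (120,50,255) is rejected because its saturation falls below the lower limit. The bitwise-AND step then keeps the original colors only where the mask is non-zero.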