In order to build a good object tracker, we need to understand what characteristics can be used to make our tracking robust and accurate. So, let's take a baby step in that direction and see whether we can use colorspace information to come up with a good visual tracker. One thing to keep in mind is that color information is sensitive to lighting conditions. In real-world applications, you will have to do some preprocessing to take care of that. But for now, let's assume that somebody else is doing that and we are getting clean color images.
There are many different colorspaces, and the right choice depends on the application. While RGB is the native representation on a computer screen, it's not necessarily ideal for humans. We tend to name colors by their hue, which is why hue, saturation, value (HSV) is probably one of the most informative colorspaces: it closely aligns with how we perceive color. Hue refers to the position on the color spectrum, saturation refers to the intensity (purity) of a particular color, and value refers to the brightness of that pixel. The space is represented as a cylinder: hue is the angle around the axis, saturation is the distance from the axis, and value is the height. You can find a simple explanation at http://infohost.nmt.edu/tcc/help/pubs/colortheory/web/hsv.html. To track a given object, we can convert the pixels of an image to the HSV colorspace and then measure distances, or simply apply thresholds, in that space.
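As a rough illustration of the conversion and thresholding described above, here is a minimal sketch in plain C++ (no OpenCV) that converts a single 8-bit BGR pixel to HSV using OpenCV's 8-bit convention (hue stored as degrees divided by 2, so 0-179; saturation and value scaled to 0-255) and then checks it against an inclusive range. The names `Hsv8`, `bgrToHsv8`, and `inHsvRange` are hypothetical, and the rounding may differ by one unit from `cv::cvtColor`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// One pixel in 8-bit HSV, following OpenCV's 8-bit convention:
// H in [0, 180) (degrees divided by 2), S and V in [0, 255].
struct Hsv8 {
    int h, s, v;
};

// Convert one 8-bit BGR pixel to HSV with the standard hexcone formulas.
// Rounding may differ by one unit from cv::cvtColor(..., COLOR_BGR2HSV).
Hsv8 bgrToHsv8(std::uint8_t b, std::uint8_t g, std::uint8_t r) {
    const int v = std::max({int(b), int(g), int(r)});  // value = brightest channel
    const int mn = std::min({int(b), int(g), int(r)});
    const int delta = v - mn;                          // chroma

    // Saturation: chroma relative to brightness, scaled to 0-255.
    const int s = (v == 0) ? 0 : int(std::lround(255.0 * delta / v));

    double hDeg = 0.0;  // hue is undefined for grays; report 0
    if (delta != 0) {
        if (v == r)      hDeg = 60.0 * (g - b) / delta;
        else if (v == g) hDeg = 120.0 + 60.0 * (b - r) / delta;
        else             hDeg = 240.0 + 60.0 * (r - g) / delta;
        if (hDeg < 0.0) hDeg += 360.0;
    }
    // Halve the angle so it fits in 8 bits; 180 wraps back to 0 on the hue circle.
    const int h = int(std::lround(hDeg / 2.0)) % 180;
    return {h, s, v};
}

// Inclusive per-channel range test, i.e. what cv::inRange decides for one pixel.
bool inHsvRange(const Hsv8& p, const Hsv8& lo, const Hsv8& hi) {
    return p.h >= lo.h && p.h <= hi.h &&
           p.s >= lo.s && p.s <= hi.s &&
           p.v >= lo.v && p.v <= hi.v;
}
```

For example, pure blue (B=255, G=0, R=0) maps to hue 120, pure green to 60, and pure red to 0, so a hue window of 60-180 accepts both green and blue but rejects red.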
Consider the following frame in the video:
If you run it through the colorspace filter and track the object, you will see something like this:
As we can see here, our tracker recognizes a particular object in the video based on its color characteristics. To use this tracker, we need to know the color distribution of our target object. The following code tracks a colored object by keeping only the pixels whose hue, saturation, and value fall within a given range. The code is well commented, so read the comment on each step to see what's happening:
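```cpp
#include <opencv2/core.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/videoio.hpp>

using namespace cv;

int main(int argc, char* argv[])
{
    // Variable declarations and initializations
    // (assumed here: default camera 0 and a scaling factor of 0.75)
    VideoCapture cap(0);
    if (!cap.isOpened())
        return 1;
    float scalingFactor = 0.75f;
    Mat frame, hsvImage, mask, outputImage;
    int ch;

    // Iterate until the user presses the Esc key
    while (true)
    {
        // Capture the current frame
        cap >> frame;

        // Check if 'frame' is empty
        if (frame.empty())
            break;

        // Resize the frame
        resize(frame, frame, Size(), scalingFactor, scalingFactor, INTER_AREA);

        // Initialize the output image (black, same size as the frame) before each iteration
        outputImage = Mat::zeros(frame.size(), frame.type());

        // Convert to HSV colorspace
        cvtColor(frame, hsvImage, COLOR_BGR2HSV);

        // Define the HSV range to keep. OpenCV stores 8-bit hue as degrees/2 (0-179),
        // so hue 60-180 spans green, cyan, blue, and magenta; a narrower window
        // around 120 (roughly 100-130) is closer to pure blue.
        Scalar lowerLimit = Scalar(60, 100, 100);
        Scalar upperLimit = Scalar(180, 255, 255);

        // Threshold the HSV image: mask is 255 where the pixel is in range, else 0
        inRange(hsvImage, lowerLimit, upperLimit, mask);

        // Compute bitwise-AND of the input image with itself, only where mask is set
        bitwise_and(frame, frame, outputImage, mask);

        // Run median filter on the output to smoothen it
        medianBlur(outputImage, outputImage, 5);

        // Display the input and output image
        imshow("Input", frame);
        imshow("Output", outputImage);

        // Get the keyboard input and check if it's 'Esc'
        // 30 -> wait for 30 ms
        // 27 -> ASCII value of 'ESC' key
        ch = waitKey(30);
        if (ch == 27)
            break;
    }

    return 0;
}
```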
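To make the masking step concrete, here is a minimal plain-C++ sketch (no OpenCV) of what `inRange` followed by `bitwise_and(frame, frame, outputImage, mask)` computes on an interleaved 3-channel, 8-bit buffer. The helper names `inRange3` and `applyMask` and the `Bytes`/`Triple` aliases are hypothetical; in the real program the range test runs on the HSV image, while the resulting mask is applied to the original BGR frame.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical aliases: an interleaved 3-channel, 8-bit image stored pixel by pixel,
// and a per-channel bound such as (H, S, V).
using Bytes = std::vector<std::uint8_t>;
using Triple = std::array<std::uint8_t, 3>;

// Mimics cv::inRange for 3-channel data: 255 where every channel lies in [lo, hi], else 0.
Bytes inRange3(const Bytes& img, const Triple& lo, const Triple& hi) {
    assert(img.size() % 3 == 0);
    Bytes mask(img.size() / 3, 0);
    for (std::size_t i = 0; i < mask.size(); ++i) {
        bool inside = true;
        for (std::size_t c = 0; c < 3; ++c) {
            const std::uint8_t x = img[3 * i + c];
            inside = inside && x >= lo[c] && x <= hi[c];
        }
        mask[i] = inside ? 255 : 0;
    }
    return mask;
}

// Mimics bitwise_and(img, img, out, mask) with 'out' pre-zeroed:
// since x & x == x, selected pixels are copied unchanged and the rest stay black.
Bytes applyMask(const Bytes& img, const Bytes& mask) {
    assert(img.size() == 3 * mask.size());
    Bytes out(img.size(), 0);
    for (std::size_t i = 0; i < mask.size(); ++i)
        if (mask[i] != 0)
            for (std::size_t c = 0; c < 3; ++c)
                out[3 * i + c] = img[3 * i + c];
    return out;
}
```

Because ANDing a pixel with itself leaves it unchanged, the masked AND simply copies the selected pixels; every other pixel keeps the zero it was initialized with, which is why the output image is cleared on each iteration.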