Object tracking estimates the trajectory of a target throughout a video sequence, given only the target's initial location. The task is challenging because of several factors, such as appearance variations, occlusions, fast motion, motion blur, and scale variations.
Under these conditions, discriminative correlation filter (DCF)-based visual trackers provide state-of-the-art performance. They are also computationally efficient, which is critical in real-time applications. Their strength can be seen in the results of the Visual Object Tracking (VOT) 2014 Challenge, which evaluated 38 trackers (33 submitted trackers and 5 baselines from the VOT2014 committee: http://www.votchallenge.net/vot2014/download/vot_2014_presentation.pdf). The top three trackers in that challenge were all based on correlation filters. As a result, DCF trackers are a very popular choice for bounding box-based tracking.
The dlib library implements a DCF-based tracker that is easy to use for object tracking. In the literature, this method is known as the Discriminative Scale Space Tracker (DSST). In this section, we will see how to use this tracker both for face tracking and for tracking an arbitrary object selected by the user. The only required input, other than the raw video, is a bounding box on the first frame (the initial location of the target). From there, the tracker automatically predicts the trajectory of the target.
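The workflow described above (initialize with a bounding box on the first frame, then update on every following frame) can be sketched as follows. This is a minimal sketch, not a complete application: `track_object` and `to_xyxy` are hypothetical helper names introduced here for illustration, and the frames are assumed to be 8-bit NumPy image arrays, such as those read with OpenCV. The dlib calls used are `dlib.correlation_tracker()`, `start_track()`, `update()`, `get_position()`, and `dlib.rectangle()`.

```python
def to_xyxy(rect):
    """Convert a rectangle exposing left()/top()/right()/bottom() into a
    (left, top, right, bottom) tuple, truncating coordinates to integers."""
    return (int(rect.left()), int(rect.top()),
            int(rect.right()), int(rect.bottom()))


def track_object(frames, initial_box):
    """Track one target through `frames` (hypothetical helper).

    frames: iterable of image arrays; the first one is the initial frame.
    initial_box: integer (left, top, right, bottom) of the target on the
        first frame, for example from a face detector or a user selection.
    Returns a list with one (left, top, right, bottom) tuple per frame.
    """
    import dlib  # imported here so that to_xyxy() can be used without dlib

    frames = iter(frames)
    first_frame = next(frames)

    # Initialize the tracker with the target location on the first frame.
    tracker = dlib.correlation_tracker()
    tracker.start_track(first_frame, dlib.rectangle(*initial_box))

    boxes = [tuple(initial_box)]
    for frame in frames:
        # update() also returns a confidence score, which is ignored here.
        tracker.update(frame)
        boxes.append(to_xyxy(tracker.get_position()))
    return boxes
```

For face tracking, `initial_box` would typically come from a face detector run on the first frame. For an arbitrary object, it would come from a region the user selects, for example by dragging the mouse over the first frame.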