Before we start implementing a high-level tracker, we should define the type of tracking result that we want to get. For many applications, it is important to estimate how objects are posed in real, 3D space. However, our application is about image manipulation. So we care more about 2D image space. An upright, frontal view of a face should occupy a roughly rectangular region in the image. Within such a region, eyes, a nose, and a mouth should occupy rough rectangular subregions. Let's open trackers.py
and add a class containing the relevant data:
class Face(object): """Data on facial features: face, eyes, nose, mouth.""" def __init__(self): self.faceRect = None self.leftEyeRect = None self.rightEyeRect = None self.noseRect = None self.mouthRect = None
Whenever our code contains a rectangle as a property or a function argument, we will assume it is in the format (x, y, w, h)
where the unit is pixels, the upper-left corner is at (x, y)
, and the lower-right corner at (x+w, y+h)
. OpenCV sometimes uses a compatible representation but not always. So we must be careful when sending/receiving rectangles to/from OpenCV. For example, sometimes OpenCV requires the upper-left and lower-right corners as coordinate pairs.