Implementing the training script for the cat detection model

Praline: I've never seen so many aerials in me life. The man told me their equipment could pinpoint a purr at 400 yards and Eric, being such a happy cat, was a piece of cake.

—The Fish License sketch, Monty Python's Flying Circus, Episode 23 (1970)

This segment of the project uses tens of thousands of files, including images, annotation files, scripts, and intermediate and final output of the training process. Let's organize all of this new material by giving our project a subfolder, cascade_training, which will ultimately have the following contents:

- describe.py: A script that preprocesses the training images and generates the negative and positive description files
- train.bat (Windows) or train.sh (Mac or Linux): A script that runs OpenCV's training tools with the appropriate arguments
- faces: A folder containing the Caltech Faces 1999 dataset, which we will use for negative training images
- VOC2007: A folder containing the VOC2007 dataset, whose non-cat images we will also use for negative training images
- CAT_DATASET_01 and CAT_DATASET_02: Folders containing the two halves of the Microsoft Cat Dataset 2008, which we will use for positive training images
- negative_description.txt and positive_description.txt: The description files generated by describe.py
- binary_description: The packed positive samples generated by OpenCV's opencv_createsamples tool
- haarcascade_frontalcatface, haarcascade_frontalcatface_extended, or lbpcascade_frontalcatface: Folders where OpenCV's opencv_traincascade tool writes its output, depending on the chosen configuration

Once the datasets are downloaded and decompressed to the proper locations, let's write describe.py. It needs to start with the following imports:

import cv2
import glob
import math
import sys

All our source images need some preprocessing to optimize them as training images. We need to save the preprocessed versions, so let's globally define an extension that we will use for these files:

outputImageExtension = '.out.jpg'

We need to create equalized grayscale images at several points in this script, so let's write the following helper function for this purpose:

def equalizedGray(image):
  return cv2.equalizeHist(cv2.cvtColor(
    image, cv2.cv.CV_BGR2GRAY))

Similarly, we need to append to the negative description file at more than one point in the script. Each line in the negative description is just an image's path. Let's add the following helper function, which accepts an image path and a file object for the negative description, loads the image, saves an equalized version, and appends the equalized version's path to the description file:

def describeNegativeHelper(imagePath, output):
  outputImagePath = '%s%s' % (imagePath, outputImageExtension)
  image = cv2.imread(imagePath)
  # Save an equalized version of the image.
  cv2.imwrite(outputImagePath, equalizedGray(image))
  # Append the equalized image to the negative description.
  print >> output, outputImagePath

Now, let's write the describeNegative function that calls describeNegativeHelper. The process begins by opening a file in write mode so that we can write the negative description. Then, we iterate over all the image paths in the Caltech Faces 1999 set, which contains no cats. We will skip any paths to output images that were written on a previous call of this function. We will pass the remaining image paths, along with the newly opened negative description file, to describeNegativeHelper, as follows:

def describeNegative():
  output = open('negative_description.txt', 'w')
  # Append all images from Caltech Faces 1999, since all are
  # non-cats.
  for imagePath in glob.glob('faces/*.jpg'):
    if imagePath.endswith(outputImageExtension):
      # This file is equalized, saved on a previous run.
      # Skip it.
      continue
    describeNegativeHelper(imagePath, output)

The remainder of the describeNegative function is responsible for passing relevant file paths from the VOC2007 image set to describeNegativeHelper. Some images in VOC2007 do contain cats. An annotation file, VOC2007/ImageSets/Main/cat_test.txt, lists image IDs and a flag that indicates whether or not any cats are present in the image. The flag can be -1 (no cats), 0 (one or more cats as background or secondary subjects of the image), or 1 (one or more cats as foreground or primary subjects of the image). We will parse this annotation data and, if an image contains no cats, we will pass its path and the description file to describeNegativeHelper, as follows:

  # Append non-cat images from VOC2007.
  input = open('VOC2007/ImageSets/Main/cat_test.txt', 'r')
  while True:
    line = input.readline().rstrip()
    if not line:
      break
    imageNumber, flag = line.split()
    if int(flag) < 0:
      # There is no cat in this image.
      imagePath = 'VOC2007/JPEGImages/%s.jpg' % imageNumber
      describeNegativeHelper(imagePath, output)
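
For illustration, lines in this annotation file take the following form. (These particular image IDs and flag values are hypothetical.)

000005 -1
000012  0
000019  1

Here, image 000005 would contain no cats, image 000012 would contain cats only as background or secondary subjects, and image 000019 would contain at least one cat as a foreground subject.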

Now, let's move on to helper functions in order to generate the positive description. When rotating a face to straighten it, we also need to rotate a list of coordinate pairs that represent features of the face. The following helper function accepts such a list along with a center of rotation and angle of rotation, and returns a new list of the rotated coordinate pairs:

def rotateCoords(coords, center, angleRadians):
  # Positive y is down, so reverse the angle, too.
  angleRadians = -angleRadians
  centerX, centerY = center
  cosAngle = math.cos(angleRadians)
  sinAngle = math.sin(angleRadians)
  newCoords = []
  # Iterate over the (x, y) pairs in the flat coordinate list.
  for x, y in zip(coords[::2], coords[1::2]):
    xOffset = x - centerX
    yOffset = y - centerY
    newCoords += [xOffset * cosAngle - yOffset * sinAngle + centerX,
                  xOffset * sinAngle + yOffset * cosAngle + centerY]
  return newCoords
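
As a quick sanity check (this snippet is for experimentation and is not part of describe.py), we can rotate a single point about the origin. Remember that, in image coordinates, positive y points down, so a positive angle turns the point counterclockwise from the viewer's perspective:

print rotateCoords([1, 0], (0, 0), math.pi / 2)
# Prints approximately [0.0, -1.0]: the point to the right of
# the origin moves to above the origin, which is a
# counterclockwise rotation as the viewer sees it.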

Next, let's write a long helper function to preprocess a single positive training image. This function accepts two arguments: a list of coordinate pairs (named coords) and an OpenCV image. Refer back to the image of feature points on a cat face. The numbering of the points signifies their order in a line of annotation data and in coords. To begin the function, we will get the coordinates for the eyes and mouth. If the face is upside down (not an uncommon pose in playful or sleepy cats), we will swap our definitions of the left and right eyes to be consistent with an upright pose. (In determining whether the face is upside down, we will rely in part on the position of the mouth relative to the eyes.) Then, we will find the angle between the eyes, and we will rotate the image such that the face becomes upright. An OpenCV function called cv2.getRotationMatrix2D is used to define the rotation, and another function called cv2.warpAffine is used to apply it. The rotation introduces some blank regions along the image's borders. We can specify a fill color for these regions as an argument to cv2.warpAffine. We will use 50 percent gray, since it has the least tendency to bias the equalization of the image. Here is the implementation of the first part of the preprocessCatFace function:

def preprocessCatFace(coords, image):

  leftEyeX, leftEyeY = coords[0], coords[1]
  rightEyeX, rightEyeY = coords[2], coords[3]
  mouthX = coords[4]
  if leftEyeX > rightEyeX and leftEyeY < rightEyeY and \
    mouthX > rightEyeX:
    # The "right eye" is in the second quadrant of the face,
    # while the "left eye" is in the fourth quadrant (from the
    # viewer's perspective.) Swap the eyes' labels in order to
    # simplify the rotation logic.
    leftEyeX, rightEyeX = rightEyeX, leftEyeX
    leftEyeY, rightEyeY = rightEyeY, leftEyeY

  eyesCenter = (0.5 * (leftEyeX + rightEyeX),
    0.5 * (leftEyeY + rightEyeY))

  eyesDeltaX = rightEyeX - leftEyeX
  eyesDeltaY = rightEyeY - leftEyeY
  eyesAngleRadians = math.atan2(eyesDeltaY, eyesDeltaX)
  eyesAngleDegrees = eyesAngleRadians * 180.0 / cv2.cv.CV_PI

  # Straighten the image and fill in gray for blank borders.
  rotation = cv2.getRotationMatrix2D(
    eyesCenter, eyesAngleDegrees, 1.0)
  # Derive the (width, height) size from the image's shape,
  # which is (height, width, channels).
  imageSize = image.shape[1::-1]
  straight = cv2.warpAffine(image, rotation, imageSize,
    borderValue=(128, 128, 128))

Having straightened the image, we will call rotateCoords to compute feature coordinates that match the straightened image. Here is the code for this function call:

  # Straighten the coordinates of the features.
  newCoords = rotateCoords(
    coords, eyesCenter, eyesAngleRadians)

At this stage, the image and feature coordinates are transformed such that the cat's eyes are level and upright. Next, let's crop the image to eliminate most of the background and to standardize the eyes' position relative to the bounds. Arbitrarily, we will define the cropped face to be a square region, as wide as the distance between the outer base points of the cat's ears. This square is positioned such that half its area lies to the left of the midpoint between the cat's eyes and half lies to the right, 40 percent lies above, and 60 percent lies below. For an ideal frontal cat face, this crop excludes all background regions but includes the eyes, chin, and several fleshy regions: the nose, mouth, and part of the inside of the ears. We will equalize and return the cropped image. Accordingly, the implementation of preprocessCatFace proceeds as follows:

  # Make the face as wide as the space between the ear bases.
  # (The ear base positions are specified in the reference
  # coordinates.)
  w = int(abs(newCoords[16] - newCoords[6]))
  # Make the face square.
  h = w
  # Put the center point between the eyes at (0.5, 0.4) in
  # proportion to the entire face. (Convert the bounds to int
  # because NumPy slicing requires integer indices.)
  minX = int(eyesCenter[0]) - w/2
  if minX < 0:
    w += minX
    minX = 0
  minY = int(eyesCenter[1]) - h*2/5
  if minY < 0:
    h += minY
    minY = 0

  # Crop the face.
  crop = straight[minY:minY+h, minX:minX+w]
  # Convert the crop to equalized grayscale.
  crop = equalizedGray(crop)
  # Return the crop.
  return crop
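
To make the crop geometry concrete, consider some hypothetical values. If eyesCenter is (100, 80) and the distance between the ear bases is 60, then w = h = 60, minX = 100 - 30 = 70, and minY = 80 - 24 = 56, so the crop spans from (70, 56) to (130, 116). The eyes' midpoint sits at 0.5 of the crop's width and 0.4 of its height, as intended.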

The following pair of images is an example of input and output for the preprocessCatFace function. First, let's look at the input:

[Figure: an original photo of a cat's face]

The output is displayed as follows:

[Figure: the straightened, cropped, and equalized cat face]

To generate the positive description file, we will iterate over all the images in the Microsoft Cat Dataset 2008. For each image, we will parse the cat feature coordinates from the corresponding .cat file and will generate the straightened, cropped, and equalized image by passing the coordinates and original image to our preprocessCatFace function. We will append each processed image's path and measurements to the positive description file. Here is the implementation:

def describePositive():
  output = open('positive_description.txt', 'w')
  dirs = ['CAT_DATASET_01/CAT_00',
    'CAT_DATASET_01/CAT_01',
    'CAT_DATASET_01/CAT_02',
    'CAT_DATASET_02/CAT_03',
    'CAT_DATASET_02/CAT_04',
    'CAT_DATASET_02/CAT_05',
    'CAT_DATASET_02/CAT_06']
  for dir in dirs:
    for imagePath in glob.glob('%s/*.jpg' % dir):
      if imagePath.endswith(outputImageExtension):
        # This file is a crop, saved on a previous run.
        # Skip it.
        continue
      # Open the '.cat' annotation file associated with this
      # image.
      input = open('%s.cat' % imagePath, 'r')
      # Read the coordinates of the cat features from the
      # file. Discard the first number, which is the number
      # of features.
      coords = [int(i) for i in input.readline().split()[1:]]
      # Read the image.
      image = cv2.imread(imagePath)
      # Straighten and crop the cat face.
      crop = preprocessCatFace(coords, image)
      if crop is None:
        print >> sys.stderr, \
         'Failed to preprocess image at %s.' % \
         imagePath
        continue
      # Save the crop.
      cropPath = '%s%s' % (imagePath, outputImageExtension)
      cv2.imwrite(cropPath, crop)
      # Append the cropped face and its bounds to the
      # positive description.
      h, w = crop.shape[:2]
      print >> output, cropPath, 1, 0, 0, w, h

Here, let's take note of the format of a positive description file. Each line contains a path to a training image, followed by a series of numbers that indicate the count of positive objects in the image and the measurements (x, y, width, and height) of the rectangles that contain those objects. In our case, there is always one cat face filling the entire cropped image, so we get lines such as the following, which is for a 64 x 64 image:

CAT_DATASET_02/CAT_06/00001493_005.jpg.out.jpg 1 0 0 64 64

Hypothetically, if the image had two 8 x 8 pixel cat faces in opposite corners, its line in the description file would look like this:

CAT_DATASET_02/CAT_06/00001493_005.jpg.out.jpg 2 0 0 8 8 56 56 8 8

The main function of describe.py simply calls our describeNegative and describePositive functions, as follows:

def main():
  describeNegative()
  describePositive()

if __name__ == '__main__':
  main()

Run describe.py and then feel free to have a look at the generated files, including negative_description.txt, positive_description.txt, and the cropped cat faces whose filenames follow the pattern CAT_DATASET_*/CAT_*/*.out.jpg.
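
As an optional sanity check (a sketch for experimentation, not part of the project's scripts), we can count the samples in each description file and view one of the crops:

import cv2

# Count the negative and positive samples by counting lines.
print len(open('negative_description.txt').readlines())
print len(open('positive_description.txt').readlines())

# Display one of the straightened, cropped, equalized faces.
# (This path is hypothetical; substitute any .out.jpg path
# that describe.py actually generated.)
crop = cv2.imread('CAT_DATASET_01/CAT_00/00000001_000.jpg.out.jpg')
cv2.imshow('crop', crop)
cv2.waitKey()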

Next, we will use two command-line tools that come with OpenCV. We will refer to them as <opencv_createsamples> and <opencv_traincascade>, respectively. They are responsible for converting the positive description to a binary format and for generating the trained cascade in an XML format. On Windows, these executables are named opencv_createsamples.exe and opencv_traincascade.exe. They are located in the bin folders of the OpenCV build (there is one such folder per compiler and architecture), and we should add one of these folders to the system's Path variable.

On Mac or Linux, the executables are named opencv_createsamples and opencv_traincascade. They are located in a standard executable directory, such as /usr/bin or /usr/local/bin, which should already be in the system's PATH variable.

Many flags can be used to provide arguments to <opencv_createsamples> and <opencv_traincascade>, as described in the official documentation at http://docs.opencv.org/doc/user_guide/ug_traincascade.html. We will use the following flags and values:

- -vec: The path to a binary file of positive samples, which <opencv_createsamples> writes and <opencv_traincascade> reads
- -info: The path to our positive description text file
- -bg: The path to our negative description text file
- -num: The total number of positive samples for <opencv_createsamples> to pack into the binary file
- -data: The directory where <opencv_traincascade> writes its intermediate results and its final cascade.xml file
- -numPos and -numNeg: The numbers of positive and negative samples to use in training each stage
- -numStages: The number of stages in the cascade
- -minHitRate: The minimum fraction of positive samples that each stage must correctly accept
- -featureType: Either HAAR or LBP
- -mode: For Haar training, either BASIC (upright features only) or ALL (the full set, including 45-degree rotated features)

Let's write a shell script to run <opencv_createsamples> and <opencv_traincascade> with the appropriate flags and to copy the resulting cascade to the path where Interactive Cat Face Recognizer expects it. On Windows, let's call our script train.bat and implement it as follows:

set vec=binary_description
set info=positive_description.txt
set bg=negative_description.txt

REM Uncomment the next 4 variables for LBP training.
REM set featureType=LBP
REM set data=lbpcascade_frontalcatface\
REM set dst=..\cascades\lbpcascade_frontalcatface.xml
REM set mode=BASIC

REM Uncomment the next 4 variables for Haar training with basic
REM features.
set featureType=HAAR
set data=haarcascade_frontalcatface\
set dst=..\cascades\haarcascade_frontalcatface.xml
set mode=BASIC

REM Uncomment the next 4 variables for Haar training with
REM extended features.
REM set featureType=HAAR
REM set data=haarcascade_frontalcatface_extended\
REM set dst=..\cascades\haarcascade_frontalcatface_extended.xml
REM set mode=ALL

REM Set numPosTotal to be the line count of info.
for /f %%c in ('find /c /v "" ^< "%info%"') do set numPosTotal=%%c

REM Set numNegTotal to be the line count of bg.
for /f %%c in ('find /c /v "" ^< "%bg%"') do set numNegTotal=%%c

set /a numPosPerStage=%numPosTotal%*9/10
set /a numNegPerStage=%numNegTotal%*9/10
set numStages=15
set minHitRate=0.999

REM Ensure that the data directory exists and is empty.
if not exist "%data%" (mkdir "%data%") else del /f /q "%data%\*.xml"

opencv_createsamples -vec "%vec%" -info "%info%" -bg "%bg%" ^
  -num "%numPosTotal%"
opencv_traincascade -data "%data%" -vec "%vec%" -bg "%bg%" ^
  -numPos "%numPosPerStage%" -numNeg "%numNegPerStage%" ^
  -numStages "%numStages%" -minHitRate "%minHitRate%" ^
  -featureType "%featureType%" -mode "%mode%"

cp "%data%\cascade.xml" "%dst%"

On Mac or Linux, let's call our train.sh script instead and implement it as follows:

#!/bin/sh

vec=binary_description
info=positive_description.txt
bg=negative_description.txt

# Uncomment the next 4 variables for LBP training.
#featureType=LBP
#data=lbpcascade_frontalcatface/
#dst=../cascades/lbpcascade_frontalcatface.xml
#mode=BASIC

# Uncomment the next 4 variables for Haar training with basic
# features.
featureType=HAAR
data=haarcascade_frontalcatface/
dst=../cascades/haarcascade_frontalcatface.xml
mode=BASIC

# Uncomment the next 4 variables for Haar training with
# extended features.
#featureType=HAAR
#data=haarcascade_frontalcatface_extended/
#dst=../cascades/haarcascade_frontalcatface_extended.xml
#mode=ALL

# Set numPosTotal to be the line count of info.
numPosTotal=`wc -l < $info`

# Set numNegTotal to be the line count of bg.
numNegTotal=`wc -l < $bg`

numPosPerStage=$(($numPosTotal*9/10))
numNegPerStage=$(($numNegTotal*9/10))
numStages=15
minHitRate=0.999

# Ensure that the data directory exists and is empty.
if [ ! -d "$data" ]; then
  mkdir "$data"
else
  rm "$data/*.xml"
fi

opencv_createsamples -vec "$vec" -info "$info" -bg "$bg" \
  -num "$numPosTotal"
opencv_traincascade -data "$data" -vec "$vec" -bg "$bg" \
  -numPos "$numPosPerStage" -numNeg "$numNegPerStage" \
  -numStages "$numStages" -minHitRate "$minHitRate" \
  -featureType "$featureType" -mode "$mode"

cp "$data/cascade.xml" "$dst"

The preceding versions of the training script are configured to use basic Haar features and will take a long, long time to run, perhaps more than a day. By commenting out the variables related to a basic Haar configuration and uncommenting the variables related to an LBP configuration, we can cut the training time down to several minutes. As a third alternative, variables for an extended Haar configuration (sensitive to diagonal patterns) are also present but are currently commented out.

When the training is done, feel free to have a look at the generated files, including the following:

- binary_description: The packed positive samples, as generated by <opencv_createsamples>
- A data directory (haarcascade_frontalcatface, haarcascade_frontalcatface_extended, or lbpcascade_frontalcatface, depending on the configuration): <opencv_traincascade>'s output, including params.xml, the per-stage XML files, and the final cascade.xml
- A copy of cascade.xml at the destination path (for example, ../cascades/haarcascade_frontalcatface.xml), where Interactive Cat Face Recognizer expects it

Finally, let's run InteractiveCatFaceRecognizer.py to test our cascade!

Remember that our detector is designed for frontal upright cat faces. The cat should be facing the camera and might need some incentive to hold that pose. For example, you could ask the cat to settle on a blanket or in your lap, and you could pat or comb the cat. Refer to the following screenshot of my colleague, Chancellor Josephine "Little Jo" Antoinette Puddingcat, GRL (Grand Rock of Lambda), sitting for a test.

If you do not have a cat (or even a person) who is willing to participate, then you can simply print a few images of a given cat (or person) from the Web. Use heavy matte paper and hold the print so that it faces the camera. Use prints of some images to train the recognizer and prints of other images to test it.

[Screenshot: Chancellor Josephine "Little Jo" Antoinette Puddingcat sitting for a detection test]
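
If you would rather test on a still image before running the interactive recognizer, the following sketch loads the newly trained cascade and outlines any detections. It assumes that we run it from the cascade_training folder; the test image's path and the detectMultiScale parameters are illustrative, not tuned values:

import cv2

# Load the trained cascade. (This is the destination path used
# by train.bat or train.sh in the basic Haar configuration.)
detector = cv2.CascadeClassifier(
  '../cascades/haarcascade_frontalcatface.xml')

# Load a test image (a hypothetical path) and preprocess it in
# the same way as the training images: grayscale and equalized.
image = cv2.imread('test_cat.jpg')
gray = cv2.equalizeHist(cv2.cvtColor(image, cv2.cv.CV_BGR2GRAY))

# Detect cat faces and draw a rectangle around each detection.
for x, y, w, h in detector.detectMultiScale(
    gray, scaleFactor=1.3, minNeighbors=4, minSize=(64, 64)):
  cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imshow('Cat face detections', image)
cv2.waitKey()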

Our detector is pretty good at finding frontal cat faces. However, I encourage you to experiment further, make it better, and share your results! The current version sometimes mistakes the center of a frontal human face for a frontal cat face. Perhaps we should have used more databases of human faces as negative training images. Alternatively, if we had used faces of several mammal species as positive training images, could we have created a more general mammal face detector? Let me know what you discover!