Implementing the training script for the cat detection model

Praline: I've never seen so many aerials in me life. The man told me their equipment could pinpoint a purr at 400 yards and Eric, being such a happy cat, was a piece of cake.

—The Fish License sketch, Monty Python's Flying Circus, Episode 23 (1970)

This segment of the project uses tens of thousands of files, including images, annotation files, scripts, and intermediate and final output of the training process. Let's organize all of this new material by giving our project a subfolder, cascade_training, which will ultimately have the following contents:

- describe.py: A script that preprocesses the training images and generates the negative and positive description files
- train.bat (Windows) or train.sh (Mac or Linux): A script that runs OpenCV's training tools with the appropriate arguments
- faces: A folder containing the Caltech Faces 1999 dataset, which we will use for negative training images
- VOC2007: A folder containing the VOC2007 dataset, whose non-cat images we will also use for negative training images
- CAT_DATASET_01 and CAT_DATASET_02: Folders containing the two halves of the Microsoft Cat Dataset 2008, which we will use for positive training images
- negative_description.txt and positive_description.txt: The description files generated by describe.py
- binary_description: The packed positive samples generated by OpenCV's opencv_createsamples tool
- haarcascade_frontalcatface, haarcascade_frontalcatface_extended, or lbpcascade_frontalcatface: Folders where OpenCV's opencv_traincascade tool writes its output, depending on the chosen configuration

Once the datasets are downloaded and decompressed to the proper locations, let's write describe.py. It needs to start with the following imports:

import cv2
import glob
import math
import sys

All our source images need some preprocessing to optimize them as training images. We need to save the preprocessed versions, so let's globally define an extension that we will use for these files:

outputImageExtension = '.out.jpg'

We need to create equalized grayscale images at several points in this script, so let's write the following helper function for this purpose:

def equalizedGray(image):
  return cv2.equalizeHist(cv2.cvtColor(
    image, cv2.cv.CV_BGR2GRAY))

Similarly, we need to append to the negative description file at more than one point in the script. Each line in the negative description is just an image's path. Let's add the following helper function, which accepts an image path and a file object for the negative description, loads the image, saves an equalized version, and appends the equalized version's path to the description file:

def describeNegativeHelper(imagePath, output):
  outputImagePath = '%s%s' % (imagePath, outputImageExtension)
  image = cv2.imread(imagePath)
  # Save an equalized version of the image.
  cv2.imwrite(outputImagePath, equalizedGray(image))
  # Append the equalized image to the negative description.
  print >> output, outputImagePath

Now, let's write the describeNegative function that calls describeNegativeHelper. The process begins by opening a file in write mode so that we can write the negative description. Then, we iterate over all the image paths in the Caltech Faces 1999 set, which contains no cats. We will skip any paths to output images that were written on a previous call of this function. We will pass the remaining image paths, along with the newly opened negative description file, to describeNegativeHelper, as follows:

def describeNegative():
  output = open('negative_description.txt', 'w')
  # Append all images from Caltech Faces 1999, since all are
  # non-cats.
  for imagePath in glob.glob('faces/*.jpg'):
    if imagePath.endswith(outputImageExtension):
      # This file is equalized, saved on a previous run.
      # Skip it.
      continue
    describeNegativeHelper(imagePath, output)

The remainder of the describeNegative function is responsible for passing relevant file paths from the VOC2007 image set to describeNegativeHelper. Some images in VOC2007 do contain cats. An annotation file, VOC2007/ImageSets/Main/cat_test.txt, lists image IDs and a flag that indicates whether or not any cats are present in the image. The flag can be -1 (no cats), 0 (one or more cats as background or secondary subjects of the image), or 1 (one or more cats as foreground or primary subjects of the image). We will parse this annotation data and, if an image contains no cats, we will pass its path and the description file to describeNegativeHelper, as follows:

  # Append non-cat images from VOC2007.
  input = open('VOC2007/ImageSets/Main/cat_test.txt', 'r')
  while True:
    line = input.readline().rstrip()
    if not line:
      break
    imageNumber, flag = line.split()
    if int(flag) < 0:
      # There is no cat in this image.
      imagePath = 'VOC2007/JPEGImages/%s.jpg' % imageNumber
      describeNegativeHelper(imagePath, output)
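
For illustration, lines in this annotation file take the following form. (These particular image IDs and flag values are hypothetical.)

000005 -1
000012  0
000019  1

Here, image 000005 would contain no cats, image 000012 would contain cats only as background or secondary subjects, and image 000019 would contain at least one cat as a foreground subject.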

Now, let's move on to helper functions in order to generate the positive description. When rotating a face to straighten it, we also need to rotate a list of coordinate pairs that represent features of the face. The following helper function accepts such a list along with a center of rotation and angle of rotation, and returns a new list of the rotated coordinate pairs:

def rotateCoords(coords, center, angleRadians):
  # Positive y is down, so reverse the angle, too.
  angleRadians = -angleRadians
  centerX, centerY = center
  cosAngle = math.cos(angleRadians)
  sinAngle = math.sin(angleRadians)
  newCoords = []
  # Iterate over the (x, y) pairs in the flat coordinate list.
  for x, y in zip(coords[::2], coords[1::2]):
    xOffset = x - centerX
    yOffset = y - centerY
    newCoords += [xOffset * cosAngle - yOffset * sinAngle + centerX,
                  xOffset * sinAngle + yOffset * cosAngle + centerY]
  return newCoords
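
As a quick sanity check (this snippet is for experimentation and is not part of describe.py), we can rotate a single point about the origin. Remember that, in image coordinates, positive y points down, so a positive angle turns the point counterclockwise from the viewer's perspective:

print rotateCoords([1, 0], (0, 0), math.pi / 2)
# Prints approximately [0.0, -1.0]: the point to the right of
# the origin moves to above the origin, which is a
# counterclockwise rotation as the viewer sees it.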

Next, let's write a long helper function to preprocess a single positive training image. This function accepts two arguments: a list of coordinate pairs (named coords) and an OpenCV image. Refer back to the image of feature points on a cat face. The numbering of the points signifies their order in a line of annotation data and in coords. To begin the function, we will get the coordinates for the eyes and mouth. If the face is upside down (not an uncommon pose in playful or sleepy cats), we will swap our definitions of the left and right eyes to be consistent with an upright pose. (In determining whether the face is upside down, we will rely in part on the position of the mouth relative to the eyes.) Then, we will find the angle between the eyes, and we will rotate the image such that the face becomes upright. An OpenCV function called cv2.getRotationMatrix2D is used to define the rotation, and another function called cv2.warpAffine is used to apply it. The rotation introduces some blank regions along the image's borders. We can specify a fill color for these regions as an argument to cv2.warpAffine. We will use 50 percent gray, since it has the least tendency to bias the equalization of the image. Here is the implementation of the first part of the preprocessCatFace function:

def preprocessCatFace(coords, image):

  leftEyeX, leftEyeY = coords[0], coords[1]
  rightEyeX, rightEyeY = coords[2], coords[3]
  mouthX = coords[4]
  if leftEyeX > rightEyeX and leftEyeY < rightEyeY and \
    mouthX > rightEyeX:
    # The "right eye" is in the second quadrant of the face,
    # while the "left eye" is in the fourth quadrant (from the
    # viewer's perspective.) Swap the eyes' labels in order to
    # simplify the rotation logic.
    leftEyeX, rightEyeX = rightEyeX, leftEyeX
    leftEyeY, rightEyeY = rightEyeY, leftEyeY

  eyesCenter = (0.5 * (leftEyeX + rightEyeX),
    0.5 * (leftEyeY + rightEyeY))

  eyesDeltaX = rightEyeX - leftEyeX
  eyesDeltaY = rightEyeY - leftEyeY
  eyesAngleRadians = math.atan2(eyesDeltaY, eyesDeltaX)
  eyesAngleDegrees = eyesAngleRadians * 180.0 / cv2.cv.CV_PI

  # Straighten the image and fill in gray for blank borders.
  rotation = cv2.getRotationMatrix2D(
    eyesCenter, eyesAngleDegrees, 1.0)
  # Derive the (width, height) size from the image's shape,
  # which is (height, width, channels).
  imageSize = image.shape[1::-1]
  straight = cv2.warpAffine(image, rotation, imageSize,
    borderValue=(128, 128, 128))

Having straightened the image, we will call rotateCoords to compute feature coordinates that match the straightened image. Here is the code for this function call:

  # Straighten the coordinates of the features.
  newCoords = rotateCoords(
    coords, eyesCenter, eyesAngleRadians)

At this stage, the image and feature coordinates are transformed such that the cat's eyes are level and upright. Next, let's crop the image to eliminate most of the background and to standardize the eyes' position relative to the bounds. Arbitrarily, we will define the cropped face to be a square region, as wide as the distance between the outer base points of the cat's ears. This square is positioned such that half its area lies to the left of the midpoint between the cat's eyes and half lies to the right, 40 percent lies above, and 60 percent lies below. For an ideal frontal cat face, this crop excludes all background regions but includes the eyes, chin, and several fleshy regions: the nose, mouth, and part of the inside of the ears. We will equalize and return the cropped image. Accordingly, the implementation of preprocessCatFace proceeds as follows:

  # Make the face as wide as the space between the ear bases.
  # (The ear base positions are specified in the reference
  # coordinates.)
  w = int(abs(newCoords[16] - newCoords[6]))
  # Make the face square.
  h = w
  # Put the center point between the eyes at (0.5, 0.4) in
  # proportion to the entire face. (Convert the bounds to int
  # because NumPy slicing requires integer indices.)
  minX = int(eyesCenter[0]) - w/2
  if minX < 0:
    w += minX
    minX = 0
  minY = int(eyesCenter[1]) - h*2/5
  if minY < 0:
    h += minY
    minY = 0

  # Crop the face.
  crop = straight[minY:minY+h, minX:minX+w]
  # Convert the crop to equalized grayscale.
  crop = equalizedGray(crop)
  # Return the crop.
  return crop
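
To make the crop geometry concrete, consider some hypothetical values. If eyesCenter is (100, 80) and the distance between the ear bases is 60, then w = h = 60, minX = 100 - 30 = 70, and minY = 80 - 24 = 56, so the crop spans from (70, 56) to (130, 116). The eyes' midpoint sits at 0.5 of the crop's width and 0.4 of its height, as intended.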

The following pair of images is an example of input and output for the preprocessCatFace function. First, let's look at the input:

[Figure: an original photo of a cat's face]

The output is displayed as follows:

[Figure: the straightened, cropped, and equalized cat face]

To generate the positive description file, we will iterate over all the images in the Microsoft Cat Dataset 2008. For each image, we will parse the cat feature coordinates from the corresponding .cat file and will generate the straightened, cropped, and equalized image by passing the coordinates and original image to our preprocessCatFace function. We will append each processed image's path and measurements to the positive description file. Here is the implementation:

def describePositive():
  output = open('positive_description.txt', 'w')
  dirs = ['CAT_DATASET_01/CAT_00',
    'CAT_DATASET_01/CAT_01',
    'CAT_DATASET_01/CAT_02',
    'CAT_DATASET_02/CAT_03',
    'CAT_DATASET_02/CAT_04',
    'CAT_DATASET_02/CAT_05',
    'CAT_DATASET_02/CAT_06']
  for dir in dirs:
    for imagePath in glob.glob('%s/*.jpg' % dir):
      if imagePath.endswith(outputImageExtension):
        # This file is a crop, saved on a previous run.
        # Skip it.
        continue
      # Open the '.cat' annotation file associated with this
      # image.
      input = open('%s.cat' % imagePath, 'r')
      # Read the coordinates of the cat features from the
      # file. Discard the first number, which is the number
      # of features.
      coords = [int(i) for i in input.readline().split()[1:]]
      # Read the image.
      image = cv2.imread(imagePath)
      # Straighten and crop the cat face.
      crop = preprocessCatFace(coords, image)
      if crop is None:
        print >> sys.stderr, \
         'Failed to preprocess image at %s.' % \
         imagePath
        continue
      # Save the crop.
      cropPath = '%s%s' % (imagePath, outputImageExtension)
      cv2.imwrite(cropPath, crop)
      # Append the cropped face and its bounds to the
      # positive description.
      h, w = crop.shape[:2]
      print >> output, cropPath, 1, 0, 0, w, h

Here, let's take note of the format of a positive description file. Each line contains a path to a training image, followed by a series of numbers that indicate the count of positive objects in the image and the measurements (x, y, width, and height) of the rectangles that contain those objects. In our case, there is always one cat face filling the entire cropped image, so we get lines such as the following, which is for a 64 x 64 image:

CAT_DATASET_02/CAT_06/00001493_005.jpg.out.jpg 1 0 0 64 64

Hypothetically, if the image had two 8 x 8 pixel cat faces in opposite corners, its line in the description file would look like this:

CAT_DATASET_02/CAT_06/00001493_005.jpg.out.jpg 2 0 0 8 8 56 56 8 8

The main function of describe.py simply calls our describeNegative and describePositive functions, as follows:

def main():
  describeNegative()
  describePositive()

if __name__ == '__main__':
  main()

Run describe.py and then feel free to have a look at the generated files, including negative_description.txt, positive_description.txt, and the cropped cat faces whose filenames follow the pattern CAT_DATASET_*/CAT_*/*.out.jpg.
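
As an optional sanity check (a sketch for experimentation, not part of the project's scripts), we can count the samples in each description file and view one of the crops:

import cv2

# Count the negative and positive samples by counting lines.
print len(open('negative_description.txt').readlines())
print len(open('positive_description.txt').readlines())

# Display one of the straightened, cropped, equalized faces.
# (This path is hypothetical; substitute any .out.jpg path
# that describe.py actually generated.)
crop = cv2.imread('CAT_DATASET_01/CAT_00/00000001_000.jpg.out.jpg')
cv2.imshow('crop', crop)
cv2.waitKey()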

Next, we will use two command-line tools that come with OpenCV. We will refer to them as <opencv_createsamples> and <opencv_traincascade>, respectively. They are responsible for converting the positive description to a binary format and for generating the trained cascade in an XML format. On Windows, these executables are named opencv_createsamples.exe and opencv_traincascade.exe. They are located in the bin folders of the OpenCV build (there is one such folder per compiler and architecture), and we should add one of these folders to the system's Path variable.

On Mac or Linux, the executables are named opencv_createsamples and opencv_traincascade. They are located in a standard executable directory, such as /usr/bin or /usr/local/bin, which should already be in the system's PATH variable.

Many flags can be used to provide arguments to <opencv_createsamples> and <opencv_traincascade>, as described in the official documentation at http://docs.opencv.org/doc/user_guide/ug_traincascade.html. We will use the following flags and values:

- -vec: The path to a binary file of positive samples, which <opencv_createsamples> writes and <opencv_traincascade> reads
- -info: The path to our positive description text file
- -bg: The path to our negative description text file
- -num: The total number of positive samples for <opencv_createsamples> to pack into the binary file
- -data: The directory where <opencv_traincascade> writes its intermediate results and its final cascade.xml file
- -numPos and -numNeg: The numbers of positive and negative samples to use in training each stage
- -numStages: The number of stages in the cascade
- -minHitRate: The minimum fraction of positive samples that each stage must correctly accept
- -featureType: Either HAAR or LBP
- -mode: For Haar training, either BASIC (upright features only) or ALL (the full set, including 45-degree rotated features)

Let's write a shell script to run <opencv_createsamples> and <opencv_traincascade> with the appropriate flags and to copy the resulting cascade to the path where Interactive Cat Face Recognizer expects it. On Windows, let's call our script train.bat and implement it as follows:

set vec=binary_description
set info=positive_description.txt
set bg=negative_description.txt

REM Uncomment the next 4 variables for LBP training.
REM set featureType=LBP
REM set data=lbpcascade_frontalcatface\
REM set dst=..\cascades\lbpcascade_frontalcatface.xml
REM set mode=BASIC

REM Uncomment the next 4 variables for Haar training with basic
REM features.
set featureType=HAAR
set data=haarcascade_frontalcatface\
set dst=..\cascades\haarcascade_frontalcatface.xml
set mode=BASIC

REM Uncomment the next 4 variables for Haar training with
REM extended features.
REM set featureType=HAAR
REM set data=haarcascade_frontalcatface_extended\
REM set dst=..\cascades\haarcascade_frontalcatface_extended.xml
REM set mode=ALL

REM Set numPosTotal to be the line count of info.
for /f %%c in ('find /c /v "" ^< "%info%"') do set numPosTotal=%%c

REM Set numNegTotal to be the line count of bg.
for /f %%c in ('find /c /v "" ^< "%bg%"') do set numNegTotal=%%c

set /a numPosPerStage=%numPosTotal%*9/10
set /a numNegPerStage=%numNegTotal%*9/10
set numStages=15
set minHitRate=0.999

REM Ensure that the data directory exists and is empty.
if not exist "%data%" (mkdir "%data%") else del /f /q "%data%\*.xml"

opencv_createsamples -vec "%vec%" -info "%info%" -bg "%bg%" ^
  -num "%numPosTotal%"
opencv_traincascade -data "%data%" -vec "%vec%" -bg "%bg%" ^
  -numPos "%numPosPerStage%" -numNeg "%numNegPerStage%" ^
  -numStages "%numStages%" -minHitRate "%minHitRate%" ^
  -featureType "%featureType%" -mode "%mode%"

cp "%data%\cascade.xml" "%dst%"

On Mac or Linux, let's call our train.sh script instead and implement it as follows:

#!/bin/sh

vec=binary_description
info=positive_description.txt
bg=negative_description.txt

# Uncomment the next 4 variables for LBP training.
#featureType=LBP
#data=lbpcascade_frontalcatface/
#dst=../cascades/lbpcascade_frontalcatface.xml
#mode=BASIC

# Uncomment the next 4 variables for Haar training with basic
# features.
featureType=HAAR
data=haarcascade_frontalcatface/
dst=../cascades/haarcascade_frontalcatface.xml
mode=BASIC

# Uncomment the next 4 variables for Haar training with
# extended features.
#featureType=HAAR
#data=haarcascade_frontalcatface_extended/
#dst=../cascades/haarcascade_frontalcatface_extended.xml
#mode=ALL

# Set numPosTotal to be the line count of info.
numPosTotal=`wc -l < $info`

# Set numNegTotal to be the line count of bg.
numNegTotal=`wc -l < $bg`

numPosPerStage=$(($numPosTotal*9/10))
numNegPerStage=$(($numNegTotal*9/10))
numStages=15
minHitRate=0.999

# Ensure that the data directory exists and is empty.
if [ ! -d "$data" ]; then
  mkdir "$data"
else
  rm "$data/*.xml"
fi

opencv_createsamples -vec "$vec" -info "$info" -bg "$bg" \
  -num "$numPosTotal"
opencv_traincascade -data "$data" -vec "$vec" -bg "$bg" \
  -numPos "$numPosPerStage" -numNeg "$numNegPerStage" \
  -numStages "$numStages" -minHitRate "$minHitRate" \
  -featureType "$featureType" -mode "$mode"

cp "$data/cascade.xml" "$dst"

The preceding versions of the training script are configured to use basic Haar features and will take a long, long time to run, perhaps more than a day. By commenting out the variables related to a basic Haar configuration and uncommenting the variables related to an LBP configuration, we can cut the training time down to several minutes. As a third alternative, variables for an extended Haar configuration (sensitive to diagonal patterns) are also present but are currently commented out.

When the training is done, feel free to have a look at the generated files, including the following:

- binary_description: The packed positive samples, as generated by <opencv_createsamples>
- A data directory (haarcascade_frontalcatface, haarcascade_frontalcatface_extended, or lbpcascade_frontalcatface, depending on the configuration): <opencv_traincascade>'s output, including params.xml, the per-stage XML files, and the final cascade.xml
- A copy of cascade.xml at the destination path (for example, ../cascades/haarcascade_frontalcatface.xml), where Interactive Cat Face Recognizer expects it

Finally, let's run InteractiveCatFaceRecognizer.py to test our cascade!

Remember that our detector is designed for frontal upright cat faces. The cat should be facing the camera and might need some incentive to hold that pose. For example, you could ask the cat to settle on a blanket or in your lap, and you could pat or comb the cat. Refer to the following screenshot of my colleague, Chancellor Josephine "Little Jo" Antoinette Puddingcat, GRL (Grand Rock of Lambda), sitting for a test.

If you do not have a cat (or even a person) who is willing to participate, then you can simply print a few images of a given cat (or person) from the Web. Use heavy matte paper and hold the print so that it faces the camera. Use prints of some images to train the recognizer and prints of other images to test it.

[Screenshot: Chancellor Josephine "Little Jo" Antoinette Puddingcat sitting for a detection test]
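
If you would rather test on a still image before running the interactive recognizer, the following sketch loads the newly trained cascade and outlines any detections. It assumes that we run it from the cascade_training folder; the test image's path and the detectMultiScale parameters are illustrative, not tuned values:

import cv2

# Load the trained cascade. (This is the destination path used
# by train.bat or train.sh in the basic Haar configuration.)
detector = cv2.CascadeClassifier(
  '../cascades/haarcascade_frontalcatface.xml')

# Load a test image (a hypothetical path) and preprocess it in
# the same way as the training images: grayscale and equalized.
image = cv2.imread('test_cat.jpg')
gray = cv2.equalizeHist(cv2.cvtColor(image, cv2.cv.CV_BGR2GRAY))

# Detect cat faces and draw a rectangle around each detection.
for x, y, w, h in detector.detectMultiScale(
    gray, scaleFactor=1.3, minNeighbors=4, minSize=(64, 64)):
  cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imshow('Cat face detections', image)
cv2.waitKey()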

Our detector is pretty good at finding frontal cat faces. However, I encourage you to experiment further, make it better, and share your results! The current version sometimes mistakes the center of a frontal human face for a frontal cat face. Perhaps we should have used more databases of human faces as negative training images. Alternatively, if we had used faces of several mammal species as positive training images, could we have created a more general mammal face detector? Let me know what you discover!