Praline: I've never seen so many aerials in me life. The man told me, their equipment could pinpoint a purr at 400 yards and Eric, being such a happy cat, was a piece of cake.

—The Fish License sketch, Monty Python's Flying Circus, Episode 23 (1970)
This segment of the project uses tens of thousands of files, including images, annotation files, scripts, and the intermediate and final output of the training process. Let's organize all of this new material by giving our project a subfolder, cascade_training, which will ultimately have the following contents:

- cascade_training/CAT_DATASET_01: The first half of the Microsoft Cat Dataset 2008. Download it from http://137.189.35.203/WebUI/CatDatabase/Data/CAT_DATASET_01.zip and unzip it.
- cascade_training/CAT_DATASET_02: The second half of the Microsoft Cat Dataset 2008. Download it from http://137.189.35.203/WebUI/CatDatabase/Data/CAT_DATASET_02.zip and unzip it.
- cascade_training/faces: The Caltech Faces 1999 dataset. Download it from http://www.vision.caltech.edu/Image_Datasets/faces/faces.tar and decompress it.
- cascade_training/VOC2007: The VOC2007 dataset. Download it from http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2007/VOCtest_06-Nov-2007.tar, decompress it, and move the VOC2007 folder out of the VOCdevkit folder so that it sits directly inside cascade_training.
- cascade_training/describe.py: A script to preprocess and describe the positive and negative training sets. As output, it creates new images in the dataset directories (as explained in the preceding bullet points) and the text description files (as given in the following bullet points).
- cascade_training/negative_description.txt: A generated text file that describes the negative training set.
- cascade_training/positive_description.txt: A generated text file that describes the positive training set.
- cascade_training/train.bat (on Windows) or cascade_training/train.sh (on Mac or Linux): A script to run OpenCV's cascade training tools with appropriate parameters. As input, it uses the text description files (as shown in the preceding bullet points). As output, it generates a binary description file and cascade files (as shown in the following bullet points).
- cascade_training/binary_description: A generated binary file that describes the positive training set.
- cascade_training/lbpcascade_frontalcatface/*.xml: Intermediate and final results of the LBP cascade training.
- cascades/lbpcascade_frontalcatface.xml: A copy of the final result of the LBP cascade training, in a location where our apps expect it.
- cascade_training/haarcascade_frontalcatface/*.xml: Intermediate and final results of the Haar cascade training.
- cascades/haarcascade_frontalcatface.xml: A copy of the final result of the Haar cascade training, in a location where our apps expect it.

Some of the datasets are compressed as TAR files. On Windows, we need to install a tool such as 7-Zip (http://www.7-zip.org/) to decompress this format.
For Mac and Linux, this chapter's code bundle contains a script, cascade_training/download_datasets.sh, to automate the downloading and decompression of the image datasets. The script depends on wget, which comes preinstalled with most Linux distributions. On Mac, wget can be installed via MacPorts using the following Terminal command:

$ sudo port install wget

To change the script's permissions and execute it, run the following Terminal commands from the cascade_training folder:

$ chmod +x download_datasets.sh
$ ./download_datasets.sh
Once the datasets are downloaded and decompressed to the proper locations, let's write describe.py. It needs to start with the following imports:

import cv2
import glob
import math
import sys
All our source images need some preprocessing to optimize them as training images. We need to save the preprocessed versions, so let's globally define an extension that we will use for these files:
outputImageExtension = '.out.jpg'
We need to create equalized grayscale images at several points in this script, so let's write the following helper function for this purpose:
def equalizedGray(image):
    return cv2.equalizeHist(cv2.cvtColor(
            image, cv2.cv.CV_BGR2GRAY))
Similarly, we need to append to the negative description file at more than one point in the script. Each line in the negative description is just an image's path. Let's add the following helper method, which accepts an image path and a file object for the negative description, loads the image and saves an equalized version, and appends the equalized version's path to the description file:
def describeNegativeHelper(imagePath, output):
    outputImagePath = '%s%s' % (imagePath, outputImageExtension)
    image = cv2.imread(imagePath)
    # Save an equalized version of the image.
    cv2.imwrite(outputImagePath, equalizedGray(image))
    # Append the equalized image to the negative description.
    print >> output, outputImagePath
Now, let's write the describeNegative function that calls describeNegativeHelper. The process begins by opening a file in write mode so that we can write the negative description. Then, we iterate over all the image paths in the Caltech Faces 1999 set, which contains no cats. We will skip any paths to output images that were written on a previous call of this function. We will pass each remaining image path, along with the newly opened negative description file, to describeNegativeHelper, as follows:
def describeNegative():
    output = open('negative_description.txt', 'w')
    # Append all images from Caltech Faces 1999, since all are
    # non-cats.
    for imagePath in glob.glob('faces/*.jpg'):
        if imagePath.endswith(outputImageExtension):
            # This file is equalized, saved on a previous run.
            # Skip it.
            continue
        describeNegativeHelper(imagePath, output)
The remainder of the describeNegative function is responsible for passing relevant file paths from the VOC2007 image set to describeNegativeHelper. Some images in VOC2007 do contain cats. An annotation file, VOC2007/ImageSets/Main/cat_test.txt, lists image IDs alongside a flag that indicates whether any cats are present in the image. The flag can be -1 (no cats), 0 (one or more cats as background or secondary subjects of the image), or 1 (one or more cats as foreground or primary subjects of the image). We will parse this annotation data and, if an image contains no cats, we will pass its path and the description file to describeNegativeHelper, as follows:
    # Append non-cat images from VOC2007.
    input = open('VOC2007/ImageSets/Main/cat_test.txt', 'r')
    while True:
        line = input.readline().rstrip()
        if not line:
            break
        imageNumber, flag = line.split()
        if int(flag) < 0:
            # There is no cat in this image.
            imagePath = 'VOC2007/JPEGImages/%s.jpg' % imageNumber
            describeNegativeHelper(imagePath, output)
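For reference, each line of cat_test.txt pairs an image ID with one of the three flags described above. The IDs below are hypothetical, but the layout is representative:

000004 -1
000016 0
000025 1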
Now, let's move on to helper functions in order to generate the positive description. When rotating a face to straighten it, we also need to rotate a list of coordinate pairs that represent features of the face. The following helper function accepts such a list along with a center of rotation and angle of rotation, and returns a new list of the rotated coordinate pairs:
def rotateCoords(coords, center, angleRadians):
    # Positive y is down so reverse the angle, too.
    angleRadians = -angleRadians
    xs, ys = coords[::2], coords[1::2]
    newCoords = []
    n = min(len(xs), len(ys))
    i = 0
    centerX = center[0]
    centerY = center[1]
    cosAngle = math.cos(angleRadians)
    sinAngle = math.sin(angleRadians)
    while i < n:
        xOffset = xs[i] - centerX
        yOffset = ys[i] - centerY
        newX = xOffset * cosAngle - yOffset * sinAngle + centerX
        newY = xOffset * sinAngle + yOffset * cosAngle + centerY
        newCoords += [newX, newY]
        i += 1
    return newCoords
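As a quick, hypothetical sanity check of rotateCoords, consider rotating a single point by 90 degrees (pi/2 radians) around the center (1, 1). Because the function negates the angle to account for the downward y axis, the point (2, 1) should map to approximately (1, 0):

# Hypothetical sanity check (Python 2, matching the rest of the
# script): rotate the point (2, 1) by pi/2 around (1, 1).
print rotateCoords([2, 1], (1, 1), math.pi / 2)
# Prints approximately [1.0, 0.0].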
Next, let's write a long helper function to preprocess a single positive training image. This function accepts two arguments: a list of coordinate pairs (which is named coords) and an OpenCV image. Refer back to the image of feature points on a cat face. The numbering of the points signifies their order in a line of annotation data and in coords. To begin the function, we will get the coordinates for the eyes and mouth. If the face is upside down (not an uncommon pose in playful or sleepy cats), we will swap our definitions of the left and right eyes to be consistent with an upright pose. (In determining whether the face is upside down, we will rely in part on the position of the mouth relative to the eyes.) Then, we will find the angle between the eyes and rotate the image such that the face becomes upright. An OpenCV function called cv2.getRotationMatrix2D is used to define the rotation, and another function called cv2.warpAffine is used to apply it. As a result of the rotation, some blank regions appear along the image's borders. We can specify a fill color for these regions as an argument to cv2.warpAffine. We will use 50 percent gray, since it has the least tendency to bias the equalization of the image. Here is the implementation of this first part of the preprocessCatFace function:
def preprocessCatFace(coords, image):

    leftEyeX, leftEyeY = coords[0], coords[1]
    rightEyeX, rightEyeY = coords[2], coords[3]
    mouthX = coords[4]
    if leftEyeX > rightEyeX and leftEyeY < rightEyeY and \
            mouthX > rightEyeX:
        # The "right eye" is in the second quadrant of the face,
        # while the "left eye" is in the fourth quadrant (from the
        # viewer's perspective). Swap the eyes' labels in order to
        # simplify the rotation logic.
        leftEyeX, rightEyeX = rightEyeX, leftEyeX
        leftEyeY, rightEyeY = rightEyeY, leftEyeY

    eyesCenter = (0.5 * (leftEyeX + rightEyeX),
                  0.5 * (leftEyeY + rightEyeY))

    eyesDeltaX = rightEyeX - leftEyeX
    eyesDeltaY = rightEyeY - leftEyeY
    eyesAngleRadians = math.atan2(eyesDeltaY, eyesDeltaX)
    eyesAngleDegrees = eyesAngleRadians * 180.0 / cv2.cv.CV_PI

    # Straighten the image and fill in gray for blank borders.
    rotation = cv2.getRotationMatrix2D(
            eyesCenter, eyesAngleDegrees, 1.0)
    imageSize = image.shape[1::-1]
    straight = cv2.warpAffine(image, rotation, imageSize,
                              borderValue=(128, 128, 128))
Having straightened the image, we will call rotateCoords so that the feature coordinates match the straightened image. Here is the function call:

    # Straighten the coordinates of the features.
    newCoords = rotateCoords(
            coords, eyesCenter, eyesAngleRadians)
At this stage, the image and feature coordinates are transformed such that the cat's eyes are level and upright. Next, let's crop the image to eliminate most of the background and to standardize the eyes' position relative to the bounds. Arbitrarily, we will define the cropped face to be a square region, as wide as the distance between the outer base points of the cat's ears. This square is positioned such that half its area lies to the left of the midpoint between the cat's eyes and half to the right, 40 percent above and 60 percent below. For an ideal frontal cat face, this crop excludes all background regions but includes the eyes, chin, and several fleshy regions: the nose, mouth, and part of the inside of the ears. We will equalize and return the cropped image. Accordingly, the implementation of preprocessCatFace proceeds as follows:
    # Make the face as wide as the space between the ear bases.
    # (The ear base positions are specified in the reference
    # coordinates.)
    w = abs(newCoords[16] - newCoords[6])
    # Make the face square.
    h = w
    # Put the center point between the eyes at (0.5, 0.4) in
    # proportion to the entire face.
    minX = eyesCenter[0] - w/2
    if minX < 0:
        w += minX
        minX = 0
    minY = eyesCenter[1] - h*2/5
    if minY < 0:
        h += minY
        minY = 0
    # Crop the face.
    crop = straight[minY:minY+h, minX:minX+w]
    # Convert the crop to equalized grayscale.
    crop = equalizedGray(crop)
    # Return the crop.
    return crop
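To make the crop geometry concrete, here is a small worked example with hypothetical numbers (not taken from any real annotation file):

# Hypothetical example of the crop geometry: suppose the ear bases
# are 100 pixels apart and the eyes' midpoint is at (150, 120).
w = h = 100
eyesCenterX, eyesCenterY = 150, 120
minX = eyesCenterX - w/2    # 100: the square is centered horizontally
minY = eyesCenterY - h*2/5  # 80: 40 percent of the square lies
                            # above the eye line
print minX, minY, w, h      # Prints: 100 80 100 100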
The following pair of images is an example of input and output for the preprocessCatFace function. First, let's look at the input:

The output is displayed as follows:
To generate the positive description file, we will iterate over all the images in the Microsoft Cat Dataset 2008. For each image, we will parse the cat feature coordinates from the corresponding .cat file, and we will generate the straightened, cropped, and equalized image by passing the coordinates and the original image to our preprocessCatFace function. We will append each processed image's path and measurements to the positive description file. Here is the implementation:
def describePositive():
    output = open('positive_description.txt', 'w')
    dirs = ['CAT_DATASET_01/CAT_00',
            'CAT_DATASET_01/CAT_01',
            'CAT_DATASET_01/CAT_02',
            'CAT_DATASET_02/CAT_03',
            'CAT_DATASET_02/CAT_04',
            'CAT_DATASET_02/CAT_05',
            'CAT_DATASET_02/CAT_06']
    for dir in dirs:
        for imagePath in glob.glob('%s/*.jpg' % dir):
            if imagePath.endswith(outputImageExtension):
                # This file is a crop, saved on a previous run.
                # Skip it.
                continue
            # Open the '.cat' annotation file associated with this
            # image.
            input = open('%s.cat' % imagePath, 'r')
            # Read the coordinates of the cat features from the
            # file. Discard the first number, which is the number
            # of features.
            coords = [int(i) for i in input.readline().split()[1:]]
            # Read the image.
            image = cv2.imread(imagePath)
            # Straighten and crop the cat face.
            crop = preprocessCatFace(coords, image)
            if crop is None:
                print >> sys.stderr, \
                        'Failed to preprocess image at %s.' % \
                        imagePath
                continue
            # Save the crop.
            cropPath = '%s%s' % (imagePath, outputImageExtension)
            cv2.imwrite(cropPath, crop)
            # Append the cropped face and its bounds to the
            # positive description.
            h, w = crop.shape[:2]
            print >> output, cropPath, 1, 0, 0, w, h
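For reference, each .cat annotation file holds a single line: a count of feature points, followed by the x and y coordinates of each point. The values below are hypothetical, but the layout is representative:

9 175 160 239 162 199 209 149 121 137 78 166 93 281 101 312 96 296 133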
Here, let's take note of the format of a positive description file. Each line contains a path to a training image, followed by a series of numbers that indicate the count of positive objects in the image and the measurements (x, y, width, and height) of the rectangles that contain those objects. In our case, there is always one cat face filling the entire cropped image, so we get lines such as the following, which is for a 64 x 64 image:
CAT_DATASET_02/CAT_06/00001493_005.jpg.out.jpg 1 0 0 64 64
Hypothetically, if the image had two 8 x 8 pixel cat faces in opposite corners, its line in the description file would look like this:
CAT_DATASET_02/CAT_06/00001493_005.jpg.out.jpg 2 0 0 8 8 56 56 8 8
The main function of describe.py simply calls our describeNegative and describePositive functions, as follows:
def main():
    describeNegative()
    describePositive()

if __name__ == '__main__':
    main()
Run describe.py and then feel free to have a look at the generated files, including negative_description.txt, positive_description.txt, and the cropped cat faces, whose filenames follow the pattern CAT_DATASET_*/CAT_*/*.out.jpg.
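For instance, the first few lines of negative_description.txt should be paths to equalized images, something like the following (the exact filenames and their order depend on the downloaded datasets):

faces/image_0001.jpg.out.jpg
faces/image_0002.jpg.out.jpg
faces/image_0003.jpg.out.jpg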
Next, we will use two command-line tools that come with OpenCV. We will refer to them as <opencv_createsamples> and <opencv_traincascade>. Respectively, they are responsible for converting the positive description to a binary format and for generating the trained cascade in an XML format. On Windows, these executables are named opencv_createsamples.exe and opencv_traincascade.exe. They are located in the following directories, one of which we should add to the system's Path variable:
- <opencv_unzip_destination>\build\x64\vc10\bin (64-bit, requires Visual C++ 2010 Runtime Redistributable)
- <opencv_unzip_destination>\build\x64\vc11\bin (64-bit, requires Visual C++ 2012 Runtime Redistributable)
- <opencv_unzip_destination>\build\x86\vc10\bin (32-bit, requires Visual C++ 2010 Runtime Redistributable)
- <opencv_unzip_destination>\build\x86\vc11\bin (32-bit, requires Visual C++ 2012 Runtime Redistributable)

On Mac or Linux, the executables are named opencv_createsamples and opencv_traincascade. They are located in one of the following directories, which should already be in the system's PATH variable:

- /opt/local/bin or /opt/local/sbin
- /usr/local/bin
- /usr/bin
Many flags can be used to provide arguments to <opencv_createsamples> and <opencv_traincascade>, as described in the official documentation at http://docs.opencv.org/doc/user_guide/ug_traincascade.html. We will use the following flags and values:

- vec: The path to a binary description of the positive training images. This file is generated by <opencv_createsamples>.
- info: The path to a text description of the positive training images. We generated this file using describe.py.
- bg: The path to a text description of the negative training images. We generated this file using describe.py.
- num: The number of positive training images in info.
- numStages: The number of stages in the cascade. As we discussed earlier when conceptualizing Haar cascades and LBPH, each stage is a test that is applied to a region of an image. If the region passes all the tests, it is classified as a frontal cat face (or whatever class of object the positive training set represents). We will use 15 for our project.
- numPos: The number of positive training images used in each stage. It should be significantly smaller than num. (Otherwise, the trainer will fail, complaining that it has run out of new images to use in new stages.) We will use 90 percent of num.
- numNeg: The number of negative training images used in each stage. We will use 90 percent of the number of negative training images in bg.
- minHitRate: The minimum proportion of training images that each stage must classify correctly. A higher proportion implies a longer training time but a better fit between the model and the training data. (A better fit is normally a good thing, though it is possible to overfit such that the model fails to extrapolate correctly beyond the training data.) We will use 0.999.
- featureType: The type of features, either HAAR (the default) or LBP. As discussed earlier, Haar cascades tend to be more reliable but are much slower to train and somewhat slower at runtime.
- mode: The subset of Haar features used. (For LBP, this flag has no effect.) The valid options are BASIC (the default), CORE, and ALL. The CORE option makes the model slower to train and run but, in return, sensitive to small dots and thick lines. The ALL option goes further: the model becomes even slower to train and run, but it gains sensitivity to diagonal patterns, whereas BASIC and CORE are only sensitive to horizontal and vertical patterns. The ALL option has nothing to do with detecting non-upright subjects; rather, it relates to detecting subjects that contain diagonal patterns. For example, a cat's whiskers and ears might qualify as diagonal patterns.

Let's write a shell script to run <opencv_createsamples> and <opencv_traincascade> with the appropriate flags and to copy the resulting cascade to the path where Interactive Cat Face Recognizer expects it. On Windows, let's name our script train.bat and implement it as follows:
set vec=binary_description
set info=positive_description.txt
set bg=negative_description.txt

REM Uncomment the next 4 variables for LBP training.
REM set featureType=LBP
REM set data=lbpcascade_frontalcatface\
REM set dst=..\cascades\lbpcascade_frontalcatface.xml
REM set mode=BASIC

REM Uncomment the next 4 variables for Haar training with basic
REM features.
set featureType=HAAR
set data=haarcascade_frontalcatface\
set dst=..\cascades\haarcascade_frontalcatface.xml
set mode=BASIC

REM Uncomment the next 4 variables for Haar training with
REM extended features.
REM set featureType=HAAR
REM set data=haarcascade_frontalcatface_extended\
REM set dst=..\cascades\haarcascade_frontalcatface_extended.xml
REM set mode=ALL

REM Set numPosTotal to be the line count of info.
for /f %%c in ('find /c /v "" ^< "%info%"') do set numPosTotal=%%c

REM Set numNegTotal to be the line count of bg.
for /f %%c in ('find /c /v "" ^< "%bg%"') do set numNegTotal=%%c

set /a numPosPerStage=%numPosTotal%*9/10
set /a numNegPerStage=%numNegTotal%*9/10
set numStages=15
set minHitRate=0.999

REM Ensure that the data directory exists and is empty.
if not exist "%data%" (mkdir "%data%") else del /f /q "%data%\*.xml"

opencv_createsamples -vec "%vec%" -info "%info%" -bg "%bg%" ^
        -num "%numPosTotal%"
opencv_traincascade -data "%data%" -vec "%vec%" -bg "%bg%" ^
        -numPos "%numPosPerStage%" -numNeg "%numNegPerStage%" ^
        -numStages "%numStages%" -minHitRate "%minHitRate%" ^
        -featureType "%featureType%" -mode "%mode%"

copy "%data%\cascade.xml" "%dst%"
On Mac or Linux, let's name our script train.sh instead and implement it as follows:
#!/bin/sh

vec=binary_description
info=positive_description.txt
bg=negative_description.txt

# Uncomment the next 4 variables for LBP training.
#featureType=LBP
#data=lbpcascade_frontalcatface/
#dst=../cascades/lbpcascade_frontalcatface.xml
#mode=BASIC

# Uncomment the next 4 variables for Haar training with basic
# features.
featureType=HAAR
data=haarcascade_frontalcatface/
dst=../cascades/haarcascade_frontalcatface.xml
mode=BASIC

# Uncomment the next 4 variables for Haar training with
# extended features.
#featureType=HAAR
#data=haarcascade_frontalcatface_extended/
#dst=../cascades/haarcascade_frontalcatface_extended.xml
#mode=ALL

# Set numPosTotal to be the line count of info.
numPosTotal=`wc -l < $info`

# Set numNegTotal to be the line count of bg.
numNegTotal=`wc -l < $bg`

numPosPerStage=$(($numPosTotal*9/10))
numNegPerStage=$(($numNegTotal*9/10))
numStages=15
minHitRate=0.999

# Ensure that the data directory exists and is empty.
if [ ! -d "$data" ]; then
    mkdir "$data"
else
    rm -f "$data"/*.xml
fi

opencv_createsamples -vec "$vec" -info "$info" -bg "$bg" \
        -num "$numPosTotal"
opencv_traincascade -data "$data" -vec "$vec" -bg "$bg" \
        -numPos "$numPosPerStage" -numNeg "$numNegPerStage" \
        -numStages "$numStages" -minHitRate "$minHitRate" \
        -featureType "$featureType" -mode "$mode"

cp "$data/cascade.xml" "$dst"
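On Mac or Linux, to change the script's permissions and launch the training, run the following Terminal commands from the cascade_training folder (mirroring how we ran download_datasets.sh earlier):

$ chmod +x train.sh
$ ./train.sh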
The preceding versions of the training script are configured to use basic Haar features and will take a long, long time to run, perhaps more than a day. By commenting out the variables related to a basic Haar configuration and uncommenting the variables related to an LBP configuration, we can cut the training time down to several minutes. As a third alternative, variables for an extended Haar configuration (sensitive to diagonal patterns) are also present but are currently commented out.
When the training is done, feel free to have a look at the generated files, including the following:
- cascades/haarcascade_frontalcatface.xml and cascade_training/haarcascade_frontalcatface/*
- cascades/haarcascade_frontalcatface_extended.xml and cascade_training/haarcascade_frontalcatface_extended/*
- cascades/lbpcascade_frontalcatface.xml and cascade_training/lbpcascade_frontalcatface/*
Finally, let's run InteractiveCatFaceRecognizer.py to test our cascade!
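Alternatively, if you would like a quick, standalone sanity check first, a minimal detection script might look like the following sketch. It assumes the basic Haar cascade trained above and a hypothetical test image, my_cat.jpg; adjust both paths to suit your setup:

import cv2

# Load the newly trained cascade. (Substitute the LBP or extended
# Haar cascade if that is the one you trained.)
detector = cv2.CascadeClassifier(
        'cascades/haarcascade_frontalcatface.xml')

# Load a hypothetical test image and convert it to equalized
# grayscale, mirroring the preprocessing of the training images.
image = cv2.imread('my_cat.jpg')
gray = cv2.equalizeHist(cv2.cvtColor(image, cv2.COLOR_BGR2GRAY))

# Detect cat faces and draw a green rectangle around each one.
catFaces = detector.detectMultiScale(gray, scaleFactor=1.1,
                                     minNeighbors=4, minSize=(64, 64))
for x, y, w, h in catFaces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imshow('Cat Face Detection', image)
cv2.waitKey(0)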
Remember that our detector is designed for frontal upright cat faces. The cat should be facing the camera and might need some incentive to hold that pose. For example, you could ask the cat to settle on a blanket or in your lap, and you could pat or comb the cat. Refer to the following screenshot of my colleague, Chancellor Josephine "Little Jo" Antoinette Puddingcat, GRL (Grand Rock of Lambda), sitting for a test.
If you do not have a cat (or even a person) who is willing to participate, then you can simply print a few images of a given cat (or person) from the Web. Use heavy matte paper and hold the print so that it faces the camera. Use prints of some images to train the recognizer and prints of other images to test it.
Our detector is pretty good at finding frontal cat faces. However, I encourage you to experiment further, make it better, and share your results! The current version sometimes mistakes the center of a frontal human face for a frontal cat face. Perhaps we should have used more databases of human faces as negative training images. Alternatively, if we had used faces of several mammal species as positive training images, could we have created a more general mammal face detector? Let me know what you discover!