Let's create a new folder, where we will store this chapter's project, including the following subfolders and files that are relevant to the Interactive Recognizer app:

- cascades/haarcascade_frontalface_alt.xml: This is a detection model for a frontal, human face. It should be included with OpenCV at a path such as <opencv_unzip_destination>/data/haarcascades/haarcascade_frontalface_alt.xml, or for a MacPorts installation at /opt/local/share/OpenCV/haarcascades/haarcascade_frontalface_alt.xml. Create a copy of it or a link to it. (Alternatively, you can get it from this chapter's code bundle.)
- cascades/lbpcascade_frontalface.xml: This is an alternative (faster but less reliable) detection model for a frontal, human face. It should be included with OpenCV at a path such as <opencv_unzip_destination>/data/lbpcascades/lbpcascade_frontalface.xml, or for a MacPorts installation at /opt/local/share/OpenCV/lbpcascades/lbpcascade_frontalface.xml. Create a copy of it or a link to it. (Alternatively, you can get it from this chapter's code bundle.)
- cascades/haarcascade_frontalcatface.xml: This is a detection model for a frontal, feline face. We will build it later in this chapter. (Alternatively, you can get a prebuilt version from this chapter's code bundle.)
- cascades/haarcascade_frontalcatface_extended.xml: This is an alternative detection model for a frontal, feline face. This version is sensitive to diagonal patterns, which could include whiskers and ears. We will build it later in this chapter. (Alternatively, you can get a prebuilt version from this chapter's code bundle.)
- cascades/lbpcascade_frontalcatface.xml: This is another alternative (faster but less reliable) detection model for a frontal, feline face. We will build it later in this chapter. (Alternatively, you can get a prebuilt version from this chapter's code bundle.)
- recognizers/lbph_human_faces.xml: This is a recognition model for the faces of certain human individuals. It is generated by InteractiveHumanFaceRecognizer.py.
- recognizers/lbph_cat_faces.xml: This is a recognition model for the faces of certain feline individuals. It is generated by InteractiveCatFaceRecognizer.py.
- ResizeUtils.py: This contains the utility functions to resize images. Copy or link to the previous chapter's version of ResizeUtils.py. We will add a function to resize the camera capture dimensions.
- WxUtils.py: This contains the utility functions for wxPython GUI applications. Copy or link to the previous chapter's version of WxUtils.py.
- BinasciiUtils.py: This contains the utility functions to convert human-readable identifiers to numbers and back.
- InteractiveRecognizer.py: This is a class that encapsulates the Interactive Recognizer app and exposes certain variables for configuration. We will implement it in this section.
- InteractiveHumanFaceRecognizer.py: This is a script to launch a version of Interactive Recognizer that is configured for frontal, human faces. We will implement it in this section.
- InteractiveCatFaceRecognizer.py: This is a script to launch a version of Interactive Recognizer that is configured for frontal, feline faces. We will implement it in this section.

Let's start with an addition to our existing ResizeUtils module. We want to be able to specify the resolution at which a camera captures images. Camera input is represented by an OpenCV class called VideoCapture, with get and set methods that pertain to various camera parameters, including resolution. (Incidentally, VideoCapture can also represent a video file.) There is no guarantee that a given capture resolution is supported by a given camera. We need to check the success or failure of any attempt to set the capture resolution. Accordingly, let's add the following utility function to ResizeUtils.py to attempt to set a capture resolution and to return the actual capture resolution:
def cvResizeCapture(capture, preferredSize):
    # Try to set the requested dimensions.
    w, h = preferredSize
    successW = capture.set(cv2.cv.CV_CAP_PROP_FRAME_WIDTH, w)
    successH = capture.set(cv2.cv.CV_CAP_PROP_FRAME_HEIGHT, h)
    if successW and successH:
        # The requested dimensions were successfully set.
        # Return the requested dimensions.
        return preferredSize
    # The requested dimensions might not have been set.
    # Return the actual dimensions.
    w = capture.get(cv2.cv.CV_CAP_PROP_FRAME_WIDTH)
    h = capture.get(cv2.cv.CV_CAP_PROP_FRAME_HEIGHT)
    return (w, h)
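To see what this function does in practice, we could run a quick, throwaway test like the following sketch (it assumes a webcam at device ID 0 and an arbitrary preferred resolution; adjust both to suit your hardware):

import cv2

import ResizeUtils

capture = cv2.VideoCapture(0)
# Ask for 1280 x 720 and print whatever resolution we actually get.
actualSize = ResizeUtils.cvResizeCapture(capture, (1280, 720))
print 'Capturing at %d x %d' % actualSize
capture.release()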
Now, let's consider the requirements for our new BinasciiUtils module. OpenCV's recognizers use 32-bit integers as identifiers. For a GUI, asking the user to give a face a number instead of a name is not very friendly. We could keep a dictionary that maps numbers to names, and we could save this dictionary to the disk alongside the recognition model, but here is my lazier solution. Four or fewer ASCII characters can be cast to a 32-bit integer (and vice versa). We will let the user name each face by entering up to four characters and, behind the scenes, we will convert the names to and from the 32-bit integers that the model stores. Let's create BinasciiUtils.py and put the following imports and conversion functions in it:
import binascii

def fourCharsToInt(s):
    return int(binascii.hexlify(s), 16)

def intToFourChars(i):
    return binascii.unhexlify(format(i, 'x'))
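As a quick sanity check, the two functions should invert each other. Here is a hypothetical round trip (the particular integer has no special meaning; it is just the 32-bit interpretation of the four ASCII codes):

import BinasciiUtils

i = BinasciiUtils.fourCharsToInt('Fido')
print i                                # 1181312111
print BinasciiUtils.intToFourChars(i)  # Fido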
Now, let's proceed to write InteractiveRecognizer.py. It should start with the following import statements:
import numpy
import cv2
import os
import sys
import threading
import wx

import BinasciiUtils
import ResizeUtils
import WxUtils
Our application's class, InteractiveRecognizer, accepts several arguments that allow us to create variants of the app with different titles, highlight colors, recognition models, detection models, and tweaks to the detection behavior. Let's look at the initializer's declaration:
class InteractiveRecognizer(wx.Frame):

    def __init__(self, recognizerPath, cascadePath,
                 scaleFactor=1.3, minNeighbors=4,
                 minSizeProportional=(0.25, 0.25),
                 flags=cv2.cv.CV_HAAR_SCALE_IMAGE,
                 rectColor=(0, 255, 0), cameraDeviceID=0,
                 imageSize=(1280, 720),
                 title='Interactive Recognizer'):
The initializer's arguments are defined as follows:
- recognizerPath: This file contains the recognition model. It does not need to exist when the app starts. Rather, the recognition model (if any) is saved here when the app exits.
- cascadePath: This file contains the detection model. It does need to exist when the app starts.
- scaleFactor: Remember that the detector searches for faces at several different scales. This argument specifies the ratio of each scale to the next smaller scale. A bigger ratio implies a faster search but fewer detections.
- minNeighbors: If the detector encounters two overlapping regions that both might pass detection as faces, they are called neighbors. The minNeighbors argument specifies the minimum number of neighbors that a face must have in order to pass detection. Where minNeighbors>0, the rationale is that a true face could be cropped at several alternative places and still look like a face. A greater number of required neighbors implies fewer detections and a lower proportion of false positives.
- minSizeProportional: This defines a face's minimum width and height. They are expressed as a proportion of the camera's vertical resolution or horizontal resolution, whichever is less. For example, if the camera resolution is 640 x 480 and minSizeProportional=(0.25, 0.25), a face must measure at least 120 x 120 (in pixels) in order to pass detection. A bigger minimum size implies a faster search but fewer detections. The default value, (0.25, 0.25), is appropriate for a face that is close to a webcam.
- flags: This is one of the techniques used to narrow the detector's search. When the detector is using a cascade file trained in a recent version of OpenCV, these flags do nothing. However, they might do some good when using an old cascade file. Not all combinations are valid. The valid standalone flags and valid combinations include the following:
  - cv2.cv.CV_HAAR_SCALE_IMAGE: This applies certain optimizations when changing the scale of the search. This flag must not be combined with others.
  - cv2.cv.CV_HAAR_DO_CANNY_PRUNING: This eagerly rejects regions that contain too many or too few edges to pass detection. This flag should not be combined with cv2.cv.CV_HAAR_FIND_BIGGEST_OBJECT.
  - cv2.cv.CV_HAAR_FIND_BIGGEST_OBJECT: This detects, at most, one face (the biggest).
  - cv2.cv.CV_HAAR_FIND_BIGGEST_OBJECT | cv2.cv.CV_HAAR_DO_ROUGH_SEARCH: This detects, at most, one face (the biggest) and skips some steps that would refine (shrink) the region of this face. This flag requires minNeighbors>0.
- rectColor: This is the color of the rectangle that outlines a detected face. Like most color tuples in OpenCV, it is specified in BGR order (not RGB).
- cameraDeviceID: This is the device ID of the camera that should be used for input. Typically, webcams are numbered starting from 0, and any connected external webcams come before any internal webcams. Some camera drivers reserve fixed device IDs. For example, OpenNI reserves 900 for Kinect and 910 for Asus Xtion.
- imageSize: This is the preferred resolution for captured images. If the camera does not support this resolution, another resolution is used.
- title: This is the app's title, as seen in the window's title bar.

We will also provide a public Boolean variable for configuration to check whether or not the camera feed is mirrored. By default, it is mirrored because users find a mirrored image of themselves to be more intuitive:
self.mirrored = True
Another Boolean variable tracks whether the app should still be running or whether it is closing. This information is relevant to cleaning up a background thread:
self._running = True
Using an OpenCV class called cv2.VideoCapture, we can open a camera feed and get its resolution, as follows:
self._capture = cv2.VideoCapture(cameraDeviceID)
size = ResizeUtils.cvResizeCapture(
    self._capture, imageSize)
self._imageWidth, self._imageHeight = size
Next, we will set up variables related to detection and recognition. Many of these variables just store initialization arguments for later use. Also, we will keep a reference to the currently detected face, which is initially None. We will initialize an LBPH recognizer and load any recognition model that we might have saved on a previous run of the app. Likewise, we will initialize a detector by loading a Haar cascade or LBP cascade from a file, as shown in the following code:
self._currDetectedObject = None

self._recognizerPath = recognizerPath
self._recognizer = cv2.createLBPHFaceRecognizer()
if os.path.isfile(recognizerPath):
    self._recognizer.load(recognizerPath)
    self._recognizerTrained = True
else:
    self._recognizerTrained = False

self._detector = cv2.CascadeClassifier(cascadePath)
self._scaleFactor = scaleFactor
self._minNeighbors = minNeighbors
minImageSize = min(self._imageWidth, self._imageHeight)
self._minSize = (int(minImageSize * minSizeProportional[0]),
                 int(minImageSize * minSizeProportional[1]))
self._flags = flags
self._rectColor = rectColor
Having set up the variables that are relevant to computer vision, we will proceed to the GUI implementation, which is mostly boilerplate code. First, in the following code snippet, we will set up the window with a certain style, size, title, and background color and we will bind a handler for its close event:
style = wx.CLOSE_BOX | wx.MINIMIZE_BOX | wx.CAPTION | \
        wx.SYSTEM_MENU | wx.CLIP_CHILDREN
wx.Frame.__init__(self, None, title=title,
                  style=style, size=size)
self.SetBackgroundColour(wx.Colour(232, 232, 232))

self.Bind(wx.EVT_CLOSE, self._onCloseWindow)
Next, we will set a callback for the Esc key. Since a key is not a GUI widget, there is no Bind method directly associated with a key, and we need to set up the callback a bit differently than we have previously seen with wxWidgets. We will bind a new menu event and callback to the InteractiveRecognizer instance and we will map a keyboard shortcut to the menu event using a class called wx.AcceleratorTable, as shown in the following code. Note, however, that our app actually has no menu, nor is an actual menu item required for the keyboard shortcut to work:
quitCommandID = wx.NewId()
self.Bind(wx.EVT_MENU, self._onQuitCommand,
          id=quitCommandID)
acceleratorTable = wx.AcceleratorTable([
    (wx.ACCEL_NORMAL, wx.WXK_ESCAPE, quitCommandID)
])
self.SetAcceleratorTable(acceleratorTable)
The following code initializes the GUI widgets (including an image holder, text field, buttons, and label) and sets their event callbacks:
self._staticBitmap = wx.StaticBitmap(self, size=size)
self._showImage(None)

self._referenceTextCtrl = wx.TextCtrl(
    self, style=wx.TE_PROCESS_ENTER)
self._referenceTextCtrl.SetMaxLength(4)
self._referenceTextCtrl.Bind(
    wx.EVT_KEY_UP, self._onReferenceTextCtrlKeyUp)

self._predictionStaticText = wx.StaticText(self)
# Insert an endline for consistent spacing.
self._predictionStaticText.SetLabel('\n')

self._updateModelButton = wx.Button(
    self, label='Add to Model')
self._updateModelButton.Bind(
    wx.EVT_BUTTON, self._updateModel)
self._updateModelButton.Disable()

self._clearModelButton = wx.Button(
    self, label='Clear Model')
self._clearModelButton.Bind(
    wx.EVT_BUTTON, self._clearModel)
if not self._recognizerTrained:
    self._clearModelButton.Disable()
Similar to Luxocator (the previous chapter's project), Interactive Recognizer lays out the image in the top part of the window and a row of controls in the bottom part of the window. The following code performs the layout:
border = 12

controlsSizer = wx.BoxSizer(wx.HORIZONTAL)
controlsSizer.Add(self._referenceTextCtrl, 0,
                  wx.ALIGN_CENTER_VERTICAL | wx.RIGHT,
                  border)
controlsSizer.Add(
    self._updateModelButton, 0,
    wx.ALIGN_CENTER_VERTICAL | wx.RIGHT, border)
controlsSizer.Add(self._predictionStaticText, 0,
                  wx.ALIGN_CENTER_VERTICAL)
controlsSizer.Add((0, 0), 1)  # Spacer
controlsSizer.Add(self._clearModelButton, 0,
                  wx.ALIGN_CENTER_VERTICAL)

rootSizer = wx.BoxSizer(wx.VERTICAL)
rootSizer.Add(self._staticBitmap)
rootSizer.Add(controlsSizer, 0,
              wx.EXPAND | wx.ALL, border)
self.SetSizerAndFit(rootSizer)
Finally, the initializer starts a background thread that performs image capture and image processing, including detection and recognition. It is important to perform the intensive computer vision work on a background thread so it does not stall the handling of GUI events. Here is the code that starts the thread:
self._captureThread = threading.Thread(
    target=self._runCaptureLoop)
self._captureThread.start()
With a variety of input events and background work, InteractiveRecognizer has many methods that run in an indeterminate order. We will look at input event handlers first before proceeding to the image pipeline (capture, processing, and display), which partly runs on the background thread.
When the window is closed, we will ensure that the background thread stops. Then, if the recognition model is trained, we will save it to a file. Here is the implementation of the relevant callback:
def _onCloseWindow(self, event):
    self._running = False
    self._captureThread.join()
    if self._recognizerTrained:
        modelDir = os.path.dirname(self._recognizerPath)
        if not os.path.isdir(modelDir):
            os.makedirs(modelDir)
        self._recognizer.save(self._recognizerPath)
    self.Destroy()
Besides closing the window when its standard X button is clicked, we will also close it in the _onQuitCommand callback, which we linked to the Esc key. Here is this callback's implementation:
def _onQuitCommand(self, event):
    self.Close()
When the user adds or deletes text in the text field, our _onReferenceTextCtrlKeyUp callback, which is as follows, calls a helper method to check whether the Add to Model button should be enabled or disabled:
def _onReferenceTextCtrlKeyUp(self, event):
    self._enableOrDisableUpdateModelButton()
When the Add to Model button is clicked, its callback provides new training data to the recognition model. If the LBPH model has no prior training data, we must use the recognizer's train method; otherwise, we must use its update method. Both methods accept two arguments: a list of images (the faces) and a NumPy array of integers (the faces' identifiers). We will train or update the model with just one image at a time so that the user can interactively test the effect of each incremental change to the model. The image is the most recently detected face and the identifier is converted from the text in the text field using our BinasciiUtils.fourCharsToInt function. Here is the implementation of the Add to Model button's callback:
def _updateModel(self, event):
    labelAsStr = self._referenceTextCtrl.GetValue()
    labelAsInt = BinasciiUtils.fourCharsToInt(labelAsStr)
    src = [self._currDetectedObject]
    labels = numpy.array([labelAsInt])
    if self._recognizerTrained:
        self._recognizer.update(src, labels)
    else:
        self._recognizer.train(src, labels)
        self._recognizerTrained = True
        self._clearModelButton.Enable()
When the Clear Model button is clicked, its callback deletes the recognition model (including any version that has been saved to the disk) and creates a new one. Also, we will record that the model is untrained and will disable the Clear Model button until the model is retrained, as shown in the following code:
def _clearModel(self, event=None):
    self._recognizerTrained = False
    self._clearModelButton.Disable()
    if os.path.isfile(self._recognizerPath):
        os.remove(self._recognizerPath)
    self._recognizer = cv2.createLBPHFaceRecognizer()
Our background thread runs a loop. On each iteration, we will capture an image using the VideoCapture object's read method. Along with the image, the read method returns a success flag, which we do not need because we will instead just check whether the image is None. If the image is not None, we will pass it to a helper method called _detectAndRecognize and then we can mirror the image for display. We will finish the iteration by passing the image (potentially None) to a _showImage method, which runs on the main thread (because we invoke the method using wx.CallAfter). Here is the implementation of the loop:
def _runCaptureLoop(self):
    while self._running:
        success, image = self._capture.read()
        if image is not None:
            self._detectAndRecognize(image)
            if self.mirrored:
                image[:] = numpy.fliplr(image)
        wx.CallAfter(self._showImage, image)
Recall that the loop ends after our _onCloseWindow callback sets _running to False.
The helper method, _detectAndRecognize, also runs on the background thread. It begins the process by creating an equalized, grayscale version of the image. An equalized image has a uniform histogram; every value of gray is equally common. Equalization is a kind of contrast adjustment that makes a subject's appearance more predictable despite different lighting conditions in different images; thus, it aids detection or recognition. We will pass the equalized image to the classifier's detectMultiScale method with the scaleFactor, minNeighbors, minSize, and flags arguments that were specified during the initialization of InteractiveRecognizer. As the return value from detectMultiScale, we get a list of rectangle measurements that describe the bounds of the detected faces. For display, we will draw outlines around these faces using rectColor (green by default). If at least one face is detected, we will store an equalized, grayscale version of the first face in a member variable, _currDetectedObject. Here is the implementation of this first portion of the _detectAndRecognize method:
def _detectAndRecognize(self, image):
    grayImage = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    equalizedGrayImage = cv2.equalizeHist(grayImage)
    rects = self._detector.detectMultiScale(
        equalizedGrayImage, scaleFactor=self._scaleFactor,
        minNeighbors=self._minNeighbors,
        minSize=self._minSize, flags=self._flags)
    for x, y, w, h in rects:
        cv2.rectangle(image, (x, y), (x+w, y+h),
                      self._rectColor, 1)
    if len(rects) > 0:
        x, y, w, h = rects[0]
        self._currDetectedObject = cv2.equalizeHist(
            grayImage[y:y+h, x:x+w])
If a face is currently detected and the recognition model is trained for at least one individual, we can proceed to predict the identity of the face. We will pass the equalized face to the predict method of the recognizer and get two return values: an integer identifier and a measure of distance (non-confidence). Using our BinasciiUtils.intToFourChars function, we will convert the integer to a string (of at most four characters), which will be one of the face names that the user previously entered. We will show the name and distance so that the user can read the recognition result. If an error occurs (for example, if an invalid model was loaded from a file), we will delete and recreate the model. If the model is not yet trained, we will show the instructions about training the model. Here is the implementation of the middle portion of the _detectAndRecognize method:
        if self._recognizerTrained:
            try:
                labelAsInt, distance = self._recognizer.predict(
                    self._currDetectedObject)
                labelAsStr = BinasciiUtils.intToFourChars(
                    labelAsInt)
                self._showMessage(
                    'This looks most like %s.\n'
                    'The distance is %.0f.' % \
                    (labelAsStr, distance))
            except cv2.error:
                print >> sys.stderr, \
                    'Recreating model due to error.'
                self._clearModel()
        else:
            self._showInstructions()
If no face was detected, we will set _currDetectedObject to None and show either the instructions (if the model is not yet trained) or no descriptive text (otherwise). Under all conditions, we will end the _detectAndRecognize method by ensuring that the Add to Model button is enabled or disabled, as appropriate. Here is the final portion of the method's implementation:
    else:
        self._currDetectedObject = None
        if self._recognizerTrained:
            self._clearMessage()
        else:
            self._showInstructions()

    self._enableOrDisableUpdateModelButton()
The Add to Model button should be enabled only when a face is detected and the text field is non-empty. We can implement this logic in the following manner:
def _enableOrDisableUpdateModelButton(self):
    labelAsStr = self._referenceTextCtrl.GetValue()
    if len(labelAsStr) < 1 or \
            self._currDetectedObject is None:
        self._updateModelButton.Disable()
    else:
        self._updateModelButton.Enable()
As in Luxocator (the previous chapter's project), we will show a black image if the image is None; otherwise, we will convert the image from the OpenCV format and show it, as follows:
def _showImage(self, image):
    if image is None:
        # Provide a black bitmap.
        bitmap = wx.EmptyBitmap(self._imageWidth,
                                self._imageHeight)
    else:
        # Convert the image to bitmap format.
        bitmap = WxUtils.wxBitmapFromCvImage(image)
    # Show the bitmap.
    self._staticBitmap.SetBitmap(bitmap)
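If you do not have the previous chapter's WxUtils module at hand, the conversion helper might look something like the following sketch (this assumes the classic wxPython API and a BGR input image; the previous chapter's version may differ in its details):

import cv2
import wx

def wxBitmapFromCvImage(image):
    # OpenCV stores color images in BGR order, but wx expects RGB.
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    h, w = image.shape[:2]
    # Copy the RGB data into a wx.Bitmap of matching dimensions.
    return wx.BitmapFromBuffer(w, h, image)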
Since we will set the label's text under several different conditions, we will use the following helper functions to reduce the repetition of code:
def _showInstructions(self):
    self._showMessage(
        'When an object is highlighted, type its name\n'
        '(max 4 chars) and click "Add to Model".')

def _clearMessage(self):
    # Insert an endline for consistent spacing.
    self._showMessage('\n')

def _showMessage(self, message):
    wx.CallAfter(self._predictionStaticText.SetLabel,
                 message)
Note the use of the wx.CallAfter function to ensure that the label is updated on the main thread.
That is all the functionality of Interactive Recognizer. Now, we just need to write the main functions for the two variants of the app, starting with Interactive Human Face Recognizer. As arguments to the initializer of InteractiveRecognizer, we will provide the app's title and PyInstaller-compatible paths to the relevant detection model and recognition model. Then, we will run the app. Here is the implementation, which we can put in InteractiveHumanFaceRecognizer.py:
import wx

from InteractiveRecognizer import InteractiveRecognizer
import PyInstallerUtils

def main():
    app = wx.App()
    recognizerPath = PyInstallerUtils.resourcePath(
        'recognizers/lbph_human_faces.xml')
    cascadePath = PyInstallerUtils.resourcePath(
        # Uncomment the next argument for LBP.
        #'cascades/lbpcascade_frontalface.xml')
        # Uncomment the next argument for Haar.
        'cascades/haarcascade_frontalface_alt.xml')
    interactiveDetector = InteractiveRecognizer(
        recognizerPath, cascadePath,
        title='Interactive Human Face Recognizer')
    interactiveDetector.Show()
    app.MainLoop()

if __name__ == '__main__':
    main()
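The PyInstallerUtils module is carried over from the previous chapter's project. If you do not have it, a minimal sketch of its resourcePath function might look like the following (this is an assumption based on the standard PyInstaller pattern; the previous chapter's version may differ):

import os
import sys

def resourcePath(relativePath):
    # When frozen by PyInstaller, data files are unpacked to a
    # temporary directory that is recorded in sys._MEIPASS.
    basePath = getattr(sys, '_MEIPASS', os.path.abspath('.'))
    return os.path.join(basePath, relativePath)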
Remember that cascades/haarcascade_frontalface_alt.xml or cascades/lbpcascade_frontalface.xml needs to be obtained from OpenCV's samples or from this chapter's code bundle. Feel free to test Interactive Human Face Recognizer now!
Our second variant of the app, Interactive Cat Face Recognizer, uses very similar code. We will change the app's title and the paths of the detection and recognition models. Also, we will raise the minNeighbors value to 8 to make the detector a little more conservative. (Our cat face detection model turns out to be more prone to false positives than our human face detection model.) Here is the implementation, which we can put in InteractiveCatFaceRecognizer.py:
import wx

from InteractiveRecognizer import InteractiveRecognizer
import PyInstallerUtils

def main():
    app = wx.App()
    recognizerPath = PyInstallerUtils.resourcePath(
        'recognizers/lbph_cat_faces.xml')
    cascadePath = PyInstallerUtils.resourcePath(
        # Uncomment the next argument for LBP.
        #'cascades/lbpcascade_frontalcatface.xml')
        # Uncomment the next argument for Haar with basic
        # features.
        'cascades/haarcascade_frontalcatface.xml')
        # Uncomment the next argument for Haar with extended
        # features.
        #'cascades/haarcascade_frontalcatface_extended.xml')
    interactiveDetector = InteractiveRecognizer(
        recognizerPath, cascadePath, minNeighbors=8,
        title='Interactive Cat Face Recognizer')
    interactiveDetector.Show()
    app.MainLoop()

if __name__ == '__main__':
    main()
At this stage, Interactive Cat Face Recognizer will not run properly because cascades/haarcascade_frontalcatface.xml, cascades/haarcascade_frontalcatface_extended.xml, and cascades/lbpcascade_frontalcatface.xml do not exist yet (unless you copied the prebuilt versions from this chapter's code bundle). OpenCV 2.x does not come with any cat detection model, but we will soon create our own! (OpenCV 3.0 and later versions will contain this book's cat detection models!)