Implementing the Interactive Recognizer app

Let's create a new folder, where we will store this chapter's project. Relevant to the Interactive Recognizer app, the folder should include the following subfolders and files:

- cascades/haarcascade_frontalface_alt.xml
- cascades/lbpcascade_frontalface.xml
- cascades/haarcascade_frontalcatface.xml, cascades/haarcascade_frontalcatface_extended.xml, and cascades/lbpcascade_frontalcatface.xml (created later in this chapter, or copied from this chapter's code bundle)
- recognizers/lbph_human_faces.xml and recognizers/lbph_cat_faces.xml (generated at runtime, when the app saves its model)
- ResizeUtils.py
- BinasciiUtils.py
- WxUtils.py
- PyInstallerUtils.py
- InteractiveRecognizer.py
- InteractiveHumanFaceRecognizer.py
- InteractiveCatFaceRecognizer.py

Let's start with an addition to our existing ResizeUtils module. We want to be able to specify the resolution at which a camera captures images. Camera input is represented by an OpenCV class called VideoCapture, with get and set methods that pertain to various camera parameters including resolution. (Incidentally, VideoCapture can also represent a video file.) There is no guarantee that a given capture resolution is supported by a given camera. We need to check the success or failure of any attempt to set the capture resolution. Accordingly, let's add the following utility function to ResizeUtils.py to attempt to set a capture resolution and to return the actual capture resolution:

def cvResizeCapture(capture, preferredSize):
    # Try to set the requested dimensions.
    w, h = preferredSize
    successW = capture.set(cv2.cv.CV_CAP_PROP_FRAME_WIDTH, w)
    successH = capture.set(cv2.cv.CV_CAP_PROP_FRAME_HEIGHT, h)
    if successW and successH:
        # The requested dimensions were successfully set.
        # Return the requested dimensions.
        return preferredSize
    # The requested dimensions might not have been set.
    # Return the actual dimensions.
    w = capture.get(cv2.cv.CV_CAP_PROP_FRAME_WIDTH)
    h = capture.get(cv2.cv.CV_CAP_PROP_FRAME_HEIGHT)
    return (w, h)
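
For example, here is a hedged usage sketch (the camera device ID and the requested resolution are arbitrary choices for illustration) that requests 1280 x 720 and prints whatever resolution the camera actually delivers:

import cv2

import ResizeUtils

# Open the default camera and request a 1280 x 720 capture size.
# The camera might silently fall back to another resolution.
capture = cv2.VideoCapture(0)
actualSize = ResizeUtils.cvResizeCapture(capture, (1280, 720))
print 'Capturing at %dx%d' % actualSize
capture.release()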

Now, let's consider the requirements for our new BinasciiUtils module. OpenCV's recognizers use 32-bit integers as identifiers. For a GUI, asking the user to give a face a number instead of a name is not very friendly. We could keep a dictionary that maps numbers to names, and we could save this dictionary to the disk alongside the recognition model, but here is my lazier solution. Four or fewer ASCII characters can be cast to a 32-bit integer (and vice versa). We will let the user name each face by entering up to four characters and, behind the scenes, we will convert the names to and from the 32-bit integers that the model stores. Let's create BinasciiUtils.py and put the following imports and conversion functions in it:

import binascii

def fourCharsToInt(s):
  return int(binascii.hexlify(s), 16)

def intToFourChars(i):
  return binascii.unhexlify(format(i, 'x'))
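
As a quick sanity check of the round trip, we can convert a name to an integer and back (a minimal sketch; the name 'Joe' is just an example):

import BinasciiUtils

# 'Joe' -> hex '4a6f65' -> 4878181
labelAsInt = BinasciiUtils.fourCharsToInt('Joe')
print labelAsInt  # 4878181

# 4878181 -> hex '4a6f65' -> 'Joe'
print BinasciiUtils.intToFourChars(labelAsInt)  # Joe

Note that intToFourChars relies on the hexadecimal string having an even length; this is always the case for names made of ordinary printable ASCII characters, such as the ones entered in our text field.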

Now, let's proceed to write InteractiveRecognizer.py. It should start with the following import statements:

import numpy
import cv2
import os
import sys
import threading
import wx

import BinasciiUtils
import ResizeUtils
import WxUtils

Our application's class, InteractiveRecognizer, accepts several arguments that allow us to create variants of the app with different titles, highlight colors, recognition models, detection models, and tweaks to the detection behavior. Let's look at the initializer's declaration:

class InteractiveRecognizer(wx.Frame):

  def __init__(self, recognizerPath, cascadePath,
    scaleFactor=1.3, minNeighbors=4,
    minSizeProportional=(0.25, 0.25),
    flags=cv2.cv.CV_HAAR_SCALE_IMAGE,
    rectColor=(0, 255, 0),
    cameraDeviceID=0, imageSize=(1280, 720),
    title='Interactive Recognizer'):

The initializer's arguments are defined as follows:

- recognizerPath: The file path of the recognition model. This file does not need to exist yet; the model is saved to this path when the app closes.
- cascadePath: The file path of the detection model (a Haar cascade or LBP cascade).
- scaleFactor, minNeighbors, and flags: Arguments that we will pass to the detector's detectMultiScale method.
- minSizeProportional: The minimum face size, expressed as proportions of the smaller dimension of the camera's image. During initialization, it is converted to a size in pixels.
- rectColor: The color of the rectangles that we will draw around detected faces.
- cameraDeviceID: The device ID of the camera to use for capture.
- imageSize: The preferred resolution for capture.
- title: The app's title, as shown in the window's title bar.

We will also provide a public Boolean variable that configures whether the camera feed is mirrored. By default, it is mirrored because users find a mirrored image of themselves more intuitive:

    self.mirrored = True

Another Boolean variable tracks whether the app should still be running or whether it is closing. This information is relevant to cleaning up a background thread:

    self._running = True

Using an OpenCV class called cv2.VideoCapture, we can open a camera feed and get its resolution, as follows:

    self._capture = cv2.VideoCapture(cameraDeviceID)
    size = ResizeUtils.cvResizeCapture(
      self._capture, imageSize)
    self._imageWidth, self._imageHeight = size

Next, we will set up variables related to detection and recognition. Many of these variables just store initialization arguments for later use. Also, we will keep a reference to the currently detected face, which is initially None. We will initialize an LBPH recognizer and load any recognition model that we might have saved on a previous run of the app. Likewise, we will initialize a detector by loading a Haar cascade or LBP cascade from a file, as shown in the following code:

    self._currDetectedObject = None

    self._recognizerPath = recognizerPath
    self._recognizer = cv2.createLBPHFaceRecognizer()
    if os.path.isfile(recognizerPath):
      self._recognizer.load(recognizerPath)
      self._recognizerTrained = True
    else:
      self._recognizerTrained = False

    self._detector = cv2.CascadeClassifier(cascadePath)
    self._scaleFactor = scaleFactor
    self._minNeighbors = minNeighbors
    minImageSize = min(self._imageWidth, self._imageHeight)
    self._minSize = (int(minImageSize * minSizeProportional[0]),
      int(minImageSize * minSizeProportional[1]))
    self._flags = flags
    self._rectColor = rectColor

Having set up the variables that are relevant to computer vision, we will proceed to the GUI implementation, which is mostly boilerplate code. First, in the following code snippet, we will set up the window with a certain style, size, title, and background color, and we will bind a handler for its close event:

    style = wx.CLOSE_BOX | wx.MINIMIZE_BOX | wx.CAPTION | \
      wx.SYSTEM_MENU | wx.CLIP_CHILDREN
    wx.Frame.__init__(self, None, title=title,
      style=style, size=size)
    self.SetBackgroundColour(wx.Colour(232, 232, 232))

    self.Bind(wx.EVT_CLOSE, self._onCloseWindow)

Next, we will set a callback for the Esc key. Since a key is not a GUI widget, there is no Bind method directly associated with a key, and we need to set up the callback a bit differently than we have previously seen with wxWidgets. We will bind a new menu event and callback to the InteractiveRecognizer instance and we will map a keyboard shortcut to the menu event using a class called wx.AcceleratorTable, as shown in the following code. Note, however, that our app actually has no menu, nor is an actual menu item required for the keyboard shortcut to work:

    quitCommandID = wx.NewId()
    self.Bind(wx.EVT_MENU, self._onQuitCommand,
      id=quitCommandID)
    acceleratorTable = wx.AcceleratorTable([
      (wx.ACCEL_NORMAL, wx.WXK_ESCAPE, quitCommandID)
    ])
    self.SetAcceleratorTable(acceleratorTable)

The following code initializes the GUI widgets (including an image holder, text field, buttons, and label) and sets their event callbacks:

    self._staticBitmap = wx.StaticBitmap(self, size=size)
    self._showImage(None)

    self._referenceTextCtrl = wx.TextCtrl(
      self, style=wx.TE_PROCESS_ENTER)
    self._referenceTextCtrl.SetMaxLength(4)
    self._referenceTextCtrl.Bind(
      wx.EVT_KEY_UP, self._onReferenceTextCtrlKeyUp)

    self._predictionStaticText = wx.StaticText(self)
    # Insert an endline for consistent spacing.
    self._predictionStaticText.SetLabel('\n')

    self._updateModelButton = wx.Button(
      self, label='Add to Model')
    self._updateModelButton.Bind(
      wx.EVT_BUTTON, self._updateModel)
    self._updateModelButton.Disable()

    self._clearModelButton = wx.Button(
      self, label='Clear Model')
    self._clearModelButton.Bind(
      wx.EVT_BUTTON, self._clearModel)
    if not self._recognizerTrained:
      self._clearModelButton.Disable()

Similar to Luxocator (the previous chapter's project), Interactive Recognizer lays out the image in the top part of the window and a row of controls in the bottom part of the window. The following code performs the layout:

    border = 12

    controlsSizer = wx.BoxSizer(wx.HORIZONTAL)
    controlsSizer.Add(self._referenceTextCtrl, 0,
      wx.ALIGN_CENTER_VERTICAL | wx.RIGHT,
      border)
    controlsSizer.Add(
      self._updateModelButton, 0,
      wx.ALIGN_CENTER_VERTICAL | wx.RIGHT, border)
    controlsSizer.Add(self._predictionStaticText, 0,
      wx.ALIGN_CENTER_VERTICAL)
    controlsSizer.Add((0, 0), 1) # Spacer
    controlsSizer.Add(self._clearModelButton, 0,
      wx.ALIGN_CENTER_VERTICAL)

    rootSizer = wx.BoxSizer(wx.VERTICAL)
    rootSizer.Add(self._staticBitmap)
    rootSizer.Add(controlsSizer, 0, wx.EXPAND | wx.ALL, border)
    self.SetSizerAndFit(rootSizer)

Finally, the initializer starts a background thread that performs image capture and image processing, including detection and recognition. It is important to perform the intensive computer vision work on a background thread so it does not stall the handling of GUI events. Here is the code that starts the thread:

    self._captureThread = threading.Thread(
      target=self._runCaptureLoop)
    self._captureThread.start()

With a variety of input events and background work, InteractiveRecognizer has many methods that run in an indeterminate order. We will look at input event handlers first before proceeding to the image pipeline (capture, processing, and display), which partly runs on the background thread.

When the window is closed, we will ensure that the background thread stops. Then, if the recognition model is trained, we will save it to a file. Here is the implementation of the relevant callback:

  def _onCloseWindow(self, event):
    self._running = False
    self._captureThread.join()
    if self._recognizerTrained:
      modelDir = os.path.dirname(self._recognizerPath)
      if not os.path.isdir(modelDir):
        os.makedirs(modelDir)
      self._recognizer.save(self._recognizerPath)
    self.Destroy()

Besides closing the window when its standard X button is clicked, we will also close it in the _onQuitCommand callback, which we linked to the Esc key. Here is this callback's implementation:

  def _onQuitCommand(self, event):
    self.Close()

When the user adds or deletes text in the text field, our _onReferenceTextCtrlKeyUp callback, which is as follows, calls a helper method to check whether the Add to Model button should be enabled or disabled:

  def _onReferenceTextCtrlKeyUp(self, event):
    self._enableOrDisableUpdateModelButton()

When the Add to Model button is clicked, its callback provides new training data to the recognition model. If the LBPH model has no prior training data, we must use the recognizer's train method; otherwise, we must use its update method. Both methods accept two arguments: a list of images (the faces) and a NumPy array of integers (the faces' identifiers). We will train or update the model with just one image at a time so that the user can interactively test the effect of each incremental change to the model. The image is the most recently detected face and the identifier is converted from the text in the text field using our BinasciiUtils.fourCharsToInt function. Here is the code for the implementation of the Add to Model button's callback:

  def _updateModel(self, event):
    labelAsStr = self._referenceTextCtrl.GetValue()
    labelAsInt = BinasciiUtils.fourCharsToInt(labelAsStr)
    src = [self._currDetectedObject]
    labels = numpy.array([labelAsInt])
    if self._recognizerTrained:
      self._recognizer.update(src, labels)
    else:
      self._recognizer.train(src, labels)
      self._recognizerTrained = True
      self._clearModelButton.Enable()

When the Clear Model button is clicked, its callback deletes the recognition model (including any version that has been saved to the disk) and creates a new one. Also, we will record that the model is untrained and will disable the Clear Model button until the model is retrained, as shown in the following code:

  def _clearModel(self, event=None):
    self._recognizerTrained = False
    self._clearModelButton.Disable()
    if os.path.isfile(self._recognizerPath):
      os.remove(self._recognizerPath)
    self._recognizer = cv2.createLBPHFaceRecognizer()

Our background thread runs a loop. On each iteration, we will capture an image using the VideoCapture object's read method. Along with the image, the read method returns a success flag, which we do not need because we will just check whether the image is None instead. If the image is not None, we will pass it to a helper method called _detectAndRecognize and then, if mirroring is enabled, we will flip the image for display. We will finish the iteration by passing the image (potentially None) to a _showImage method, which runs on the main thread (because we invoke the method using wx.CallAfter). Here is the implementation of the loop:

  def _runCaptureLoop(self):
    while self._running:
      success, image = self._capture.read()
      if image is not None:
        self._detectAndRecognize(image)
        if (self.mirrored):
          image[:] = numpy.fliplr(image)
      wx.CallAfter(self._showImage, image)

Recall that the loop ends after our _onCloseWindow callback sets _running to False.

The helper method, _detectAndRecognize, also runs on the background thread. It begins by creating an equalized grayscale version of the image. An equalized image has a roughly uniform histogram; every gray value is about equally common. Equalization is a kind of contrast adjustment that makes a subject's appearance more predictable despite different lighting conditions in different images; thus, it aids detection and recognition. We will pass the equalized image to the classifier's detectMultiScale method with the scaleFactor, minNeighbors, minSize, and flags arguments that were specified during the initialization of InteractiveRecognizer. As the return value from detectMultiScale, we get a list of rectangle measurements that describe the bounds of the detected faces. For display, we will draw outlines around these faces in the highlight color (green, by default). If at least one face is detected, we will store an equalized, grayscale version of the first face in a member variable, _currDetectedObject. Here is the implementation of this first portion of the _detectAndRecognize method:

  def _detectAndRecognize(self, image):
    grayImage = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    equalizedGrayImage = cv2.equalizeHist(grayImage)
    rects = self._detector.detectMultiScale(
      equalizedGrayImage, scaleFactor=self._scaleFactor,
      minNeighbors=self._minNeighbors,
      minSize=self._minSize, flags=self._flags)
    for x, y, w, h in rects:
      cv2.rectangle(image, (x, y), (x+w, y+h),
        self._rectColor, 1)
    if len(rects) > 0:
      x, y, w, h = rects[0]
      self._currDetectedObject = cv2.equalizeHist(
        grayImage[y:y+h, x:x+w])

If a face is currently detected and the recognition model is trained for at least one individual, we can proceed to predict the identity of the face. We will pass the equalized face to the predict method of the recognizer and get two return values: an integer identifier and a measure of distance (non-confidence). Using our BinasciiUtils.intToFourChars function, we will convert the integer to a string (of at most four characters), which will be one of the face names that the user previously entered. We will show the name and distance so that the user can read the recognition result. If an error occurs (for example, if an invalid model was loaded from a file), we will delete and recreate the model. If the model is not yet trained, we will show the instructions about training the model. Here is the implementation of the middle portion of the _detectAndRecognize method:

      if self._recognizerTrained:
        try:
          labelAsInt, distance = self._recognizer.predict(
            self._currDetectedObject)
          labelAsStr = BinasciiUtils.intToFourChars(
            labelAsInt)
          self._showMessage(
            'This looks most like %s.\n'
            'The distance is %.0f.' % \
            (labelAsStr, distance))
        except cv2.error:
          print >> sys.stderr, \
            'Recreating model due to error.'
          self._clearModel()
      else:
        self._showInstructions()

If no face was detected, we will set _currDetectedObject to None and show either the instructions (if the model is not yet trained) or no descriptive text (otherwise). Under all conditions, we will end the _detectAndRecognize method by ensuring that the Add to Model button is enabled or disabled, as appropriate. Here is the final portion of the method's implementation:

    else:
      self._currDetectedObject = None
      if self._recognizerTrained:
        self._clearMessage()
      else:
        self._showInstructions()
 
    self._enableOrDisableUpdateModelButton()

The Add to Model button should be enabled only when a face is detected and the text field is non-empty. We can implement this logic in the following manner:

  def _enableOrDisableUpdateModelButton(self):
    labelAsStr = self._referenceTextCtrl.GetValue()
    if len(labelAsStr) < 1 or \
      self._currDetectedObject is None:
      self._updateModelButton.Disable()
    else:
      self._updateModelButton.Enable()

As in Luxocator (the previous chapter's project), we will show a black image if the image is None; otherwise, we will convert the image from the OpenCV format and show it, as follows:

  def _showImage(self, image):
    if image is None:
      # Provide a black bitmap.
      bitmap = wx.EmptyBitmap(self._imageWidth,
        self._imageHeight)
    else:
      # Convert the image to bitmap format.
      bitmap = WxUtils.wxBitmapFromCvImage(image)
    # Show the bitmap.
    self._staticBitmap.SetBitmap(bitmap)
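
For reference, the previous chapter's WxUtils.wxBitmapFromCvImage helper might look like the following minimal sketch (assuming wxPython classic and an OpenCV image in BGR order; the exact implementation may differ):

import cv2
import wx

def wxBitmapFromCvImage(image):
  # OpenCV stores images in BGR order, but wxPython expects RGB.
  image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
  h, w = image.shape[:2]
  # Create a wxPython bitmap from the RGB data.
  bitmap = wx.BitmapFromBuffer(w, h, image)
  return bitmap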

Since we will set the label's text under several different conditions, we will use the following helper functions to reduce the repetition of code:

  def _showInstructions(self):
    self._showMessage(
      'When an object is highlighted, type its name\n'
      '(max 4 chars) and click "Add to Model".')

  def _clearMessage(self):
    # Insert an endline for consistent spacing.
    self._showMessage('\n')

  def _showMessage(self, message):
    wx.CallAfter(self._predictionStaticText.SetLabel, message)

Note the use of the wx.CallAfter function to ensure that the label is updated on the main thread.

That is all the functionality of Interactive Recognizer. Now, we just need to write the main functions for the two variants of the app, starting with Interactive Human Face Recognizer. As arguments to the initializer of InteractiveRecognizer, we will provide the app's title and PyInstaller-compatible paths to the relevant detection model and recognition model. Then, we will run the app. Here is the implementation, which we can put in InteractiveHumanFaceRecognizer.py:

import wx

from InteractiveRecognizer import InteractiveRecognizer
import PyInstallerUtils


def main():
  app = wx.App()
  recognizerPath = PyInstallerUtils.resourcePath(
    'recognizers/lbph_human_faces.xml')
  cascadePath = PyInstallerUtils.resourcePath(
    # Uncomment the next argument for LBP.
    #'cascades/lbpcascade_frontalface.xml')
    # Uncomment the next argument for Haar.
    'cascades/haarcascade_frontalface_alt.xml')
  interactiveDetector = InteractiveRecognizer(
    recognizerPath, cascadePath,
    title='Interactive Human Face Recognizer')
  interactiveDetector.Show()
  app.MainLoop()

if __name__ == '__main__':
  main()

Remember that cascades/haarcascade_frontalface_alt.xml or cascades/lbpcascade_frontalface.xml needs to be obtained from OpenCV's samples or from this chapter's code bundle. Feel free to test Interactive Human Face Recognizer now!

Our second variant of the app, Interactive Cat Face Recognizer, uses very similar code. We will change the app's title and the paths of the detection and recognition models. Also, we will raise the minNeighbors value to 8 to make the detector a little more conservative. (Our cat face detection model turns out to be more prone to false positives than our human face detection model.) Here is the implementation, which we can put in InteractiveCatFaceRecognizer.py:

import wx

from InteractiveRecognizer import InteractiveRecognizer
import PyInstallerUtils


def main():
  app = wx.App()
  recognizerPath = PyInstallerUtils.resourcePath(
    'recognizers/lbph_cat_faces.xml')
  cascadePath = PyInstallerUtils.resourcePath(
    # Uncomment the next argument for LBP.
    #'cascades/lbpcascade_frontalcatface.xml')
    # Uncomment the next argument for Haar with basic
    # features.
    'cascades/haarcascade_frontalcatface.xml')
    # Uncomment the next argument for Haar with extended
    # features.
    #'cascades/haarcascade_frontalcatface_extended.xml')
  interactiveDetector = InteractiveRecognizer(
    recognizerPath, cascadePath,
    minNeighbors=8,
    title='Interactive Cat Face Recognizer')
  interactiveDetector.Show()
  app.MainLoop()

if __name__ == '__main__':
  main()

At this stage, Interactive Cat Face Recognizer will not run properly because cascades/haarcascade_frontalcatface.xml, cascades/haarcascade_frontalcatface_extended.xml, and cascades/lbpcascade_frontalcatface.xml do not exist yet (unless you copied the prebuilt versions from this chapter's code bundle). OpenCV 2.x does not come with any cat detection model, but we will soon create our own! (OpenCV 3.0 and later versions will include this book's cat detection models!)