Face detection with dlib

You can use dlib.get_frontal_face_detector() to create a frontal face detector, which is based on Histogram of Oriented Gradients (HOG) features and a linear classifier in a sliding window detection approach. In particular, the HOG trainer uses a structural SVM-based training algorithm that enables the trainer to train on all the sub-windows in every training image. This face detector has been trained using 3,000 images from the Labeled Faces in the Wild (http://vis-www.cs.umass.edu/lfw/) dataset. It should be noted that this detector can also be used to spot objects other than faces. You can check out the train_object_detector.py script, which is included in the dlib library (http://dlib.net/train_object_detector.py.html), to see how to easily train your own object detectors using only a few training images. For example, you can train a good stop-sign detector using only eight images of stop signs.

The face_detection_dlib_hog.py script detects faces using the aforementioned dlib frontal face detector. The first step is to load the frontal face detector from dlib:

detector = dlib.get_frontal_face_detector()

The next step is to perform the detection:

rects_1 = detector(gray, 0)
rects_2 = detector(gray, 1)

The second argument indicates the number of times the image is upsampled before the detection process is carried out. Upsampling makes the image bigger, which allows the detector to find more (and smaller) faces, but it also increases the execution time. Therefore, this trade-off should be taken into account for performance purposes.
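The steps above can be sketched end to end as follows. This is a minimal sketch, not the script's exact code; my_image.png is a placeholder filename, rect_to_tuple() is an illustrative helper, and the imports are guarded so the sketch can be loaded even where dlib is not installed:

```python
import os

try:
    import cv2
    import dlib
except ImportError:  # allow loading this sketch without cv2/dlib installed
    cv2 = dlib = None


def rect_to_tuple(rect):
    """Return (left, top, right, bottom) from a dlib.rectangle or a plain tuple."""
    if isinstance(rect, tuple):
        return rect
    return (rect.left(), rect.top(), rect.right(), rect.bottom())


if __name__ == "__main__" and dlib is not None and os.path.exists("my_image.png"):
    # Load the image, convert it to grayscale, and create the HOG detector
    img = cv2.imread("my_image.png")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    detector = dlib.get_frontal_face_detector()

    # Second argument: number of times to upsample the image before detecting
    rects = detector(gray, 1)

    # Draw a rectangle around every detected face
    for rect in rects:
        left, top, right, bottom = rect_to_tuple(rect)
        cv2.rectangle(img, (left, top), (right, bottom), (255, 255, 0), 5)

    cv2.imshow("HOG face detection", img)
    cv2.waitKey(0)
```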

The output of this script can be seen in the following screenshot:

As you can see, if we detect faces using the original grayscale image (rects_1 = detector(gray, 0)), only two faces are found. However, if we detect faces using the grayscale image upsampled 1 time (rects_2 = detector(gray, 1)), all three faces are correctly detected.

The dlib library also offers a CNN face detector. You can use dlib.cnn_face_detection_model_v1() to create the CNN face detector. The constructor loads the face detection model from a file. You can download a pre-trained model (712 KB) from http://dlib.net/files/mmod_human_face_detector.dat.bz2. When creating the CNN face detector, the path to the corresponding pre-trained model should be passed to the constructor:

cnn_face_detector = dlib.cnn_face_detection_model_v1("mmod_human_face_detector.dat")

At this point, we are ready to spot faces using this detector:

rects = cnn_face_detector(img, 0)

This detector returns an mmod_rectangles object, which is a list of mmod_rectangle objects. Each mmod_rectangle object has two member variables—a dlib.rectangle object and a confidence score. Therefore, to show the detections, the show_detection() function is coded:

def show_detection(image, faces):
    """Draws a rectangle over each detected face"""

    # faces contains a list of mmod_rectangle objects
    # Each mmod_rectangle object has two member variables: a dlib.rectangle object and a confidence score
    # Therefore, we iterate over the detected mmod_rectangle objects, accessing dlib.rect to draw the rectangle
    for face in faces:
        cv2.rectangle(image, (face.rect.left(), face.rect.top()), (face.rect.right(), face.rect.bottom()), (255, 0, 0), 10)
    return image

The show_detection() function should be called like this:

img_faces = show_detection(img.copy(), rects)
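Because each CNN detection also carries a confidence score, you can discard weak detections before drawing them. The following is an illustrative sketch—filter_by_confidence() and the FakeDetection stand-in are hypothetical names; FakeDetection merely mimics the .rect/.confidence members of dlib's mmod_rectangle so the helper can be demonstrated without running the model:

```python
from collections import namedtuple

# Stand-in for dlib's mmod_rectangle: it exposes .rect and .confidence
FakeDetection = namedtuple("FakeDetection", ["rect", "confidence"])


def filter_by_confidence(detections, min_confidence=0.5):
    """Keep only detections whose confidence score reaches the threshold."""
    return [d for d in detections if d.confidence >= min_confidence]


# One strong and one weak detection; only the strong one survives
detections = [FakeDetection(rect=None, confidence=1.08),
              FakeDetection(rect=None, confidence=0.31)]
print(len(filter_by_confidence(detections)))  # -> 1
```

The same call works unchanged on the real rects returned by cnn_face_detector, since those objects expose the same two members.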

The full code is in the face_detection_dlib_cnn.py script. The output of this script can be seen in the next screenshot:

The dlib CNN face detector is much more accurate than the dlib HOG face detector, but it takes much more computational power to run. For example, for a 600 x 400 image, the HOG face detector takes around 0.25 seconds, while the CNN face detector takes around 5 seconds. Indeed, the CNN face detector is meant to be executed on a GPU in order to attain a reasonable speed.
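You can reproduce this kind of comparison on your own machine with a small timing helper. This is a sketch—time_detector() is an illustrative name, not part of dlib—and it works with anything callable as detector(image, upsample), so both the HOG and CNN detectors can be passed in (a dummy callable stands in below):

```python
import time


def time_detector(detector, image, upsample=0, runs=3):
    """Call detector(image, upsample) several times and return the mean seconds per run."""
    start = time.perf_counter()
    for _ in range(runs):
        detector(image, upsample)
    return (time.perf_counter() - start) / runs


# A dummy detector stands in here; pass dlib's HOG or CNN detector in practice
mean_seconds = time_detector(lambda img, up: [], image=None)
print(mean_seconds >= 0.0)  # -> True
```

Averaging over a few runs smooths out one-off costs such as lazy model initialization on the first call.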

If you have a GPU, you can enable CUDA support, which should speed up the execution considerably. To do so, you will need to compile dlib from source with CUDA enabled.
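You can check whether your installed dlib build was compiled with CUDA support via the DLIB_USE_CUDA flag exposed by the Python bindings (the import is guarded here so the snippet also runs where dlib is not installed):

```python
# Check at runtime whether this dlib build was compiled with CUDA support
try:
    import dlib
    cuda_enabled = dlib.DLIB_USE_CUDA
except ImportError:
    cuda_enabled = None  # dlib is not installed in this environment

print(cuda_enabled)
```

If this prints False, the CNN face detector will fall back to the CPU and run at the slower speeds discussed above.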