Introduction to face detection and face recognition

Face recognition is the process of putting a label to a known face. Just like humans learn to recognize their family, friends, and celebrities just by seeing their face, there are many techniques for recognize a face in computer vision.

These generally involve four main steps, defined as follows:

Face detection: This is the process of locating a face region in an image (the large rectangle near the center of the following screenshot). This step does not care who the person is, just that it is a human face.
Face preprocessing: This is the process of adjusting the face image to look clearer and similar to other faces (a small grayscale face in the top center of the following screenshot).
Collecting and learning faces: This is a process of saving many preprocessed faces (for each person that should be recognized), and then learning how to recognize them.
Face recognition: This is the process that checks which of the collected people are most similar to the face in the camera (a small rectangle in the top right of the following screenshot).

Note that the phrase face recognition is often used by the general public to refer to finding the positions of faces (that is, face detection, as described in step 1), but this book will use the formal definition of face recognition referring to step 4, and face detection referring to Step 1.

The following screenshot shows the final WebcamFaceRec project, including a small rectangle at the top-right corner highlighting the recognized person. Also, notice the confidence bar that is next to the preprocessed face (a small face at the top center of the rectangle marking the face), which in this case shows roughly 70 percent confidence that it has recognized the correct person:

Current face detection techniques are quite reliable in real-world conditions, whereas current face recognition techniques are much less reliable when used in real-world conditions. For example, it is easy to find research papers showing face recognition accuracy rates above 95 percent, but when testing those same algorithms yourself, you may often find that accuracy is lower than 50 percent. This comes from the fact that current face recognition techniques are very sensitive to exact conditions in images, such as the type of lighting, direction of lighting and shadows, exact orientation of the face, expression of the face, and the current mood of the person. If they are all kept constant when training (collecting images), as well as when testing (from the camera image), then face recognition should work well, but if the person was standing to the left-hand side of the lights in a room when training, and then stood to the right-hand side while testing with the camera, it may give quite bad results. So, the dataset used for training is very important.

Face preprocessing aims to reduce these problems by making sure the face always appears to have similar brightness and contrast, and perhaps making sure the features of the face will always be in the same position (such as aligning the eyes and/or nose to certain positions). A good face preprocessing stage will help improve the reliability of the whole face recognition system, so this chapter will place some emphasis on face preprocessing methods.

Despite the big claims about using face recognition for security in the media, it is unlikely that the current face recognition methods alone are reliable enough for any true security system. However, they can be used for purposes that don't need high reliability, such as playing personalized music for different people entering a room, or a robot that says your name when it sees you. There are also various practical extensions to face recognition, such as gender recognition, age recognition, and emotion recognition.