Let's write a program to detect a human face. I have named this program FaceDetection.cpp and you can download it from the Chapter08 folder of this book's GitHub repository.
Since we will be using haarcascade_frontalface_alt2.xml to detect faces, please make sure that the FaceDetection.cpp and haarcascade_frontalface_alt2.xml files are in the same folder.
To program face detection, follow these steps:
- In the FaceDetection.cpp program, load the pre-trained Haar frontal face XML file using the CascadeClassifier class, as shown in the following code snippet:
CascadeClassifier faceDetector("haarcascade_frontalface_alt2.xml");
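If the XML file cannot be found, the classifier loads nothing and simply detects no faces. The following check is not part of the chapter's program, but is a small safeguard you may want to add right after loading the cascade (it assumes iostream and using namespace std, as in the rest of the program):

// Optional safeguard (not in the chapter's code): verify the cascade loaded.
if (faceDetector.empty())
{
    cout << "Could not load haarcascade_frontalface_alt2.xml" << endl;
    return -1;
}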
- Declare two Mat variables, called videofeed and grayfeed, along with a VideoCapture variable called vid, initialized with camera index 0, to capture footage from the RPi camera:
Mat videofeed, grayfeed;
VideoCapture vid(0);
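It is also worth confirming that the camera actually opened before entering the capture loop. This check is an optional addition, not part of the chapter's snippet:

// Optional safeguard (not in the chapter's code): confirm the camera opened.
if (!vid.isOpened())
{
    cout << "Could not open the camera at index 0" << endl;
    return -1;
}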
- Inside the for loop, read the camera feed and then flip it horizontally. If your Pi camera is mounted upside down, set the third parameter of the flip function to 0 instead (the flip codes are summarized after this snippet). Using the cvtColor function, we convert videofeed to grayscale and store the result in the grayfeed variable. The following code shows how to complete this step:
vid.read(videofeed);
flip(videofeed, videofeed, 1);
cvtColor(videofeed, grayfeed, COLOR_BGR2GRAY);
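For reference, the third parameter of flip (the flip code) selects the axis to flip around:

// Flip codes for reference:
//    1 : flip around the y-axis (horizontal mirror, as used above)
//    0 : flip around the x-axis (the value suggested above for an upside-down camera)
//   -1 : flip around both axes (equivalent to a 180-degree rotation)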
- Let's perform histogram equalization to improve the brightness and contrast of the grayscale frame. Histogram equalization is required because, in low lighting, the classifier may otherwise fail to detect the face. To perform histogram equalization, we will use the equalizeHist function:
equalizeHist(grayfeed, grayfeed);
- Let's detect some faces. For this, the detectMultiScale function is used, as follows:
detectMultiScale(image, objects, scaleFactor, minNeighbors, flags, minSize, maxSize);
The detectMultiScale function shown in the preceding code snippet takes the following seven parameters (a call that uses all seven is sketched after this list):
- image: Represents the input video feed. In our case, it is grayfeed, as we will detect the face from the grayscale video.
- objects: Represents the vector of rectangles, where each rectangle bounds a detected face.
- scaleFactor: Specifies how much the image size is reduced at each image scale. The ideal value is between 1.1 and 1.3.
- flags: This parameter can be set to CASCADE_SCALE_IMAGE, CASCADE_FIND_BIGGEST_OBJECT, CASCADE_DO_ROUGH_SEARCH, or CASCADE_DO_CANNY_PRUNING:
- CASCADE_SCALE_IMAGE: This is the most commonly used flag; it tells the classifier to rescale the image (rather than the Haar features) when searching for faces at different sizes
- CASCADE_FIND_BIGGEST_OBJECT: This flag will tell the classifier to find the biggest face in the image or video
- CASCADE_DO_ROUGH_SEARCH: This flag will stop the classifier once a face is detected
- CASCADE_DO_CANNY_PRUNING: This flag tells the classifier to skip image regions that contain too few or too many edges, since such regions are unlikely to contain a face; this speeds up detection
- minNeighbors: This parameter affects the quality of the detected faces. Higher values produce fewer detections, but those detections are much more likely to be actual faces. Lower values may find more faces, but can also produce false positives on objects that are not faces. The ideal value for detecting faces is between 3 and 5.
- minSize: This parameter sets the minimum face size. For example, if we set minSize to 50 x 50 pixels, the classifier will only detect faces that are bigger than 50 x 50 pixels and ignore faces that are smaller. A value of 30 x 30 pixels works well in practice.
- maxSize: This parameter sets the maximum face size. For example, if we set maxSize to 80 x 80 pixels, the classifier will only detect faces that are smaller than 80 x 80 pixels. So, if you move too close to the camera and your face exceeds this size, it will not be detected.
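The following sketch shows what a call using all seven parameters might look like. The Size(300, 300) upper bound is purely illustrative; the chapter's program, shown in the next step, omits maxSize:

// Illustrative call using all seven parameters; the maxSize value is an
// assumption for demonstration only (the chapter's code omits it).
vector<Rect> face;
faceDetector.detectMultiScale(grayfeed,                // input grayscale frame
                              face,                    // output: bounding boxes of detected faces
                              1.1,                     // scaleFactor
                              5,                       // minNeighbors
                              0 | CASCADE_SCALE_IMAGE, // flags
                              Size(30, 30),            // minSize: ignore faces smaller than 30 x 30
                              Size(300, 300));         // maxSize: ignore faces larger than 300 x 300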
- Since the detectMultiScale function returns its results as a vector of rectangles, we have to declare a vector of the Rect type, named face. scaleFactor is set to 1.1, minNeighbors is set to 5, and minSize is set to 30 x 30 pixels. The maxSize parameter is omitted here so that a face is still detected even if you move close to the camera and your face becomes large in the frame. To complete this step, use the following code:
vector<Rect> face;
faceDetector.detectMultiScale(grayfeed, face, 1.1, 5, 0 | CASCADE_SCALE_IMAGE, Size(30, 30));
After detecting faces, we will create a rectangle around the detected faces and display text on the top-left side of the rectangle that states "Face detected":
for (size_t f = 0; f < face.size(); f++)
{
rectangle(videofeed, face[f], Scalar(255, 0, 0), 2);
putText(videofeed, "Face Detected", Point(face[f].x, face[f].y), FONT_HERSHEY_PLAIN, 1.0, Scalar(0, 255, 0), 2);
}
Inside the for loop, we use face.size() to determine how many faces were detected. If one face is detected, face.size() equals 1 and the loop body executes once. Inside the loop, we call the rectangle and putText functions.
The rectangle function draws a rectangle around the detected face (an equivalent call using two corner points is shown after this list). It takes four parameters:
- The first parameter represents the image or video feed on which we want to draw the rectangle, which in our case is videofeed
- The second parameter, face[f], is the bounding rectangle of the detected face that we want to draw
- The third parameter represents the color of the rectangle (for this example, we have set the color to blue)
- The fourth and final parameter represents the thickness of the rectangle
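Because face[f] is a Rect, passing it directly to rectangle is equivalent to specifying the rectangle's top-left and bottom-right corners explicitly. The following call, shown only for illustration, produces the same result as the one in the loop above:

// Equivalent call using the two-corner overload of rectangle (illustration only).
rectangle(videofeed,
          Point(face[f].x, face[f].y),                                  // top-left corner
          Point(face[f].x + face[f].width, face[f].y + face[f].height), // bottom-right corner
          Scalar(255, 0, 0), 2);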
The putText function is used to display text in an image or video feed. It consists of seven parameters:
- The first parameter represents the image or video feed on which we want to display the text.
- The second parameter represents the text message that we want to display.
- The third parameter represents the point at which we want the text to be displayed. The face[f].x and face[f].y values give the top-left corner of the rectangle, so the text appears at the top-left of the rectangle.
- The fourth parameter represents the font type, which we have set to FONT_HERSHEY_PLAIN.
- The fifth parameter represents the font scale of the text, which we have set to 1.0.
- The sixth parameter represents the color of the text, which is set to green (Scalar(0,255,0)).
- The seventh and final parameter represents the thickness of the text, which is set to 2.
Finally, using the imshow function, we will view the video feed, along with the rectangle and text:
imshow("Face Detection", videofeed);
Once you have compiled, built, and run the program, you will see a rectangle drawn around the detected face:
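For orientation, here is a minimal, self-contained sketch of how the snippets above fit together. The variable names follow the chapter's code; the includes and the Esc-key exit condition are assumptions, as the complete FaceDetection.cpp in the book's GitHub repository may differ:

#include <opencv2/opencv.hpp>
#include <iostream>

using namespace cv;
using namespace std;

int main()
{
    // Load the pre-trained frontal-face cascade (keep the XML next to the executable).
    CascadeClassifier faceDetector("haarcascade_frontalface_alt2.xml");

    Mat videofeed, grayfeed;
    VideoCapture vid(0);

    for (;;)
    {
        vid.read(videofeed);                            // grab a frame from the RPi camera
        flip(videofeed, videofeed, 1);                  // mirror the frame horizontally
        cvtColor(videofeed, grayfeed, COLOR_BGR2GRAY);  // convert to grayscale
        equalizeHist(grayfeed, grayfeed);               // improve contrast in low light

        vector<Rect> face;
        faceDetector.detectMultiScale(grayfeed, face, 1.1, 5,
                                      0 | CASCADE_SCALE_IMAGE, Size(30, 30));

        for (size_t f = 0; f < face.size(); f++)
        {
            rectangle(videofeed, face[f], Scalar(255, 0, 0), 2);
            putText(videofeed, "Face Detected", Point(face[f].x, face[f].y),
                    FONT_HERSHEY_PLAIN, 1.0, Scalar(0, 255, 0), 2);
        }

        imshow("Face Detection", videofeed);

        // Assumed exit condition: press Esc to quit.
        if (waitKey(1) == 27)
        {
            break;
        }
    }
    return 0;
}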
Next, we will detect human eyes as well as recognize a smile. Once the eyes and smile have been recognized, we will create circles around them.