In early OpenCV 3.x releases, the deep learning module was available as a contrib source (https://github.com/opencv/opencv_contrib), but in version 4.0, deep learning is part of OpenCV core. This means that the OpenCV deep learning module is stable and actively maintained.
We can use a pretrained Caffe model for face detection that is based on the SSD (Single Shot MultiBox Detector) deep learning algorithm. This algorithm detects multiple objects in an image with a single deep learning network, returning a class and a bounding box for each detected object.
To load the pretrained Caffe model, we need two files:
- The proto file, which describes the model configuration; in our case, the file is saved in data/deploy.prototxt
- The binary trained model, which holds the learned weights; in our case, the file is saved in data/res10_300x300_ssd_iter_140000_fp16.caffemodel
The following code allows us to load the model into OpenCV:
dnn::Net net = readNetFromCaffe("data/deploy.prototxt", "data/res10_300x300_ssd_iter_140000_fp16.caffemodel");
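Because the loader depends on two separate files, a wrong path or a misspelled file name is an easy mistake to make. The following is a minimal sketch of a check that could run before loading. It uses only the C++ standard library, and the helper names isReadableFile and caffeModelFilesReadable are hypothetical, not part of OpenCV.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper (not part of OpenCV): returns true if the path
// can be opened for reading.
bool isReadableFile(const std::string& path)
{
    std::ifstream file(path, std::ios::binary);
    return file.is_open();
}

// Hypothetical helper: checks that both the configuration file and the
// weights file can be opened before they are handed to the model loader.
bool caffeModelFilesReadable(const std::string& prototxtPath,
                             const std::string& weightsPath)
{
    assert(!prototxtPath.empty() && !weightsPath.empty());
    return isReadableFile(prototxtPath) && isReadableFile(weightsPath);
}
```

A caller could print a clear message and exit when this check returns false, instead of attempting to load the network.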
After loading the deep learning network, we have to convert each frame that we capture with the webcam into a blob image that the deep learning network can understand. To do this, we use the blobFromImage function as follows:
Mat inputBlob = blobFromImage(frame, 1.0, Size(300, 300), meanVal, false, false);
Here, the first parameter is the input image, the second is a scale factor applied to each pixel value, the third is the output spatial size, the fourth is a Scalar value to be subtracted from each channel, the fifth is a flag to swap the B and R channels, and the last is a flag that, when set to true, crops the image after it is resized.
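To make the effect of these parameters more concrete, the following sketch reproduces the arithmetic of the conversion in plain C++, without OpenCV. It is a simplification under stated assumptions: the image is assumed to be already resized to the target size, it has exactly three 8-bit interleaved channels, cropping is ignored, and the mean is given in the order of the output channels. The function name toBlobSketch is hypothetical and is not an OpenCV function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an OpenCV function): converts an interleaved
// 8-bit, 3-channel image (H x W x 3) that is already at the target size
// into a planar float blob (3 x H x W).
std::vector<float> toBlobSketch(const std::vector<std::uint8_t>& pixels,
                                std::size_t height, std::size_t width,
                                double scale, const double mean[3],
                                bool swapRB)
{
    const std::size_t channels = 3;
    assert(pixels.size() == height * width * channels);

    std::vector<float> blob(channels * height * width);
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            for (std::size_t c = 0; c < channels; ++c) {
                // With swapRB, output channel 0 reads input channel 2
                // and output channel 2 reads input channel 0.
                const std::size_t src = swapRB ? (channels - 1 - c) : c;
                const double value = pixels[(y * width + x) * channels + src];
                // Subtract the per-channel mean first, then apply the scale.
                blob[(c * height + y) * width + x] =
                    static_cast<float>((value - mean[c]) * scale);
            }
        }
    }
    return blob;
}
```

The output stores all values of the first channel, then all values of the second, and so on, which is the planar layout that the network input expects, in contrast to the interleaved layout of a captured frame.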
Now that the input image is prepared for the deep neural network, we pass it to the net by calling the following function:
net.setInput(inputBlob);
Finally, we can ask the network to make a prediction as follows:
Mat detection = net.forward();
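For SSD-style detectors, the returned Mat is commonly laid out as 1 x 1 x N x 7, where each of the N rows holds an image ID, a class ID, a confidence value, and the left, top, right, and bottom box coordinates normalized to the range 0 to 1. Assuming that layout, the following sketch shows how such rows could be turned into pixel-coordinate boxes. It works on a plain float array so that it stays independent of OpenCV; the FaceBox struct, the parseDetectionsSketch function, and the confidence threshold are all hypothetical names and choices made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record for one detection, in pixel coordinates.
struct FaceBox {
    float confidence;
    int left, top, right, bottom;
};

// Hypothetical helper: reads `rows` rows of 7 floats each, assumed to be
// [imageId, classId, confidence, left, top, right, bottom] with normalized
// coordinates, and keeps the rows whose confidence is at least minConfidence.
std::vector<FaceBox> parseDetectionsSketch(const float* data, std::size_t rows,
                                           int frameWidth, int frameHeight,
                                           float minConfidence)
{
    assert(data != nullptr || rows == 0);

    std::vector<FaceBox> boxes;
    for (std::size_t i = 0; i < rows; ++i) {
        const float* row = data + i * 7;
        const float confidence = row[2];
        if (confidence < minConfidence) {
            continue;  // Skip weak detections.
        }
        // Scale the normalized coordinates to the frame size in pixels.
        boxes.push_back({confidence,
                         static_cast<int>(row[3] * frameWidth),
                         static_cast<int>(row[4] * frameHeight),
                         static_cast<int>(row[5] * frameWidth),
                         static_cast<int>(row[6] * frameHeight)});
    }
    return boxes;
}
```

With OpenCV, the data pointer and row count would presumably come from detection.ptr<float>() and detection.size[2], provided the output really has the layout assumed here.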