The OpenCV deep learning module is found under the opencv2/dnn.hpp header, which we have to include in our source; its classes and functions live in the cv::dnn namespace.
Then our header for OpenCV must look like this:
...
#include <opencv2/core.hpp>
#include <opencv2/dnn.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>
using namespace cv;
using namespace dnn;
...
The first thing we have to do is import the COCO class-name vocabulary, which is in the coco.names file. This is a plain-text file that contains one class category per line, ordered in the same way as the confidence results. Then we are going to read each line of this file and store it in a vector of strings, called classes:
...
int main(int argc, char** argv)
{
// Load names of classes
vector<string> classes;
string classesFile = "coco.names";
ifstream ifs(classesFile.c_str());
string line;
while (getline(ifs, line)) classes.push_back(line);
...
Now we are going to import the deep learning model into OpenCV. OpenCV implements readers/importers for the most common deep learning frameworks, such as TensorFlow and Darknet, and all of them have a similar syntax. In our case, we are going to import a Darknet model, passing the model configuration and the weights to the readNetFromDarknet OpenCV function:
...
// Give the configuration and weight files for the model
String modelConfiguration = "yolov3.cfg";
String modelWeights = "yolov3.weights";
// Load the network
Net net = readNetFromDarknet(modelConfiguration, modelWeights);
...
Now we are in a position to read an image and send it to the deep neural network for inference. First we have to read an image with the imread function and convert it into a tensor/blob that the deep neural network (DNN) can read. To create the blob from an image, we are going to use the blobFromImage function, passing the image. This function accepts the following parameters:
- image: Input image (with 1, 3, or 4 channels).
- blob: Output mat.
- scalefactor: Multiplier for image values.
- size: Spatial size for output blob required as input of DNN.
- mean: Scalar with mean values that are subtracted from channels. Values are intended to be in (mean-R, mean-G, and mean-B) order if the image has BGR ordering and swapRB is true.
- swapRB: A flag that indicates whether to swap the first and last channels in a 3-channel image.
- crop: A flag that indicates whether the image will be cropped after resize.
You can read the full code on how to read and convert an image into a blob in the following snippet:
...
input = imread(argv[1]);
// Stop the program if no input image was read
if (input.empty()) {
cout << "No input image" << endl;
return 0;
}
// Create a 4D blob from a frame.
blobFromImage(input, blob, 1/255.0, Size(inpWidth, inpHeight), Scalar(0,0,0), true, false);
...
Finally, we have to feed the blob into the network and run the inference with the forward function, which requires two parameters: the output mat results, and the names of the output layers to retrieve:
...
//Sets the input to the network
net.setInput(blob);
// Runs the forward pass to get output of the output layers
vector<Mat> outs;
net.forward(outs, getOutputsNames(net));
// Remove the bounding boxes with low confidence
postprocess(input, outs);
...
In the output vector of mats, we have all the bounding boxes detected by the neural network, and we have to post-process the output to keep only the results whose confidence is greater than a threshold, normally 0.5, and finally apply non-maximum suppression to eliminate redundant overlapping boxes. You can get the full post-processing code on GitHub.
The final result of our example is multiple-object detection and classification in deep learning that shows a window similar to the following:
Now we are going to learn another commonly-used object detection function customized for face detection.