We will be using the MobileNet-SSD network to detect objects such as cats, dogs, and cars in a photo. The combination of MobileNet and SSD gives outstanding results in terms of both accuracy and speed in object detection tasks. By the end of this section, you will be able to generate images containing bounding boxes and the names of the detected objects:
We always start the same way, by loading the Julia packages and defining the path to opencv.pc:
ENV["PKG_CONFIG_PATH"] = "/Users/dc/anaconda/envs/python35/lib/pkgconfig"
using OpenCV
using Images, ImageView
using Cxx
Once the Julia packages are loaded, we proceed to writing the C++ code. Remember that the C++ code is encapsulated within a special syntax, as follows:
cxx"""
<<C++ code goes here>>
"""
The first thing to do when starting to write the C++ code is to add all of the prerequisites, and we will have a number of them in this example:
#include <opencv2/dnn.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>
#include <fstream>
#include <iostream>
#include <cstdlib>
using namespace std;
using namespace cv;
using namespace cv::dnn;
Next, we will define a function to initialize our neural network. It will accept the paths to the prototxt and caffemodel files and preload the model into memory using the readNetFromCaffe function:
Net load_model(String caffe_model_txt, String caffe_model_bin) {
Net net = dnn::readNetFromCaffe(caffe_model_txt, caffe_model_bin);
if (net.empty()) {
std::cerr << "Can't load network." << std::endl;
exit(-1);
}
return net;
}
Next, we will proceed to the main program logic, that is, running the neural network and processing the results.
We will execute the following set of actions:
- Accept an image and a neural network as input parameters.
- Define possible outcomes/classes.
- Prepare the image for classification. MobileNet-SSD requires images to be 300 x 300 pixels in size.
- Run the neural network and collect the results.
- Go over the results and remove the ones that have a confidence lower than a specific threshold.
- Draw bounding boxes around confident results.
This is shown with the following code:
void detect_objects(Mat img, Net net) {
string CLASS_NAMES[] = {"background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"};
// prepare image for evaluation
Mat scaled_image;
resize(img, scaled_image, Size(300,300));
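// note: blobFromImage subtracts the per-channel mean (127.5) and then
// multiplies by the scale factor (0.007843, roughly 1/127.5), which maps
// pixel values to approximately [-1, 1], as the MobileNet-SSD Caffe model expects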
scaled_image = blobFromImage(scaled_image, 0.007843, Size(300, 300), Scalar(127.5, 127.5, 127.5), false);
// run the network
net.setInput(scaled_image, "data");
Mat detection_out = net.forward("detection_out");
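// the detection_out blob has shape [1, 1, N, 7]; each of the N rows is
// [image_id, class_id, confidence, x_min, y_min, x_max, y_max], with the
// box coordinates normalized to the [0, 1] range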
Mat results(detection_out.size[2], detection_out.size[3], CV_32F, detection_out.ptr<float>());
// draw bounding boxes if the probability is over a specific threshold
float threshold = 0.5;
for (int i = 0; i < results.rows; i++) {
float prob = results.at<float>(i, 2);
if (prob > threshold) {
int class_idx = static_cast<int>(results.at<float>(i, 1));
int xLeftBottom = static_cast<int>(results.at<float>(i, 3) * img.cols);
int yLeftBottom = static_cast<int>(results.at<float>(i, 4) * img.rows);
int xRightTop = static_cast<int>(results.at<float>(i, 5) * img.cols);
int yRightTop = static_cast<int>(results.at<float>(i, 6) * img.rows);
String label = CLASS_NAMES[class_idx] + ": " + std::to_string(prob);
Rect bounding_box(xLeftBottom, yLeftBottom, xRightTop - xLeftBottom, yRightTop - yLeftBottom);
rectangle(img, bounding_box, Scalar(0, 255, 0), 2);
putText(img, label, Point(xLeftBottom, yLeftBottom), FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0, 0, 0));
}
}
}
Now that the C++ code is ready, we can switch to the Julia code. We start by downloading the model weights, as described in the documentation. These weights are located in the official GitHub repository at https://github.com/chuanqi305/MobileNet-SSD.
Consider the following code:
# source: https://github.com/opencv/opencv_extra/blob/master/testdata/dnn/download_models.py
caffemodel_path = joinpath("data", "MobileNetSSD_deploy.caffemodel")
if !isfile(caffemodel_path) download("https://drive.google.com/uc?export=download&id=0B3gersZ2cHIxRm5PMWRoTkdHdHc", caffemodel_path) end
prototxt_path = joinpath("data", "MobileNetSSD_deploy.prototxt")
if !isfile(prototxt_path) download("https://raw.githubusercontent.com/chuanqi305/MobileNet-SSD/master/MobileNetSSD_deploy.prototxt", prototxt_path) end
It can take some time to download the models. We will download them once and store them in the data folder.
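Note that the download code above saves the files into a data folder relative to the working directory. If that folder does not exist yet, you can create it first with a one-liner (a small addition that is not part of the original snippet):
mkpath("data")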
Once the weights are ready, we can initialize our neural network:
opencv_dnn_model = @cxx load_model(pointer(prototxt_path), pointer(caffemodel_path));
Now, we can start testing the model in different scenarios. Let's cover some of them in the following sections.
We will repeat the same actions for different images. We read each image into OpenCV format using the imread function, call detect_objects to draw the bounding boxes, and save the result using the imwrite function:
filename = joinpath(pwd(), "sample-images", "cat-3352842_640.jpg");
img_opencv = imread(filename);
@cxx detect_objects(img_opencv, opencv_dnn_model);
imwrite(joinpath(pwd(), "object-detection-1.jpg"), img_opencv)
filename = joinpath(pwd(), "sample-images", "bird-3183441_640.jpg");
img_opencv = imread(filename);
@cxx detect_objects(img_opencv, opencv_dnn_model);
imwrite(joinpath(pwd(), "object-detection-2.jpg"), img_opencv)
filename = joinpath(pwd(), "sample-images", "kittens-555822_640.jpg");
img_opencv = imread(filename);
@cxx detect_objects(img_opencv, opencv_dnn_model);
imwrite(joinpath(pwd(), "object-detection-3.jpg"), img_opencv)
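If you want to take a quick look at a result without leaving Julia, you can open one of the generated files with the Images and ImageView packages we loaded at the beginning. Here is a minimal sketch, assuming the first output file from the code above:
result_img = load(joinpath(pwd(), "object-detection-1.jpg"))
imshow(result_img)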
You should now have three new images in the root folder, each corresponding to one of the images we analyzed. You can see that, despite the different content and number of objects, the network performed very well:
It is possible to retrain the model using a custom dataset and a different number of classes. Refer to the MobileNet-SSD page at https://github.com/chuanqi305/MobileNet-SSD for more details.