AlexNet for image classification

The image_classification_opencv_alexnet_caffe.py script performs image classification with the OpenCV DNN module, using AlexNet and a pre-trained Caffe model. The script works as follows:

1. Load the names of the classes.
2. Load the serialized Caffe model from disk.
3. Load the input image to classify.
4. Create the blob with a size of (227, 227) and mean subtraction values of (104, 117, 123).
5. Feed the input blob to the network, perform inference, and get the output.
6. Get the 10 indexes with the highest probability, in descending order, so that the index with the highest probability (the top prediction) comes first.
7. Draw the class and the probability associated with the top prediction on the image.

The output of this script can be seen in the following screenshot:

As shown in the previous screenshot, the top prediction corresponds to a church with a probability of 0.8325679898.

The top 10 predictions are as follows:

1. label: church, probability: 0.8325679898
2. label: monastery, probability: 0.043678388
3. label: mosque, probability: 0.03827961534
4. label: bell cote, probability: 0.02479489893
5. label: beacon, probability: 0.01249620412
6. label: dome, probability: 0.01223050058
7. label: missile, probability: 0.006323920097
8. label: projectile, probability: 0.005275635514
9. label: palace, probability: 0.004289720673
10. label: castle, probability: 0.003241452388
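A ranking like the one above can be obtained by sorting the indexes of the network output by probability. The following is a sketch under assumptions: the function names are hypothetical, the output is assumed to have the shape (1, num_classes), and the sample values are made up for illustration.

```python
import numpy as np


def top_k_predictions(classes, preds, k=10):
    """Return (label, probability) pairs for the k most probable classes, best first."""
    probs = np.asarray(preds)[0]            # assumed shape of preds: (1, num_classes)
    indexes = np.argsort(probs)[::-1][:k]   # argsort is ascending, so reverse it
    return [(classes[i], float(probs[i])) for i in indexes]


def format_predictions(pairs):
    """Number the pairs from 1 and render them one per line."""
    return ["{}. label: {}, probability: {:.10}".format(rank, label, prob)
            for rank, (label, prob) in enumerate(pairs, start=1)]


# Toy data, not real network output.
toy_classes = ["a", "b", "c"]
toy_preds = np.array([[0.2, 0.1, 0.7]])
for line in format_predictions(top_k_predictions(toy_classes, toy_preds, k=2)):
    print(line)
```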

Note how the class and probability are drawn. The text is split on newline characters, and each resulting line is drawn at its own vertical offset:

text = "label: {} probability: {:.2f}%".format(classes[indexes[0]], preds[0][indexes[0]] * 100)
print(text)
y0, dy = 30, 30
for i, line in enumerate(text.split('\n')):
    y = y0 + i * dy
    cv2.putText(image, line, (5, y), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 255), 2)

Because cv2.putText() does not handle line breaks by itself, this loop is what allows the text to span several lines in the image. For example, with the following text, the label and the probability are drawn on two separate lines:

text = "label: {}\nprobability: {:.2f}%".format(classes[indexes[0]], preds[0][indexes[0]] * 100)

Note that the bvlc_alexnet.caffemodel file is not included in the repository of this book because it exceeds GitHub's file size limit of 100.00 MB. Therefore, you have to download it from http://dl.caffe.berkeleyvision.org/bvlc_alexnet.caffemodel before running the script.
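The download can also be scripted. The following is one possible sketch using only the Python standard library; the function name ensure_model and the default destination path are assumptions, and the URL is the one given above, which is assumed to be reachable.

```python
import os
import urllib.request

MODEL_URL = "http://dl.caffe.berkeleyvision.org/bvlc_alexnet.caffemodel"


def ensure_model(path="bvlc_alexnet.caffemodel", url=MODEL_URL):
    """Download the model to path unless a file already exists there; return path."""
    if not os.path.exists(path):
        # Download under a temporary name first so that an interrupted
        # transfer does not leave a truncated file under the final name.
        tmp_path = path + ".part"
        urllib.request.urlretrieve(url, tmp_path)
        os.replace(tmp_path, path)
    return path
```

Calling ensure_model() before loading the network fetches the file only on the first run; later runs reuse the local copy.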