Understanding cv2.dnn.blobFromImage()

In Chapter 11, Face Detection, Tracking, and Recognition, we saw some examples involving deep learning computation. For example, in the face_detection_opencv_dnn.py script, a deep learning-based face detector (https://github.com/opencv/opencv/tree/master/samples/dnn/face_detector) was used to detect faces in images. The first step was to load the pre-trained model as follows:

net = cv2.dnn.readNetFromCaffe("deploy.prototxt", "res10_300x300_ssd_iter_140000_fp16.caffemodel")

As a reminder, the deploy.prototxt file defines the model architecture, and the res10_300x300_ssd_iter_140000_fp16.caffemodel file contains the weights of the layers. To perform a forward pass through the whole network and compute its output, the input to the network must be a blob. A blob can be seen as a collection of images that have been preprocessed so that they can be fed to the network.

This preprocessing is composed of several operations: resizing, cropping, subtracting mean values, scaling, and swapping the blue and red channels.

For example, in the aforementioned face detection example, we ran the following code:

# Load image:
image = cv2.imread("test_face_detection.jpg")

# Create 4-dimensional blob from image:
blob = cv2.dnn.blobFromImage(image, 1.0, (300, 300), [104., 117., 123.], False, False)

In this case, this means we want to run the model on BGR images resized to 300 x 300, applying a mean subtraction of (104, 117, 123) values for the blue, green, and red channels, respectively. This can be summarized in the following table:

Model	Scale	Size WxH	Mean subtraction	Channels order
OpenCV face detector	1.0	300 x 300	104, 117, 123	BGR

At this point, we can set the blob as input and obtain the detections as follows:

# Set the blob as input and obtain the detections:
net.setInput(blob)
detections = net.forward()

See the face_detection_opencv_dnn.py script for further details.

Now, we are going to look at the cv2.dnn.blobFromImage() and cv2.dnn.blobFromImages() functions in detail. We will first see the signature of both functions, and then go through the blob_from_image.py and blob_from_images.py scripts, which help to understand how these functions work. In these scripts, we also make use of the OpenCV cv2.dnn.imagesFromBlob() function.

The signature of cv2.dnn.blobFromImage() is as follows:

retval=cv2.dnn.blobFromImage(image[, scalefactor[, size[, mean[, swapRB[, crop[, ddepth]]]]]])

This function creates a four-dimensional blob from image. Optionally, it resizes the image to size and crops it from the center, subtracts the mean values, scales the values by scalefactor, and swaps the blue and red channels. The parameters are as follows:

image: This is the input image to preprocess.
scalefactor: This is a multiplier for image values. This value can be used to scale our images. The default value is 1.0, which means that no scaling is performed.
size: This is the spatial size for the output image.
mean: This is the scalar with the mean values that are subtracted from the channels. The values are intended to be in (mean-R, mean-G, mean-B) order if the image has BGR ordering and swapRB=True.
swapRB: This flag can be used to swap the R and B channels in the image by setting this flag to True.
crop: This is a flag that indicates whether the image will be cropped after resizing or not.
ddepth: The depth of the output blob. You can choose between CV_32F or CV_8U.
If crop=False, the image is resized directly to size without cropping, so the aspect ratio is not preserved. Otherwise (crop=True), the image is first resized so that one side equals the corresponding dimension in size and the other side is equal or larger, and it is then cropped from the center.
The default values are scalefactor=1.0, size=Size(), mean=Scalar(), swapRB=false, crop=false, and ddepth=CV_32F.
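
The arithmetic behind these parameters can be approximated with plain NumPy. The following is a minimal sketch, not OpenCV's actual implementation: the blob_from_image_sketch() helper is a hypothetical name, and it assumes that the image already has the target size, so the resize and crop steps are left out. It only illustrates the order of operations: channel swap, mean subtraction, scaling, and reordering to (1, channels, height, width).

```python
import numpy as np


def blob_from_image_sketch(image, scalefactor=1.0, mean=(0.0, 0.0, 0.0), swap_rb=False):
    """Hypothetical NumPy approximation of the blob arithmetic (no resize, no crop)."""
    img = image.astype(np.float32)
    if swap_rb:
        # Reverse the channel axis: BGR becomes RGB (and vice versa)
        img = img[:, :, ::-1]
    # The mean is subtracted first, and the result is then multiplied by scalefactor
    img = (img - np.asarray(mean, dtype=np.float32)) * scalefactor
    # Reorder (height, width, channels) to (channels, height, width) and add a batch axis
    return np.ascontiguousarray(img.transpose(2, 0, 1))[np.newaxis, ...]


# Small synthetic BGR image: 2 rows, 4 columns, every pixel is (B=110, G=120, R=130)
image = np.full((2, 4, 3), (110, 120, 130), dtype=np.uint8)
blob = blob_from_image_sketch(image, 1.0, (104.0, 117.0, 123.0))
print(blob.shape)                 # (1, 3, 2, 4)
print(blob[0, :, 0, 0].tolist())  # [6.0, 3.0, 7.0]
```

Note that, in this sketch, the mean is applied after the optional channel swap, which is why the mean values are expected in (mean-R, mean-G, mean-B) order when a BGR image is used with swap_rb=True.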

The signature of cv2.dnn.blobFromImages() is as follows:

retval=cv2.dnn.blobFromImages(images[, scalefactor[, size[, mean[, swapRB[, crop[, ddepth]]]]]])

This function creates a four-dimensional blob from multiple images. This way, you can perform a forward pass for the whole network to compute the output of several images at once. The following code shows you how to use this function properly:

# Create a list of images (image and image2 are BGR images loaded with cv2.imread()):
images = [image, image2]

# Call cv2.dnn.blobFromImages():
blob_images = cv2.dnn.blobFromImages(images, 1.0, (300, 300), [104., 117., 123.], False, False)

# Set the blob as input and obtain the detections:
net.setInput(blob_images)
detections = net.forward()
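
Conceptually, the batch blob is the set of per-image blobs stacked along the first axis. The sketch below uses NumPy only and stands in for OpenCV: to_nchw() is a hypothetical helper, and both synthetic images are assumed to already share the same size (in OpenCV, the size argument takes care of that).

```python
import numpy as np


def to_nchw(image, mean):
    """Hypothetical helper: mean-subtract one HxWxC image and return a (1, C, H, W) array."""
    img = image.astype(np.float32) - np.asarray(mean, dtype=np.float32)
    return img.transpose(2, 0, 1)[np.newaxis, ...]


mean = (104.0, 117.0, 123.0)
# Two synthetic 300 x 300 BGR images standing in for image and image2
image = np.zeros((300, 300, 3), dtype=np.uint8)
image2 = np.full((300, 300, 3), 255, dtype=np.uint8)

# One blob per image, concatenated along the batch axis
batch = np.concatenate([to_nchw(image, mean), to_nchw(image2, mean)], axis=0)
print(batch.shape)  # (2, 3, 300, 300)

# batch[i] holds the preprocessed channel planes of the i-th image
print(batch[1, :, 0, 0].tolist())  # [151.0, 138.0, 132.0]
```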

At this point, we have introduced the cv2.dnn.blobFromImage() and cv2.dnn.blobFromImages() functions. So, we are ready to see the blob_from_image.py and blob_from_images.py scripts.

In the blob_from_image.py script, we first load a BGR image, and create a four-dimensional blob making use of the cv2.dnn.blobFromImage() function. You can check that the shape of the created blob is (1, 3, 300, 300). Then, we call the get_image_from_blob() function, which can be used to perform the inverse preprocessing transformations in order to get the input image again. This way, you will get a better understanding of this preprocessing. The code of the get_image_from_blob function is as follows:

import cv2
import numpy as np


def get_image_from_blob(blob_img, scalefactor, dim, mean, swap_rb, mean_added):
    """Returns image from blob assuming that the blob is from only one image"""

    # cv2.dnn.imagesFromBlob() returns one array per image contained in the blob:
    images_from_blob = cv2.dnn.imagesFromBlob(blob_img)
    # Undo the scaling in floating point (values can be negative at this point):
    image_from_blob = np.reshape(images_from_blob[0], dim) / scalefactor

    # Optionally add the mean values back:
    if mean_added:
        image_from_blob = image_from_blob + np.float32(mean)

    # Clip before converting to 8 bits so that negative values do not wrap around:
    image_from_blob = np.uint8(np.clip(image_from_blob, 0, 255))

    # Optionally swap the blue and red channels:
    if swap_rb:
        image_from_blob = image_from_blob[:, :, ::-1]
    return image_from_blob

In the script, we make use of this function to get different images from the blob, as demonstrated in the following code snippet:

# Load image:
image = cv2.imread("face_test.jpg")

# Call cv2.dnn.blobFromImage():
blob_image = cv2.dnn.blobFromImage(image, 1.0, (300, 300), [104., 117., 123.], False, False)

# The shape of the blob_image will be (1, 3, 300, 300):
print(blob_image.shape)

# Get different images from the blob:
img_from_blob = get_image_from_blob(blob_image, 1.0, (300, 300, 3), [104., 117., 123.], False, True)
img_from_blob_swap = get_image_from_blob(blob_image, 1.0, (300, 300, 3), [104., 117., 123.], True, True)
img_from_blob_mean = get_image_from_blob(blob_image, 1.0, (300, 300, 3), [104., 117., 123.], False, False)
img_from_blob_mean_swap = get_image_from_blob(blob_image, 1.0, (300, 300, 3), [104., 117., 123.], True, False)

The created images are explained as follows:

The img_from_blob image corresponds to the original BGR image resized to (300,300).
The img_from_blob_swap image corresponds to the original BGR image resized to (300,300), and the blue and red channels have been swapped.
The img_from_blob_mean image corresponds to the original BGR image resized to (300,300), where the scalar with mean values has not been added to the image.
The img_from_blob_mean_swap image corresponds to the original BGR image resized to (300,300), where the scalar with mean values has not been added to the image and the blue and red channels have been swapped.

The output of this script can be seen in the following screenshot:

In the previous screenshot, we can see the four images obtained: img_from_blob, img_from_blob_swap, img_from_blob_mean, and img_from_blob_mean_swap.
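
The difference between the shape of the blob, (1, 3, 300, 300), and the shape of the recovered image, (300, 300, 3), is a reordering of the axes. As a rough NumPy illustration of that reordering (this is not the code of cv2.dnn.imagesFromBlob(), and image_from_blob_sketch() is a hypothetical name), the following sketch builds a tiny blob by hand and inverts it, doing the arithmetic in floating point and clipping before the conversion to 8 bits:

```python
import numpy as np


def image_from_blob_sketch(blob, scalefactor, mean):
    """Hypothetical inverse: (1, C, H, W) float blob -> (H, W, C) uint8 image."""
    # Drop the batch axis and move the channel axis to the end
    img = blob[0].transpose(1, 2, 0)
    # Undo the scaling, then add the mean back, still in floating point
    img = img / scalefactor + np.asarray(mean, dtype=np.float32)
    # Round and clip before casting so that no value wraps around in uint8
    return np.clip(np.rint(img), 0, 255).astype(np.uint8)


mean = (104.0, 117.0, 123.0)
# Hand-made BGR image whose blue value (50) lies below the blue mean (104)
original = np.full((2, 2, 3), (50, 200, 123), dtype=np.uint8)
blob = (original.astype(np.float32) - np.float32(mean)).transpose(2, 0, 1)[np.newaxis]

print(blob[0, :, 0, 0].tolist())  # [-54.0, 83.0, 0.0]
restored = image_from_blob_sketch(blob, 1.0, mean)
print(np.array_equal(restored, original))  # True
```

The negative blue value in the blob shows why the mean is added back before the conversion to uint8 in this sketch: an 8-bit unsigned image cannot represent values below zero.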

In the blob_from_images.py script, we first load two BGR images and create a four-dimensional blob making use of the cv2.dnn.blobFromImages() function. You can check that the shape of the created blob is (2, 3, 300, 300). Then, we call the get_images_from_blob() function, which can be used to perform the inverse preprocessing transformations in order to get the input images again.

The code for the get_images_from_blob function is as follows:

import cv2
import numpy as np


def get_images_from_blob(blob_imgs, scalefactor, dim, mean, swap_rb, mean_added):
    """Returns images from blob"""

    images_from_blob = cv2.dnn.imagesFromBlob(blob_imgs)
    imgs = []

    for image_blob in images_from_blob:
        # Undo the scaling in floating point (values can be negative at this point):
        image_from_blob = np.reshape(image_blob, dim) / scalefactor
        # Optionally add the mean values back:
        if mean_added:
            image_from_blob = image_from_blob + np.float32(mean)
        # Clip before converting to 8 bits so that negative values do not wrap around:
        image_from_blob = np.uint8(np.clip(image_from_blob, 0, 255))
        # Optionally swap the blue and red channels:
        if swap_rb:
            image_from_blob = image_from_blob[:, :, ::-1]
        imgs.append(image_from_blob)

    return imgs

As previously shown, the get_images_from_blob() function returns the images from the blob making use of the OpenCV cv2.dnn.imagesFromBlob() function. In the script, we make use of this function to get different images from the blob as follows:

# Load images and get the list of images:
image = cv2.imread("face_test.jpg")
image2 = cv2.imread("face_test_2.jpg")
images = [image, image2]

# Call cv2.dnn.blobFromImages():
blob_images = cv2.dnn.blobFromImages(images, 1.0, (300, 300), [104., 117., 123.], False, False)
# The shape of blob_images will be (2, 3, 300, 300):
print(blob_images.shape)

# Get different images from the blob:
imgs_from_blob = get_images_from_blob(blob_images, 1.0, (300, 300, 3), [104., 117., 123.], False, True)
imgs_from_blob_swap = get_images_from_blob(blob_images, 1.0, (300, 300, 3), [104., 117., 123.], True, True)
imgs_from_blob_mean = get_images_from_blob(blob_images, 1.0, (300, 300, 3), [104., 117., 123.], False, False)
imgs_from_blob_mean_swap = get_images_from_blob(blob_images, 1.0, (300, 300, 3), [104., 117., 123.], True, False)

The created images are explained as follows:

The imgs_from_blob images correspond to the original BGR images resized to (300,300).
The imgs_from_blob_swap images correspond to the original BGR images resized to (300,300), and the blue and red channels have been swapped.
The imgs_from_blob_mean images correspond to the original BGR images resized to (300,300), where the scalar with mean values has not been added to the image.
The imgs_from_blob_mean_swap images correspond to the original BGR images resized to (300,300), where the scalar with mean values has not been added to the image and the blue and red channels have been swapped.

The output of this script can be seen in the following screenshot:

One final consideration with both cv2.dnn.blobFromImage() and cv2.dnn.blobFromImages() is the crop parameter, which indicates whether the image is cropped. In the case of cropping, the image is cropped from the center, as indicated in the following screenshot:

As you can see, the cropping is performed from the center of the image, indicated by the yellow line. To replicate the cropping that OpenCV performs inside the cv2.dnn.blobFromImage() and cv2.dnn.blobFromImages() functions, we have coded the get_cropped_img() function as follows:

def get_cropped_img(img):
    """Returns the cropped image"""

    # calculate size of resulting image:
    size = min(img.shape[1], img.shape[0])

    # calculate x1, and y1
    x1 = int(0.5 * (img.shape[1] - size))
    y1 = int(0.5 * (img.shape[0] - size))

    # crop and return the image
    return img[y1:(y1 + size), x1:(x1 + size)]

As you can see, the size of the cropped image will be based on the minimum dimension of the original image. Therefore, in the previous example, the cropped image will have a size of (482, 482).
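
The get_cropped_img() function covers the case of a square target size. For a general target size, the OpenCV documentation describes crop=True as a resize in which one side matches the target and the other is equal or larger, followed by a center crop. The following pure Python sketch works out that geometry under this description. The center_crop_geometry() helper is hypothetical, the rounding may differ slightly from what OpenCV does internally, and the 640-pixel width of the example image is an assumption (only the 482-pixel minimum dimension is known from the example above).

```python
def center_crop_geometry(img_w, img_h, target_w, target_h):
    """Hypothetical helper: size after the resize and crop offsets for crop=True."""
    # Scale so that the resized image covers the target in both dimensions
    factor = max(target_w / img_w, target_h / img_h)
    resized_w = int(round(img_w * factor))
    resized_h = int(round(img_h * factor))
    # Offsets of the centered, target-sized window inside the resized image
    x1 = int(0.5 * (resized_w - target_w))
    y1 = int(0.5 * (resized_h - target_h))
    return (resized_w, resized_h), (x1, y1)


# Assumed example: a 640 x 482 (width x height) image and a 300 x 300 target
resized, offset = center_crop_geometry(640, 482, 300, 300)
print(resized)  # (398, 300)
print(offset)   # (49, 0)
```

For a square target, this selects approximately the same region as get_cropped_img(), only expressed in the coordinates of the resized image rather than those of the original one.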

In the blob_from_images_cropping.py script, we see the effect of cropping, and we also replicate the cropping procedure in the get_cropped_img() function:

# Load images and get the list of images:
image = cv2.imread("face_test.jpg")
image2 = cv2.imread("face_test_2.jpg")
images = [image, image2]

# To see how cropping works, we are going to perform the cropping formulation that
# both blobFromImage() and blobFromImages() perform applying it to one of the input images:
cropped_img = get_cropped_img(image)
# cv2.imwrite("cropped_img.jpg", cropped_img)

# Call cv2.dnn.blobFromImages() without cropping (crop=False) and with cropping (crop=True):
blob_images = cv2.dnn.blobFromImages(images, 1.0, (300, 300), [104., 117., 123.], False, False)
blob_images_cropped = cv2.dnn.blobFromImages(images, 1.0, (300, 300), [104., 117., 123.], False, True)

# Get different images from the blob:
imgs_from_blob = get_images_from_blob(blob_images, 1.0, (300, 300, 3), [104., 117., 123.], False, True)
imgs_from_blob_cropped = get_images_from_blob(blob_images_cropped, 1.0, (300, 300, 3), [104., 117., 123.], False, True)

The output of the blob_from_images_cropping.py script can be seen in the following screenshot:

The effect of cropping on the two loaded images can be seen. Note that the aspect ratio is maintained in the cropped images.