Extracting features

Our next step is to extract the features. We will be using the Inception V3 network for this. Because the total number of images is relatively large, I will be using a GPU instance. You can try using a CPU, but it will take longer to complete.

We will start by instantiating the model in the GPU context:

nnet = mx.load_checkpoint("weights/inception-v3/InceptionV3-FE", 0, mx.FeedForward; context = mx.gpu());

We will continue by creating a variable that will store the features for the whole dataset. From the previous sections, you should remember that Inception V3 returns 2048 values (features) per image, so we allocate one column of 2048 rows for each image:

features = zeros(2048, length(images))
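
To make the layout of this array concrete, here is a minimal sketch that uses a hypothetical dataset size of 1,000 images (n_images is an assumption, not a value from this chapter). It shows that zeros allocates Float64 values by default and that each column holds the feature vector of one image:

```julia
# Hypothetical dataset size, used only for illustration.
n_images = 1_000
features = zeros(2048, n_images)      # Float64 by default

println(size(features))               # (2048, 1000)
println(eltype(features))             # Float64
println(sizeof(features) / 2^20)      # size of the array in MiB

# One column holds the feature vector of one image.
first_image_features = features[:, 1]
println(length(first_image_features)) # 2048
```

If memory is tight, allocating the array with zeros(Float32, 2048, n_images) instead would halve its size, at the cost of lower precision.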

Now, we can iterate over the images and populate the features array. Because running Inception V3 is a memory-intensive task, we will be doing this in batches. Please adjust the batch size to the memory you have available. For example, when running the process on a CPU with 8 GB of RAM, you might need to decrease batch_size_fe to 50 or 100.
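
As a rough guide for choosing the batch size, the following sketch estimates the size of the input batch alone. It assumes that each pixel value is stored as a 32-bit float, and the helper batch_bytes is introduced here only for illustration. The intermediate activations of the network need considerably more memory than the input, so treat these numbers as a lower bound:

```julia
# Rough size of one input batch, assuming 32-bit floats (an assumption).
bytes_per_value = sizeof(Float32)        # 4 bytes
values_per_image = 299 * 299 * 3         # width * height * channels

# Hypothetical helper: bytes needed to hold a batch of input images.
batch_bytes(batch_size) = values_per_image * bytes_per_value * batch_size

for batch_size in (50, 100, 250)
    mib = batch_bytes(batch_size) / 2^20
    println("batch of ", batch_size, ": ", round(mib, digits = 1), " MiB")
end
```

Under these assumptions, a batch of 250 images takes roughly 256 MiB before the network does any work.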

The following process executes the flow:

The outer for loop iterates over the images in steps of the batch size.
row_count controls the number of rows in the current batch, as the last run can have fewer records than the batch size.
mx_data_features is the MXNet array that is passed to the model.
The inner for loop executes multiple steps, which are combined into one-liners:
1. Read an image from the disk, resize it to 299 x 299, ensure that the colors are in RGB, and convert it into a three-dimensional representation
2. Change the order of the dimensions to move the channel dimension to the end, and reshape the result to fit the MXNet array
Next, we normalize the batch as required by the model, rescaling the pixel values from the range 0 to 1 to the range -1 to 1.
Finally, we create the data provider, run the prediction, and populate the features array for the records processed in the batch. This is shown in the following code:

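batch_size_fe = 250

# Iterate over the images in steps of batch_size_fe
for idx = 1:batch_size_fe:length(images)

    println(idx)

    # The last batch can hold fewer images than batch_size_fe
    row_count = min(length(images) + 1 - idx, batch_size_fe)
    mx_data_features = mx.zeros((299, 299, 3, row_count));

    for idx_ = 1:row_count
        # Load the image, resize it to 299 x 299, and convert it to a 3 x 299 x 299 RGB array
        img = channelview(RGB.(imresize(load(images[idx + idx_ - 1][1]), (299, 299))))
        # Move the channel dimension to the end and copy the image into the batch
        mx_data_features[idx_:idx_] = reshape(Float16.(permuteddimsview(img, (3, 2, 1))), (299, 299, 3, 1));
    end

    # Rescale the pixel values from [0, 1] to [-1, 1]
    mx_data_features *= 256.0;
    mx_data_features -= 128.;
    mx_data_features /= 128.;

    # Run the network and store one column of features per image
    data_provider = mx.ArrayDataProvider(:data => mx_data_features);
    features[:, idx:idx + row_count - 1] = mx.predict(nnet, data_provider)
end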
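
If you would like to check the index arithmetic before starting a long run, the loop can be dry-run with plain Julia arrays. The following is only a sketch under stated assumptions: it uses 7 random "images" of 5 x 5 pixels, a batch size of 3, 4 features per image, Base's permutedims in place of permuteddimsview, and a hypothetical fake_extract function in place of mx.predict. None of these stand-ins come from MXNet:

```julia
# Dry run of the batching loop using plain Julia arrays only.
n_images = 7          # hypothetical number of images
batch_size_fe = 3     # small batch size so that the last batch is partial
n_features = 4        # stand-in for 2048
side = 5              # stand-in for 299

# Hypothetical stand-in for mx.predict: one column of features per image.
fake_extract(batch) = fill(1.0, n_features, size(batch, 4))

features = zeros(n_features, n_images)
batch_ranges = UnitRange{Int}[]

for idx in 1:batch_size_fe:n_images
    # The last batch may be smaller than batch_size_fe.
    row_count = min(n_images + 1 - idx, batch_size_fe)
    batch = zeros(Float32, side, side, 3, row_count)

    for idx_ in 1:row_count
        # channelview-style layout: (channel, height, width), values in [0, 1).
        img = rand(Float32, 3, side, side)
        # Move the channel dimension to the end: (width, height, channel).
        batch[:, :, :, idx_] = permutedims(img, (3, 2, 1))
    end

    # Rescale the values from [0, 1] to [-1, 1], as in the real loop.
    batch = (batch .* 256f0 .- 128f0) ./ 128f0

    # Columns of the features array that belong to this batch.
    cols = idx:idx + row_count - 1
    push!(batch_ranges, cols)
    features[:, cols] = fake_extract(batch)
end

println(batch_ranges)
```

With these values, the three batches cover columns 1:3, 4:6, and 7:7, so every image is processed exactly once and the last batch holds a single image.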

Depending on your setup, this process can take some time. I suggest that you use tmux if you are running it on a remote Linux machine.