Our next step is to extract the features. We will be using the Inception V3 network for this. Because the total number of images is relatively large, I will be using a GPU instance. You can try using a CPU, but it will take longer to complete.
We will start by instantiating the model in the GPU context:
nnet = mx.load_checkpoint("weights/inception-v3/InceptionV3-FE", 0, mx.FeedForward; context = mx.gpu());
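If you do not have a GPU available, a minimal sketch of the CPU variant is to load the same checkpoint and only change the context:
nnet = mx.load_checkpoint("weights/inception-v3/InceptionV3-FE", 0, mx.FeedForward; context = mx.cpu());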
We will continue by creating a variable that will store the features for the whole dataset. From the previous sections, you should remember that the Inception V3 feature layer returns 2048 attributes (neurons) per image:
features = zeros(2048, length(images))
Now, we can iterate over the images and populate the features array. Because running Inception V3 is a memory-consuming task, we will be doing this in batches. Depending on the available memory, please adjust the batch size. For example, running the process on a CPU with 8 GB of RAM might require decreasing batch_size_fe to 50 or 100.
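As a rough illustration, assuming such a memory-constrained machine, the adjustment is a single assignment (the GPU run below keeps the value of 250); the n_batches helper is only illustrative and shows how many prediction passes this implies:
batch_size_fe = 100  # assumed value for a CPU machine with 8 GB of RAM
n_batches = ceil(Int, length(images) / batch_size_fe)  # number of prediction passes the loop will run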
The process executes the following flow:
- The outer for loop creates an iterator with a step of batch_size_fe.
- row_count controls the number of records in the current batch, as the last run can have fewer records than batch_size_fe.
- mx_data_features is an MXNet array passed to the model.
- The second for loop executes multiple steps, which are combined into one-liners:
- Read an image from the disk, resize it to 299 x 299, ensure that the colors are in RGB, and convert it into a three-dimensional representation
- Change the order of the dimensions to move the channel dimension to the end and reshape it to fit the MXNet array
- Normalize the data as required by the model.
- Create the data provider, run the prediction, and populate the features array for the records processed in the batch. This is shown in the following code:
batch_size_fe = 250
for idx = 1:batch_size_fe:length(images)
    println(idx)
    # the last batch can contain fewer records than batch_size_fe
    row_count = min(length(images) + 1 - idx, batch_size_fe)
    mx_data_features = mx.zeros((299, 299, 3, row_count));
    for idx_ = 1:row_count
        img = channelview(RGB.(imresize(load(images[idx + idx_ - 1][1]), (299, 299))))
        # move the channel dimension to the end and place the image in position idx_ of the batch
        mx_data_features[idx_:idx_] = reshape(Float16.(permuteddimsview(img, (3, 2, 1))), (299, 299, 3, 1));
    end
    # scale the pixel values from [0, 1] to the [-1, 1] range required by the model
    mx_data_features *= 256.0;
    mx_data_features -= 128.;
    mx_data_features /= 128.;
    data_provider = mx.ArrayDataProvider(:data => mx_data_features);
    features[:, idx:idx + row_count - 1] = mx.predict(nnet, data_provider)
end
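Once the loop finishes, a quick sanity check, which is only a suggestion and not part of the original flow, is to confirm that the features matrix has the expected shape of 2048 rows and one column per image:
@assert size(features) == (2048, length(images))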
Depending on your setup, this process can take some time. I suggest that you use tmux if you are running it on a remote Linux machine, so that the job survives a dropped SSH connection.