To train the model, we will reuse the architecture from the CIFAR-10 experiment:
arch = @mx.chain mx.Variable(:data) =>
    mx.Convolution(kernel=(3, 3), num_filter=32) =>
    mx.Activation(act_type=:relu) =>
    mx.Dropout(p=0.25) =>
    mx.Pooling(kernel=(2, 2), pool_type=:max) =>
    mx.Flatten() =>
    mx.FullyConnected(num_hidden=256) =>
    mx.Activation(act_type=:relu) =>
    mx.FullyConnected(num_hidden=10) =>
    mx.SoftmaxOutput(mx.Variable(:label))
nnet = mx.FeedForward(arch, context = mx.cpu())
mx.fit(nnet, mx.ADAM(), train_data_provider,
    eval_data = validation_data_provider,
    n_epoch = 50,
    initializer = mx.XavierInitializer());
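Here, train_data_provider and validation_data_provider are assumed to have been set up earlier in the chapter. If you are adapting the code to your own in-memory data, a minimal sketch using mx.ArrayDataProvider might look like the following, where train_x and validation_x are hypothetical width × height × channels × samples arrays with matching label vectors:

train_data_provider = mx.ArrayDataProvider(:data => train_x, :label => train_y,
    batch_size = 100, shuffle = true)
validation_data_provider = mx.ArrayDataProvider(:data => validation_x, :label => validation_y,
    batch_size = 100)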
Training the model on the CPU can take some time, so please be patient. If you have built MXNet with GPU support, it is advisable to switch the context to mx.gpu():
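Assuming a CUDA-capable GPU and a GPU-enabled MXNet build, only the context argument changes:

nnet = mx.FeedForward(arch, context = mx.gpu())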
We achieve close to 80% validation accuracy, which is a good result for such a small network:
INFO: == Epoch 022/050 ==========
INFO: ## Training summary
INFO: accuracy = 0.9841
INFO: time = 2.4594 seconds
INFO: ## Validation summary
INFO: accuracy = 0.7958
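Once training finishes, we can double-check this figure by running the network over the validation set with mx.predict and comparing the most probable class for each sample against the true labels. This is a sketch, assuming validation_y is the hypothetical vector of integer labels (0-9) used when building the validation provider:

probs = mx.predict(nnet, validation_data_provider)  # 10 × N matrix, one column of class probabilities per sample
predicted = [argmax(probs[:, i]) - 1 for i in 1:size(probs, 2)]  # shift 1-based indices back to 0-9 labels
accuracy = sum(predicted .== validation_y) / length(validation_y)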