Now that we have explored the data and we are satisfied that it looks OK, the next step is to create our first deep learning model. This is similar to the example we saw in the previous chapter. The code for this is in Chapter5/mnist.Rmd:
data <- mx.symbol.Variable("data")
fullconnect1 <- mx.symbol.FullyConnected(data, name="fullconnect1", num_hidden=256)
activation1 <- mx.symbol.Activation(fullconnect1, name="activation1", act_type="relu")
fullconnect2 <- mx.symbol.FullyConnected(activation1, name="fullconnect2", num_hidden=128)
activation2 <- mx.symbol.Activation(fullconnect2, name="activation2", act_type="relu")
fullconnect3 <- mx.symbol.FullyConnected(activation2, name="fullconnect3", num_hidden=10)
softmax <- mx.symbol.SoftmaxOutput(fullconnect3, name="softmax")
Let's look at this code in detail:
- In MXNet, we use its own data type, symbol, to configure the network.
- We create the first hidden layer (fullconnect1 <- ....). The parameters for this layer are the input data, the layer's name, and the number of neurons in the layer.
- We apply an activation function to the fullconnect layer (activation1 <- ....). The mx.symbol.Activation function takes the output from the first hidden layer, fullconnect1.
- The second hidden layer (fullconnect2 <- ....) takes activation1 as its input.
- The second activation is similar to activation1.
- fullconnect3 is the output layer. This layer has 10 neurons because this is a multi-class classification problem with 10 classes.
- Finally, we use a softmax activation to get a probabilistic prediction for each class.
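To make the last step concrete, softmax rescales the 10 raw scores from fullconnect3 into probabilities that sum to 1. Here is a minimal base-R sketch of the idea (not the MXNet internals; the scores are made up for illustration):

```r
# Softmax: exponentiate the raw scores and normalize so they sum to 1.
softmax <- function(z) {
  z <- z - max(z)           # subtract the max for numerical stability
  exp(z) / sum(exp(z))
}

scores <- c(2.0, 1.0, 0.1)  # example raw scores for three classes
probs  <- softmax(scores)
round(probs, 3)             # probabilities sum to 1; the largest score wins
```

The predicted class is simply the one with the highest probability, which is how we will read off predictions later.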
Now, let's train the base model. I have a GPU installed, so I can use that. If you do not have a GPU, change the first line to devices <- mx.cpu():
devices <- mx.gpu()
mx.set.seed(0)
model <- mx.model.FeedForward.create(softmax, X=train, y=train.y,
ctx=devices,array.batch.size=128,
num.round=10,
learning.rate=0.05, momentum=0.9,
eval.metric=mx.metric.accuracy,
epoch.end.callback=mx.callback.log.train.metric(1))
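It is worth knowing how big this model is. Each fully connected layer has (inputs × neurons) weights plus one bias per neuron, so assuming the standard 28x28 MNIST input (784 features), we can tally the trainable parameters:

```r
# Parameters in a fully connected layer: weights (n_in x n_out) plus biases (n_out).
layer_params <- function(n_in, n_out) n_in * n_out + n_out

p1 <- layer_params(784, 256)  # fullconnect1: 200,960
p2 <- layer_params(256, 128)  # fullconnect2: 32,896
p3 <- layer_params(128, 10)   # fullconnect3: 1,290
p1 + p2 + p3                  # 235,146 trainable parameters in total
```

Activation layers add no parameters; almost all of the model's capacity sits in the first layer, which maps the raw pixels to 256 hidden neurons.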
To make a prediction, we will call the predict function. We can then create a confusion matrix and calculate our accuracy level on test data:
preds1 <- predict(model, test)
pred.label1 <- max.col(t(preds1)) - 1
res1 <- data.frame(cbind(test.y,pred.label1))
table(res1)
## pred.label1
## test.y 0 1 2 3 4 5 6 7 8 9
## 0 405 0 0 1 1 2 1 1 0 5
## 1 0 449 1 0 0 0 0 4 0 1
## 2 0 0 436 0 0 0 0 3 1 1
## 3 0 0 6 420 0 1 0 2 8 0
## 4 0 1 1 0 388 0 2 0 1 7
## 5 2 0 0 6 1 363 3 0 2 5
## 6 3 1 3 0 2 1 427 0 0 0
## 7 0 2 3 0 1 0 0 394 0 3
## 8 0 4 2 4 0 2 1 1 403 6
## 9 1 0 1 2 7 0 1 1 0 393
accuracy1 <- sum(res1$test.y == res1$pred.label1) / nrow(res1)
accuracy1
## 0.971
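The accuracy can also be read straight off the confusion matrix: correct predictions sit on the diagonal, so accuracy is the diagonal sum divided by the total count. A small sketch with made-up labels (not the MNIST results above):

```r
# Accuracy from a confusion matrix: diagonal (correct) over total.
truth <- c(0, 0, 1, 1, 2, 2)
pred  <- c(0, 0, 1, 2, 2, 2)
tab   <- table(truth, pred)
sum(diag(tab)) / sum(tab)   # 5 correct out of 6
```

Applied to the table above, this gives the same 0.971 as comparing the columns directly.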
The accuracy of our base model is 0.971. Not bad, but let's see if we can improve on it.