Selecting optimal epochs using dropout and early stopping

To avoid overfitting, we can use two techniques. The first is adding a dropout layer. A dropout layer randomly sets a subset of the layer's outputs to zero during training. This makes the data look a little different at every iteration, so the model generalizes better and doesn't fit too specifically to the training data. In the following code, we add a dropout layer after each pooling layer, as well as after the dense layer:

model <- keras_model_sequential()
model %>%
  layer_conv_2d(filters = 128, kernel_size = c(7, 7),
                input_shape = c(28, 28, 1), padding = "same") %>%
  layer_activation_leaky_relu() %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_dropout(rate = 0.2) %>%   # randomly zero 20% of activations
  layer_conv_2d(filters = 64, kernel_size = c(7, 7), padding = "same") %>%
  layer_activation_leaky_relu() %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_dropout(rate = 0.2) %>%
  layer_conv_2d(filters = 32, kernel_size = c(7, 7), padding = "same") %>%
  layer_activation_leaky_relu() %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_dropout(rate = 0.2) %>%
  layer_flatten() %>%
  layer_dense(units = 128) %>%
  layer_activation_leaky_relu() %>%
  layer_dropout(rate = 0.2) %>%   # dropout after the dense layer as well
  layer_dense(units = 10, activation = 'softmax')
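
To confirm where the dropout layers sit in the network, you can print the model structure before compiling:

summary(model)   # lists each layer, its output shape, and its parameter count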

# compile model
model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = 'adam',
  metrics = 'accuracy'
)

# train and evaluate
model %>% fit(
  train, train_target,
  batch_size = 100,
  epochs = 5,
  verbose = 1,
  validation_data = list(test, test_target)
)
scores <- model %>% evaluate(
  test, test_target, verbose = 1
)
preds <- model %>% predict(test)
# predict_classes() has been removed from recent keras releases; taking the
# column with the highest predicted probability works across versions
predicted_classes <- apply(preds, 1, which.max) - 1
caret::confusionMatrix(as.factor(predicted_classes), as.factor(test_target_vector))
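
If you would rather extract the accuracy programmatically than read it off the printed confusion matrix, the object returned by caret::confusionMatrix() stores its summary statistics in named elements. A minimal sketch:

# Store the confusion matrix object and pull out summary statistics
cm <- caret::confusionMatrix(
  as.factor(predicted_classes),
  as.factor(test_target_vector)
)
cm$overall["Accuracy"]        # overall accuracy as a single number
cm$byClass[, "Sensitivity"]   # per-class recall across the ten classes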

The fit() call prints model diagnostic data to the console as training progresses and draws a plot of model performance in the Viewer pane, while the evaluation and confusion matrix steps print performance metrics to the console.
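
The performance plot is built from the training history that fit() returns. To recreate or save that plot yourself, assign the result of fit() to a variable, as in this short sketch:

# Capture the training history so the metrics can be re-plotted later
history <- model %>% fit(
  train, train_target,
  batch_size = 100,
  epochs = 5,
  verbose = 1,
  validation_data = list(test, test_target)
)
plot(history)   # loss and accuracy curves for training and validation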

In this case, using this tactic caused our accuracy score to decrease slightly, to 90.07%. This could be because the images in this dataset are so small that the model suffered from losing activations at every step. With larger datasets, dropout will likely help models train more efficiently and generalize better.
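
If you suspect that a rate of 0.2 removes too much signal for images this small, the dropout rate can be treated as a tuning parameter. The sketch below compares a few candidate rates using a pared-down, single-convolution-block variant of the architecture above, wrapped in a hypothetical build_model() helper that is not part of the original code; note that the metric may be named val_acc rather than val_accuracy in older keras versions:

# Hypothetical helper: a simplified version of the network above,
# parameterized by the dropout rate
build_model <- function(rate) {
  model <- keras_model_sequential()
  model %>%
    layer_conv_2d(filters = 32, kernel_size = c(7, 7),
                  input_shape = c(28, 28, 1), padding = "same") %>%
    layer_activation_leaky_relu() %>%
    layer_max_pooling_2d(pool_size = c(2, 2)) %>%
    layer_dropout(rate = rate) %>%
    layer_flatten() %>%
    layer_dense(units = 128) %>%
    layer_activation_leaky_relu() %>%
    layer_dropout(rate = rate) %>%
    layer_dense(units = 10, activation = 'softmax')
  model %>% compile(
    loss = 'categorical_crossentropy',
    optimizer = 'adam',
    metrics = 'accuracy'
  )
  model
}

# Train briefly at each candidate rate and compare validation accuracy
for (rate in c(0.1, 0.2, 0.3)) {
  m <- build_model(rate)
  h <- m %>% fit(
    train, train_target,
    batch_size = 100, epochs = 5, verbose = 0,
    validation_data = list(test, test_target)
  )
  cat("rate:", rate, "final validation accuracy:",
      tail(h$metrics$val_accuracy, 1), "\n")
}

A small grid like this shows quickly whether a lighter rate recovers the accuracy lost here.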

The other tactic for preventing overfitting is early stopping. Early stopping monitors training progress and halts training once the model stops improving. In the following code, we set the patience argument to 2, which means that the monitored metric (validation loss by default) must fail to improve for two consecutive epochs before training stops.

In the following code, with epochs kept at 5, the model will complete all five rounds; however, if you have time, feel free to run all the preceding code and then, when fitting the model, increase the number of epochs to see when the early stopping callback halts training. We add early stopping to our model using the following code:

model %>% fit(
  train, train_target,
  batch_size = 100,
  epochs = 5,
  verbose = 1,
  validation_data = list(test, test_target),
  callbacks = list(
    callback_early_stopping(patience = 2)
  )
)
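
callback_early_stopping() also takes arguments that control which metric is watched, how much improvement counts, and what happens to the weights when training halts. The following sketch assumes a reasonably recent keras version (restore_best_weights is not available in older releases) and raises the epoch ceiling so that the callback, rather than the epoch count, decides when to stop:

model %>% fit(
  train, train_target,
  batch_size = 100,
  epochs = 50,   # a deliberately high ceiling; the callback decides when to stop
  verbose = 1,
  validation_data = list(test, test_target),
  callbacks = list(
    callback_early_stopping(
      monitor = "val_loss",         # metric to watch; validation loss is the default
      min_delta = 0.001,            # smaller changes don't count as improvement
      patience = 2,                 # allow two stagnant epochs before stopping
      restore_best_weights = TRUE   # roll the weights back to the best epoch
    )
  )
)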

From the original early stopping code, we can deduce the following: