Gated recurrent unit (GRU) model

Gated recurrent units (GRUs) are similar to LSTM cells but simpler. A GRU merges the LSTM's forget and input gates into a single update gate, adds a reset gate, and has no output gate or separate cell state. Because they have fewer parameters, GRUs are quicker to train than LSTMs, but whether they actually perform better is still a matter of debate, as the research is inconclusive. It is therefore worth trying both, since results vary by task. The code for our GRU model is in Chapter7/classify_keras5.R. The parameters for the model are: maximum sequence length = 250, embedding size = 32, and the model was trained for 10 epochs:
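To make the gating concrete, here is a minimal sketch of a single GRU step in base R, following the standard GRU equations (Cho et al., 2014) rather than the internals of layer_gru. Biases are omitted and the weights are random placeholders, so this is an illustration only:

# One GRU step: update gate z, reset gate r, candidate state h_tilde
sigmoid <- function(x) 1 / (1 + exp(-x))

gru_step <- function(x, h_prev, W_z, U_z, W_r, U_r, W_h, U_h) {
  z <- sigmoid(W_z %*% x + U_z %*% h_prev)  # update gate: plays the combined forget/input role
  r <- sigmoid(W_r %*% x + U_r %*% h_prev)  # reset gate: how much past state feeds the candidate
  h_tilde <- tanh(W_h %*% x + U_h %*% (r * h_prev))  # candidate state
  (1 - z) * h_prev + z * h_tilde  # new hidden state: no separate cell state, no output gate
}

set.seed(42)
input_dim <- 4
units <- 3
rand_mat <- function(rows, cols) matrix(rnorm(rows * cols, sd = 0.1), rows, cols)
x <- rnorm(input_dim)
h_prev <- rep(0, units)
h <- gru_step(x, h_prev,
              rand_mat(units, input_dim), rand_mat(units, units),
              rand_mat(units, input_dim), rand_mat(units, units),
              rand_mat(units, input_dim), rand_mat(units, units))

Note that some formulations (including Keras) swap the roles of z and (1 - z) in the final interpolation; the two conventions are equivalent up to relabelling the gate.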

word_index <- dataset_reuters_word_index()
max_features <- length(word_index)  # use the full vocabulary
maxlen <- 250                       # maximum sequence length
skip_top <- 0                       # do not skip any of the most frequent words

...........
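The elided code loads and pads the data. As a rough sketch of what that step typically looks like with keras in R (the binary relabelling against topic 3 is purely an assumption for illustration, since the model ends in a single sigmoid unit and the actual preprocessing in classify_keras5.R is not shown):

library(keras)

# Load Reuters and pad/truncate every sequence to maxlen tokens
c(c(x_train, y_train), c(x_test, y_test)) %<-%
  dataset_reuters(num_words = max_features, skip_top = skip_top)
x_train <- pad_sequences(x_train, maxlen = maxlen)
x_test  <- pad_sequences(x_test, maxlen = maxlen)

# Hypothetical binary relabelling: 1 if the article is topic 3, else 0
y_train <- as.numeric(y_train == 3)
y_test  <- as.numeric(y_test == 3)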

model <- keras_model_sequential() %>%
  # 32-dimensional embeddings for each of the max_features words
  layer_embedding(input_dim = max_features, output_dim = 32,
                  input_length = maxlen) %>%
  layer_dropout(rate = 0.25) %>%
  # single GRU layer with 128 units in place of the earlier LSTM
  layer_gru(units = 128, dropout = 0.2) %>%
  # single sigmoid unit for binary classification
  layer_dense(units = 1, activation = "sigmoid")

...........
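The second elided step compiles the model. A plausible version, given that the binary_crossentropy loss follows from the sigmoid output and the acc metric matches the training log below (the choice of optimizer is an assumption, as the excerpt does not show it):

model %>% compile(
  optimizer = "rmsprop",          # assumption; adam is an equally common default
  loss = "binary_crossentropy",   # matches the single sigmoid output unit
  metrics = c("acc")              # produces the acc/val_acc columns in the log
)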

history <- model %>% fit(
  x_train, y_train,
  epochs = 10,
  batch_size = 32,
  validation_split = 0.2  # hold out 20% of the training data for validation
)
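As an aside, the history object returned by fit() can be plotted directly to inspect the loss and accuracy curves across epochs; this one-liner is a keras convenience rather than part of classify_keras5.R:

# Plot training vs. validation loss and accuracy
plot(history)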

Here is the output from the model's training:

Train on 8982 samples, validate on 2246 samples
Epoch 1/10
8982/8982 [==============================] - 35s 4ms/step - loss: 0.3231 - acc: 0.8867 - val_loss: 0.2068 - val_acc: 0.9372
Epoch 2/10
8982/8982 [==============================] - 35s 4ms/step - loss: 0.2084 - acc: 0.9381 - val_loss: 0.2065 - val_acc: 0.9421
Epoch 3/10
8982/8982 [==============================] - 35s 4ms/step - loss: 0.1824 - acc: 0.9454 - val_loss: 0.1711 - val_acc: 0.9501
Epoch 4/10
8982/8982 [==============================] - 35s 4ms/step - loss: 0.1656 - acc: 0.9515 - val_loss: 0.1719 - val_acc: 0.9550
Epoch 5/10
8982/8982 [==============================] - 35s 4ms/step - loss: 0.1569 - acc: 0.9551 - val_loss: 0.1668 - val_acc: 0.9541
Epoch 6/10
8982/8982 [==============================] - 35s 4ms/step - loss: 0.1477 - acc: 0.9570 - val_loss: 0.1667 - val_acc: 0.9555
Epoch 7/10
8982/8982 [==============================] - 35s 4ms/step - loss: 0.1441 - acc: 0.9605 - val_loss: 0.1612 - val_acc: 0.9581
Epoch 8/10
8982/8982 [==============================] - 36s 4ms/step - loss: 0.1361 - acc: 0.9611 - val_loss: 0.1593 - val_acc: 0.9590
Epoch 9/10
8982/8982 [==============================] - 35s 4ms/step - loss: 0.1361 - acc: 0.9620 - val_loss: 0.1646 - val_acc: 0.9568
Epoch 10/10
8982/8982 [==============================] - 35s 4ms/step - loss: 0.1306 - acc: 0.9634 - val_loss: 0.1660 - val_acc: 0.9559

The best validation accuracy came after epoch 8, when we got 95.90% accuracy, which is an improvement on the 95.37% we got with the LSTM. In fact, this is the best result we have seen so far.
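Rather than scanning the log by eye, the best epoch can be read off the history object returned by fit(); this small check is a convenience and is not part of classify_keras5.R:

# Report the epoch with the highest validation accuracy
best_epoch <- which.max(history$metrics$val_acc)
cat(sprintf("Best epoch: %d (val_acc = %.4f)\n",
            best_epoch, history$metrics$val_acc[best_epoch]))

In the next section, we will look at bidirectional architectures.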