The first example presented in this chapter is an LSTM that, after training, recites the remaining characters of a learning string once its first character has been fed to it as input.
The dependencies for this example are the following:
- Scala 2.11.8
- DL4J NN 0.9.1
- ND4J Native 0.9.1, with the classifier specific to the OS of the machine where the example will run
- ND4J jblas 0.4-rc3.6
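In an sbt project, these dependencies could be declared roughly as follows (the Maven coordinates and the OS classifier below are assumptions to be adapted to your environment):
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.deeplearning4j" % "deeplearning4j-nn" % "0.9.1",
  "org.nd4j" % "nd4j-native" % "0.9.1",
  // Replace the classifier with the one matching your OS (for example windows-x86_64 or macosx-x86_64)
  "org.nd4j" % "nd4j-native" % "0.9.1" classifier "linux-x86_64",
  "org.nd4j" % "nd4j-jblas" % "0.4-rc3.6"
)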
Assuming the learning string is specified through an immutable variable, LEARNSTRING, let's start by building a dedicated list of the distinct characters it contains, as follows:
// Collect the distinct characters of the learning string, preserving insertion order
val LEARNSTRING_CHARS: util.LinkedHashSet[Character] = new util.LinkedHashSet[Character]
for (c <- LEARNSTRING) {
  LEARNSTRING_CHARS.add(c)
}
// Copy them into an indexed list so that each character can be mapped to a position
val LEARNSTRING_CHARS_LIST: util.List[Character] = new util.ArrayList[Character]
LEARNSTRING_CHARS_LIST.addAll(LEARNSTRING_CHARS)
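For completeness, the code snippets in this section assume imports along the following lines (the package names are those found in DL4J/ND4J 0.9.1; treat them as a sketch and adjust to your version if needed):
import java.util

import org.deeplearning4j.nn.api.OptimizationAlgorithm
import org.deeplearning4j.nn.conf.{NeuralNetConfiguration, Updater}
import org.deeplearning4j.nn.conf.layers.{GravesLSTM, RnnOutputLayer}
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
import org.deeplearning4j.nn.weights.WeightInit
import org.deeplearning4j.optimize.listeners.ScoreIterationListener
import org.nd4j.linalg.activations.Activation
import org.nd4j.linalg.dataset.DataSet
import org.nd4j.linalg.factory.Nd4j
import org.nd4j.linalg.lossfunctions.LossFunctions.LossFunction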
Let's configure the network, as follows:
val builder: NeuralNetConfiguration.Builder = new NeuralNetConfiguration.Builder
builder.iterations(10)
builder.learningRate(0.001)
builder.optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
builder.seed(123)
builder.biasInit(0)
builder.miniBatch(false)
builder.updater(Updater.RMSPROP)
builder.weightInit(WeightInit.XAVIER)
You will notice that we are using the same NeuralNetConfiguration.Builder class as for the CNN example presented in the previous chapter. This same abstraction is used for any network you need to implement through DL4J. The optimization algorithm used is Stochastic Gradient Descent (https://en.wikipedia.org/wiki/Stochastic_gradient_descent). The meaning of the other parameters will be explained in the next chapter, which focuses on training.
Let's now define the layers for this network. The model we are implementing is based on the LSTM RNN by Alex Graves (https://en.wikipedia.org/wiki/Alex_Graves_(computer_scientist)). After deciding their total number by assigning a value to an immutable variable, HIDDEN_LAYER_CONT, we can define the hidden layers of our network, as follows:
val listBuilder = builder.list
for (i <- 0 until HIDDEN_LAYER_CONT) {
  val hiddenLayerBuilder: GravesLSTM.Builder = new GravesLSTM.Builder
  hiddenLayerBuilder.nIn(if (i == 0) LEARNSTRING_CHARS.size else HIDDEN_LAYER_WIDTH)
  hiddenLayerBuilder.nOut(HIDDEN_LAYER_WIDTH)
  hiddenLayerBuilder.activation(Activation.TANH)
  listBuilder.layer(i, hiddenLayerBuilder.build)
}
The activation function is tanh (hyperbolic tangent).
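Both HIDDEN_LAYER_CONT and HIDDEN_LAYER_WIDTH are assumed to have been defined earlier as immutable variables; purely as an illustration, they could look like the following (the values are arbitrary, not prescribed by the example):
val HIDDEN_LAYER_CONT = 2   // number of hidden (LSTM) layers; illustrative value
val HIDDEN_LAYER_WIDTH = 50 // number of units per hidden layer; illustrative value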
We then need to define the output layer (choosing softmax as the activation function), as follows:
val outputLayerBuilder: RnnOutputLayer.Builder = new RnnOutputLayer.Builder(LossFunction.MCXENT)
outputLayerBuilder.activation(Activation.SOFTMAX)
outputLayerBuilder.nIn(HIDDEN_LAYER_WIDTH)
outputLayerBuilder.nOut(LEARNSTRING_CHARS.size)
listBuilder.layer(HIDDEN_LAYER_CONT, outputLayerBuilder.build)
Before completing the configuration, we must specify that this model isn't pre-trained and that we use backpropagation, as follows:
listBuilder.pretrain(false)
listBuilder.backprop(true)
The network (MultiLayerNetwork) can be created starting from the preceding configuration, as follows:
val conf = listBuilder.build
val net = new MultiLayerNetwork(conf)
net.init()
net.setListeners(new ScoreIterationListener(1))
Some training data can be generated programmatically starting from the learning string character list, as follows:
val input = Nd4j.zeros(1, LEARNSTRING_CHARS_LIST.size, LEARNSTRING.length)
val labels = Nd4j.zeros(1, LEARNSTRING_CHARS_LIST.size, LEARNSTRING.length)
var samplePos = 0
for (currentChar <- LEARNSTRING) {
  val nextChar = LEARNSTRING((samplePos + 1) % (LEARNSTRING.length))
  input.putScalar(Array[Int](0, LEARNSTRING_CHARS_LIST.indexOf(currentChar), samplePos), 1)
  labels.putScalar(Array[Int](0, LEARNSTRING_CHARS_LIST.indexOf(nextChar), samplePos), 1)
  samplePos += 1
}
val trainingData: DataSet = new DataSet(input, labels)
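As a side note, ND4J stores RNN features and labels as 3D arrays of shape [miniBatchSize, featureSize, timeSeriesLength]: here there is a single example, the feature size is the number of distinct characters, and the time series length is the length of the learning string. Each character is one-hot encoded at its time step and labelled with the character that follows it. A quick (hypothetical) sanity check could be the following:
// Both arrays should have shape [1, number of distinct characters, length of LEARNSTRING]
println(util.Arrays.toString(input.shape()))
println(util.Arrays.toString(labels.shape()))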
How the training of this RNN happens will be covered in the next chapter (where the code example will also be completed); the focus of this section is to show how to configure and build an RNN using the DL4J API.