In the previous chapter, we learned how powerful and, at the same time, how easy to use the DL4J APIs are when it comes to configuring, building, and training multilayer neural network models. Relying on this framework alone, the possibilities for implementing new models in Scala or Java are almost endless.
However, let's have a look at the following Google search results for TensorFlow neural network models available on the web:
That is quite an impressive number of results, and this is just a raw search. Even when the search is refined to more specific model implementations, the numbers remain high. But what is TensorFlow? TensorFlow (https://www.tensorflow.org/) is a powerful and comprehensive open source framework for ML and DL, developed by the Google Brain team. At the time of writing, it is one of the most widely used frameworks among data scientists, so it has a big community, and lots of shared models and examples are available for it. This explains the big numbers. Among those models, the chances of finding a pre-trained one that fits your specific use case are high. So, where's the problem? TensorFlow is mostly used from Python.
It provides support for other programming languages, such as Java for the JVM, but its Java API is currently experimental and isn't covered by the TensorFlow API stability guarantees. Furthermore, the TensorFlow Python API presents a steep learning curve for non-Python developers and for software engineers with little or no data science background. How, then, can they benefit from this framework? How can we reuse an existing, valid model in a JVM-based environment? Keras (https://keras.io/) comes to the rescue. It is an open source, high-level neural network library written in Python that can be used in place of the TensorFlow high-level API (the following diagram shows the TensorFlow framework architecture):
Compared to TensorFlow, Keras is lightweight and allows easier prototyping. It can run not only on top of TensorFlow, but also on top of other Python backend engines. Last but not least, it can be used to import Python models into DL4J: the DL4J Keras Model Import library provides facilities for importing neural network models configured and trained through the Keras framework.
The following diagram shows that, once a model has been imported into DL4J, the full production stack is available for using it:
Let's now go into detail to understand how this happens. For the examples in this section, I am assuming that you already have Python 2.7.x and the pip (https://pypi.org/project/pip/) package installer for Python on your machine. In order to implement a model in Keras, we have to install Keras itself and choose a backend (TensorFlow for the examples presented here). TensorFlow has to be installed first, as follows:
sudo pip install tensorflow
This installs the CPU-only version. If you need to run it on GPUs, install the following package instead:
sudo pip install tensorflow-gpu
We can now install Keras, as follows:
sudo pip install keras
Keras uses TensorFlow as its default tensor manipulation library, so no extra action has to be taken if TensorFlow is our backend of choice.
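To see which backend a given installation would pick up, the lookup order documented for Keras 2.x can be approximated with the standard library alone: the KERAS_BACKEND environment variable takes precedence over the "backend" entry of the ~/.keras/keras.json configuration file, and TensorFlow is the fallback. The function name resolve_keras_backend and its parameters are illustrative assumptions; this is a sketch of the documented precedence, not Keras's own code.

```python
import json
import os


def resolve_keras_backend(config_path=None, environ=None):
    """Approximate the backend lookup documented for Keras 2.x.

    Illustrative sketch only; this is not the code Keras itself runs.
    """
    environ = os.environ if environ is None else environ
    if config_path is None:
        config_path = os.path.join(os.path.expanduser("~"), ".keras", "keras.json")

    # 1. The KERAS_BACKEND environment variable wins when it is set.
    if environ.get("KERAS_BACKEND"):
        return environ["KERAS_BACKEND"]

    # 2. Otherwise use the "backend" entry of keras.json, if the file exists.
    if os.path.exists(config_path):
        with open(config_path) as f:
            config = json.load(f)
        if "backend" in config:
            return config["backend"]

    # 3. Fall back to TensorFlow, the default backend.
    return "tensorflow"
```

Calling resolve_keras_backend() with no arguments inspects the current user's environment and configuration file; on a fresh installation like the one described here, it would be expected to return tensorflow.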
Let's start simple, implementing an MLP model using the Keras API. Begin with the necessary imports:
from keras.models import Sequential
from keras.layers import Dense
We create a Sequential model, as follows:
model = Sequential()
Then, we add layers through the add method of Sequential, as follows:
model.add(Dense(units=64, activation='relu', input_dim=100))
model.add(Dense(units=10, activation='softmax'))
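As a quick sanity check of the two layers just added, the number of trainable parameters can be worked out by hand: a fully connected layer with n inputs and m units holds n × m weights plus m biases. The sketch below uses only the standard library; the helper name dense_param_count is illustrative and is not part of Keras.

```python
def dense_param_count(n_inputs, n_units):
    """Weights plus biases of one fully connected (Dense) layer."""
    return n_inputs * n_units + n_units


# Layer sizes taken from the model above: 100 inputs -> 64 units -> 10 units.
hidden = dense_param_count(100, 64)   # 100 * 64 weights + 64 biases
output = dense_param_count(64, 10)    # 64 * 10 weights + 10 biases
total = hidden + output

print("%d %d %d" % (hidden, output, total))  # prints: 6464 650 7114
```

The same figures should appear in the per-layer listing printed by model.summary().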
The configuration of the learning process for this model can be done through the compile method, as follows:
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
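To make the loss argument more concrete: for a single example, categorical cross-entropy is the negative log of the probability that the model assigns to the true class, that is, -sum(y_i * log(p_i)) with y one-hot encoded and p produced by the softmax output layer. The following is a minimal standard-library sketch of that formula, using three classes instead of ten for brevity. The function names are illustrative, and the Keras implementation operates on batches of tensors and differs in its numerical details.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Loss for one example: -sum(y_i * log(p_i)), with y_true one-hot encoded."""
    # Probabilities are floored at eps to avoid taking log(0).
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_true, y_pred))


probs = softmax([2.0, 1.0, 0.1])   # raw scores for three classes
y_true = [1.0, 0.0, 0.0]           # the first class is the correct one
loss = categorical_crossentropy(y_true, probs)
print(loss)
```

The closer the predicted probability of the correct class gets to 1, the closer the loss gets to 0, which is what the sgd optimizer tries to achieve during training.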
Finally, we serialize the model in HDF5 format, as follows:
model.save('basic_mlp.h5')
Hierarchical Data Format (HDF) is a set of file formats (with the extensions .hdf5 and .h5) to store and manage large amounts of data, in particular multidimensional numeric arrays. Keras uses it to save and load models.
Save this simple program as basic_mlp.py and run it as follows (root privileges are not needed for this step). The model will be serialized and saved in the basic_mlp.h5 file:
python basic_mlp.py
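Before handing the file to DL4J, it can be useful to check that it really is an HDF5 file and not, say, a truncated copy. An HDF5 file normally starts with a fixed 8-byte signature, which the following standard-library sketch tests for. The function name looks_like_hdf5 is illustrative; the check only covers the common case where the signature sits at the very start of the file, and it says nothing about whether the content is a valid Keras model.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # the 8-byte HDF5 format signature


def looks_like_hdf5(path):
    """Return True if the file at `path` starts with the HDF5 signature."""
    with open(path, "rb") as f:
        return f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE
```

For the file produced by the script above, looks_like_hdf5('basic_mlp.h5') would be expected to return True.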
Now, we are ready to import this model into DL4J. We need to add the usual DataVec API, DL4J core, and ND4J dependencies to the Scala project, plus the DL4J model import library, as follows:
groupId: org.deeplearning4j
artifactId: deeplearning4j-modelimport
version: 0.9.1
Copy the basic_mlp.h5 file into the resources folder of the project, then programmatically get its path, as follows:
val basicMlp = new ClassPathResource("basic_mlp.h5").getFile.getPath
Then, load the model as a DL4J MultiLayerNetwork, using the importKerasSequentialModelAndWeights method of the KerasModelImport class (https://static.javadoc.io/org.deeplearning4j/deeplearning4j-modelimport/1.0.0-alpha/org/deeplearning4j/nn/modelimport/keras/KerasModelImport.html), as follows:
import org.deeplearning4j.nn.modelimport.keras.KerasModelImport
val model = KerasModelImport.importKerasSequentialModelAndWeights(basicMlp)
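Generate some mock data, as follows:
// 256 examples with 100 features each, matching input_dim=100 in the Keras model
val input = Nd4j.create(256, 100)
// The model's own predictions, used here only as placeholder labels
val output = model.output(input)
Now, we can train the model the usual way in DL4J. Here, the mock output stands in for the labels just to show the call; real training needs real features and labels:
model.fit(input, output)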
All the considerations made in Chapter 7, Training Neural Networks with Spark, Chapter 8, Monitoring and Debugging Neural Network Training, and Chapter 9, Interpreting Neural Network Output, about training, monitoring, and evaluation with DL4J, apply here too.
It is also possible, of course, to train the model in Keras, as in the following example, where x_train and y_train are NumPy (http://www.numpy.org/) arrays:
model.fit(x_train, y_train, epochs=5, batch_size=32)
You can then evaluate it before saving it in serialized form, as follows:
loss_and_metrics = model.evaluate(x_test, y_test, batch_size=128)
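The x_train, y_train, x_test, and y_test arrays are not defined in the preceding snippets. As a minimal sketch, assuming randomly generated placeholder data (not a real dataset) whose shapes match the model, that is, 100 input features and 10 one-hot encoded classes, they could be built with NumPy as follows. The sample counts, the seeds, and the helper name make_mock_data are arbitrary choices for illustration.

```python
import numpy as np


def make_mock_data(n_samples, n_features=100, n_classes=10, seed=0):
    """Random features and one-hot labels shaped for the MLP above."""
    rng = np.random.RandomState(seed)
    x = rng.random_sample((n_samples, n_features))       # values in [0, 1)
    labels = rng.randint(0, n_classes, size=n_samples)   # integer class ids
    y = np.eye(n_classes)[labels]                        # one-hot encode the ids
    return x, y


x_train, y_train = make_mock_data(1000, seed=0)
x_test, y_test = make_mock_data(100, seed=1)
print(x_train.shape, y_train.shape)
```

A model trained on random data like this learns nothing useful, of course; the point is only to have arrays of the right shape to exercise the fit and evaluate calls.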
You can finally import the pre-trained model in the same way as explained previously, and just run it.
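As with Sequential models, DL4J also allows the importing of Keras Functional models, through the importKerasModelAndWeights method of the same KerasModelImport class, which returns a ComputationGraph rather than a MultiLayerNetwork.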
The latest versions of DL4J also allow the importing of TensorFlow models. Imagine you want to import this (https://github.com/tensorflow/models/blob/master/official/mnist/mnist.py) pre-trained model (a CNN estimator for the MNIST database). At the end of the training, which happens in TensorFlow, you can save the model in a serialized form. TensorFlow's file format is based on Protocol Buffers (https://developers.google.com/protocol-buffers/?hl=en), which is a language-neutral, platform-neutral, extensible serialization mechanism for structured data.
Copy the serialized mnist.pb file into the resource folder of the DL4J Scala project and then programmatically get it and import the model, as follows:
val mnistTf = new ClassPathResource("mnist.pb").getFile
val sd = TFGraphMapper.getInstance.importGraph(mnistTf)
Finally, feed images to the model and start making predictions, as follows:
for (i <- 1 to 10) {
  val file = "images/img_%d.jpg".format(i)
  // loadImage is a placeholder for the image-loading code, which is not shown here;
  // it is assumed to return the image as an INDArray
  val arr = loadImage(file)
  val batchedArr = Nd4j.expandDims(arr, 0) // add a leading batch dimension
  sd.associateArrayWithVariable(batchedArr, sd.variables().get(0))
  val out = sd.execAndEndResult // INDArray
  val prediction = Nd4j.squeeze(out, 0) // drop the batch dimension again
  ...
}