Generative Adversarial Networks (GANs) are deep neural network architectures made up of two nets that are pitted against each other (hence the adjective adversarial in the name). GAN algorithms are used in unsupervised machine learning, and their main focus is generating data from scratch. Among the most popular use cases of GANs are generating images from text, image-to-image translation, increasing image resolution to produce more realistic pictures, and predicting the next frames of a video.
As we mentioned previously, a GAN is made up of two deep networks, the generator and the discriminator: the first generates candidates, while the second evaluates them. Let's see, at a very high level, how generative and discriminative algorithms work. Discriminative algorithms try to classify input data; that is, they predict a label or category to which that input data belongs. Their only concern is to map features to labels. Generative algorithms do the opposite: instead of predicting a label when given certain features, they attempt to predict features when given a certain label.
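The features-to-labels versus labels-to-features distinction can be sketched with a toy model. Everything here is illustrative and not from the book: two one-dimensional "feature" classes, each assumed to be a Gaussian with its own mean, where the discriminative function maps a feature value to a label and the generative function maps a label back to a plausible feature value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (an assumption for illustration): two classes of 1-D features,
# each distributed as a Gaussian around its own mean.
class_means = {0: -2.0, 1: 2.0}

def discriminative(x):
    """Features -> label: pick the class whose mean is closest to x."""
    return min(class_means, key=lambda c: abs(x - class_means[c]))

def generative(label):
    """Label -> features: sample a plausible x for the given class."""
    return rng.normal(loc=class_means[label], scale=1.0)

print(discriminative(-1.5))  # a point near -2 falls in class 0
print(generative(1))         # a sample drawn around the class-1 mean of 2.0
```

The two functions are exact mirrors of each other, which is the relationship the two nets of a GAN exploit.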
Here's how a GAN works. The generator generates new data instances, while the discriminator evaluates them to assess their authenticity. Using the same MNIST dataset (http://yann.lecun.com/exdb/mnist/) that has been used to illustrate more than one code example throughout this book, let's think through a scenario to make clear what happens in a GAN. Suppose the generator produces MNIST-like hand-written digits and passes them to the discriminator. The goal of the generator is to generate passable hand-written digits without being caught, while the goal of the discriminator is to identify the images coming from the generator as fake. With reference to the following diagram, these are the steps that this GAN takes:
- The generator net takes some random numbers as input and then returns an image.
- The generated image is used to feed the discriminator net alongside a stream of other images that have been taken from the training dataset.
- While taking in both real and fake images, the discriminator returns probabilities, which are numbers between zero and one. Zero represents a prediction of fake, while one represents a prediction of authenticity.
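The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a working GAN: the single-layer "nets", their random weights, and the dimensions are all assumptions chosen for brevity, standing in for the deep convolutional networks a real implementation would use.

```python
import numpy as np

rng = np.random.default_rng(42)
NOISE_DIM, IMG_DIM = 16, 28 * 28  # 28x28, MNIST-sized images

# Randomly initialised single-layer stand-ins for the two deep nets.
G_W = rng.normal(scale=0.1, size=(NOISE_DIM, IMG_DIM))  # generator weights
D_W = rng.normal(scale=0.1, size=(IMG_DIM, 1))          # discriminator weights

def generator(z):
    """Step 1: random numbers in, image-shaped output (tanh keeps pixels in [-1, 1])."""
    return np.tanh(z @ G_W)

def discriminator(img):
    """Steps 2-3: image in, probability of authenticity out (sigmoid keeps it in [0, 1])."""
    return 1.0 / (1.0 + np.exp(-(img @ D_W)))

z = rng.normal(size=(1, NOISE_DIM))            # random noise input
fake = generator(z)                            # a fake "hand-written digit"
real = rng.uniform(-1, 1, size=(1, IMG_DIM))   # stand-in for a training image
print(discriminator(fake).item(), discriminator(real).item())
```

Both outputs are strictly between zero and one, so they can be read directly as the discriminator's fake/authentic predictions.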
In terms of implementation, the discriminator net is a standard CNN that can categorize the images fed to it, while the generator net is an inverse (deconvolutional) CNN. The two nets try to optimize different and opposing loss functions in a zero-sum game. This is essentially an actor-critic model (https://cs.wmich.edu/~trenary/files/cs5300/RLBook/node66.html): as the discriminator net changes its behavior, so does the generator net, and vice versa.
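The opposing losses of this zero-sum game can be made concrete with the binary cross-entropy formulation commonly used for GANs (a standard choice, though not the only possible one): the discriminator wants real images scored as one and fakes as zero, while the generator wants its fakes scored as one.

```python
import numpy as np

def bce(p, target):
    """Binary cross-entropy of a single probability p against a 0/1 target."""
    eps = 1e-12  # numerical safety for log(0)
    return -(target * np.log(p + eps) + (1 - target) * np.log(1 - p + eps))

def discriminator_loss(p_real, p_fake):
    # The discriminator wants real images scored 1 and fakes scored 0.
    return bce(p_real, 1) + bce(p_fake, 0)

def generator_loss(p_fake):
    # The generator wants its fakes scored 1, directly opposing the
    # discriminator's second term: one net's gain is the other's loss.
    return bce(p_fake, 1)

# If the discriminator confidently spots the fake (p_fake near 0),
# its own loss is low while the generator's loss is high:
print(discriminator_loss(0.9, 0.1))
print(generator_loss(0.1))
```

Gradient updates alternate between the two nets, each lowering its own loss and thereby raising the other's, which is the back-and-forth adaptation described above.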
At the time of writing this book, DL4J doesn't provide any direct API for GANs, but it does allow you to import existing Keras GAN models (such as those available at https://github.com/eriklindernoren/Keras-GAN) or TensorFlow GAN models (such as https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3_NeuralNetworks/gan.py) and then retrain them and/or make predictions with them using the DL4J API in a JVM environment (which can include Spark), as explained in Chapter 10, Deploying on a Distributed System, and Chapter 14, Image Classification. No direct GAN capabilities are in DL4J's immediate plans, but importing Python models is a valid way to train GANs and run inference with them.