As shown in the following diagram, Generative Adversarial Networks, popularly known as GANs, consist of two models working in tandem to learn to generate complex data such as images, videos, or audio files:
Intuitively, the generator model starts from random noise and gradually learns to generate ever more realistic data. The generator output and the real data are fed into the discriminator, which learns how to differentiate fake data from real data.
Mathematically, the generative model learns the probability distribution $p_g$ such that the discriminator is unable to distinguish between the probability distributions $p_{data}$ and $p_g$. The objective function of the GAN can be described by the following equation describing the value function $V$ (from https://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf):

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
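To make this minimax game concrete, the following is a minimal NumPy sketch of the value function; the names d_real and d_fake are hypothetical placeholders for the discriminator's probability outputs D(x) on a batch of real data and D(G(z)) on a batch of generated data:

import numpy as np

def gan_value(d_real, d_fake, eps=1e-8):
    # Mini-batch estimate of E[log D(x)] + E[log(1 - D(G(z)))];
    # eps guards against taking the log of zero
    return np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps))

The discriminator is trained to maximize this value while the generator is trained to minimize it; in practice, the generator often maximizes E[log D(G(z))] instead, as it yields stronger gradients early in training.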
This description represents a simple GAN (also known as a vanilla GAN in the literature), first introduced by Goodfellow et al. in the seminal paper available at this link: https://arxiv.org/abs/1406.2661. Since then, there has been tremendous research on deriving different architectures based on GANs and applying them to different application areas.
For example, in conditional GANs both the generator and the discriminator networks are provided with the labels $y$, such that the objective function of the conditional GAN can be described by the following equation describing the value function $V$:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x \mid y)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z \mid y)))]$$
The original paper describing the conditional GANs is located at the following link: https://arxiv.org/abs/1411.1784.
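In practice, the conditioning is commonly implemented by concatenating a one-hot encoding of the label y to the inputs of both networks. The following is a minimal NumPy sketch of this idea; the function name and shapes are illustrative, not from the original paper:

n_classes = 10

def condition(inputs, labels):
    # Append one_hot(y) to each input row, so that both G and D
    # receive [input; one_hot(y)] instead of the bare input
    one_hot = np.eye(n_classes)[labels]
    return np.concatenate([inputs, one_hot], axis=1)

# Example: condition a batch of 8 noise vectors on digit labels 0..7
z = np.random.uniform(-1.0, 1.0, size=[8, 256])
z_cond = condition(z, np.arange(8))  # shape: (8, 266)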
Several other GAN derivatives and their originating papers, demonstrated in applications such as Text to Image, Image Synthesis, Image Tagging, Style Transfer, and Image Transformation, are listed in the following table:
| GAN Derivative | Originating Paper | Demonstrated Application |
| --- | --- | --- |
| StackGAN | https://arxiv.org/abs/1612.03242 | Text to Image |
| StackGAN++ | https://arxiv.org/abs/1710.10916 | Photo-realistic Image Synthesis |
| DCGAN | https://arxiv.org/abs/1511.06434 | Image Synthesis |
| HR-DCGAN | https://arxiv.org/abs/1711.06491 | High-Resolution Image Synthesis |
| Conditional GAN | https://arxiv.org/abs/1411.1784 | Image Tagging |
| InfoGAN | https://arxiv.org/abs/1606.03657 | Style Identification |
| Wasserstein GAN | https://arxiv.org/abs/1701.07875 | Image Generation |
| Coupled GAN | https://arxiv.org/abs/1606.07536 | Image Transformation, Domain Adaptation |
| BEGAN | https://arxiv.org/abs/1703.10717 | Image Generation |
| DiscoGAN | https://arxiv.org/abs/1703.05192 | Style Transfer |
| CycleGAN | https://arxiv.org/abs/1703.10593 | Style Transfer |
Let us practice creating a simple GAN using the MNIST dataset. For this exercise, we shall normalize the MNIST images to lie in the range [-1, +1], using the following function:
def norm(x):
    # Scale pixel values from [0, 1] to [-1, +1]
    return (x - 0.5) / 0.5
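As a quick check, assuming the pixel values have already been scaled to [0, 1], the function maps them to [-1, +1] as follows:

x = np.array([0.0, 0.5, 1.0])
print(norm(x))  # prints: [-1.  0.  1.]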
We also define random noise with 256 dimensions that will be used to test the generator models:
n_z = 256   # dimensionality of the random noise vector
z_test = np.random.uniform(-1.0, 1.0, size=[8, n_z])
The following function displays the generated images, and will be used in all the examples in this chapter:
import matplotlib.pyplot as plt

def display_images(images):
    # Plot a batch of images in a single row of eight subplots
    for i in range(images.shape[0]):
        plt.subplot(1, 8, i + 1)
        plt.imshow(images[i])
        plt.axis('off')
    plt.tight_layout()
    plt.show()
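As a sanity check, display_images can be called on any batch of eight images; here random 28 x 28 arrays stand in for generator output (purely illustrative):

display_images(np.random.rand(8, 28, 28))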