The discriminator network

The discriminator network in pix2pix is inspired by the PatchGAN architecture. The network contains eight convolutional blocks, followed by a flatten layer and a dense layer, as follows:

Layer Name	Hyperparameters	Input Shape	Output Shape
1st 2D Convolution Layer	filters=64, kernel_size=4, strides=2, padding='same'	(256, 256, 1)	(128, 128, 64)
Activation Layer	activation='leakyrelu', alpha=0.2	(128, 128, 64)	(128, 128, 64)
2nd 2D Convolution Layer	filters=128, kernel_size=4, strides=2, padding='same'	(128, 128, 64)	(64, 64, 128)
Batch Normalization Layer	None	(64, 64, 128)	(64, 64, 128)
Activation Layer	activation='leakyrelu', alpha=0.2	(64, 64, 128)	(64, 64, 128)
3rd 2D Convolution Layer	filters=256, kernel_size=4, strides=2, padding='same'	(64, 64, 128)	(32, 32, 256)
Batch Normalization Layer	None	(32, 32, 256)	(32, 32, 256)
Activation Layer	activation='leakyrelu', alpha=0.2	(32, 32, 256)	(32, 32, 256)
4th 2D Convolution Layer	filters=512, kernel_size=4, strides=2, padding='same'	(32, 32, 256)	(16, 16, 512)
Batch Normalization Layer	None	(16, 16, 512)	(16, 16, 512)
Activation Layer	activation='leakyrelu', alpha=0.2	(16, 16, 512)	(16, 16, 512)
5th 2D Convolution Layer	filters=512, kernel_size=4, strides=2, padding='same'	(16, 16, 512)	(8, 8, 512)
Batch Normalization Layer	None	(8, 8, 512)	(8, 8, 512)
Activation Layer	activation='leakyrelu', alpha=0.2	(8, 8, 512)	(8, 8, 512)
6th 2D Convolution Layer	filters=512, kernel_size=4, strides=2, padding='same'	(8, 8, 512)	(4, 4, 512)
Batch Normalization Layer	None	(4, 4, 512)	(4, 4, 512)
Activation Layer	activation='leakyrelu', alpha=0.2	(4, 4, 512)	(4, 4, 512)
7th 2D Convolution Layer	filters=512, kernel_size=4, strides=2, padding='same'	(4, 4, 512)	(2, 2, 512)
Batch Normalization Layer	None	(2, 2, 512)	(2, 2, 512)
Activation Layer	activation='leakyrelu', alpha=0.2	(2, 2, 512)	(2, 2, 512)
8th 2D Convolution Layer	filters=512, kernel_size=4, strides=2, padding='same'	(2, 2, 512)	(1, 1, 512)
Batch Normalization Layer	None	(1, 1, 512)	(1, 1, 512)
Activation Layer	activation='leakyrelu', alpha=0.2	(1, 1, 512)	(1, 1, 512)
Flatten Layer	None	(1, 1, 512)	(512, )
Dense Layer	units=2, activation='softmax'	(512, )	(2, )

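This table lists each layer of the discriminator network, along with its hyperparameters and its input and output shapes. Every convolution layer uses a stride of 2 with 'same' padding, so each block halves the height and width of its input, taking a 256 x 256 image down to 1 x 1 after the eighth block. The flatten layer then reshapes the (1, 1, 512) tensor into a one-dimensional array of 512 values, which the dense layer maps to two softmax outputs.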
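The output shapes in the table can be checked with a few lines of arithmetic. The following is a minimal sketch in plain Python, relying on the rule that a convolution with 'same' padding produces an output of size ceil(input / stride) along each spatial dimension. The helper names same_padding_output and trace_shapes are hypothetical and are used only for this illustration:

```python
import math


def same_padding_output(size, stride):
    """Spatial output size of a convolution with padding='same'."""
    return math.ceil(size / stride)


def trace_shapes(height=256, width=256,
                 filters=(64, 128, 256, 512, 512, 512, 512, 512),
                 stride=2):
    """Return the output shape of each convolution block in the table.

    Batch normalization and LeakyReLU keep the shape unchanged, so only
    the convolution layers need to be traced.
    """
    shapes = []
    for n_filters in filters:
        height = same_padding_output(height, stride)
        width = same_padding_output(width, stride)
        shapes.append((height, width, n_filters))
    return shapes


if __name__ == "__main__":
    for index, shape in enumerate(trace_shapes(), start=1):
        print(f"Convolution block {index}: {shape}")
```

The kernel size does not appear in the calculation because, with 'same' padding, the output size depends only on the input size and the stride.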

The remaining layers in the discriminator network are covered in The Keras implementation of pix2pix section of this chapter.
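Ahead of that section, the layers listed in the table can be sketched with the Keras functional API. This is only an illustration of the table, not the chapter's implementation: the function name build_discriminator is hypothetical, the input shape is taken from the first row of the table, and the remaining layers mentioned above are not included. The sketch assumes TensorFlow's bundled Keras is installed:

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_discriminator(input_shape=(256, 256, 1)):
    """Sketch of the discriminator layers listed in the table."""
    # Number of filters in each of the eight convolution blocks.
    filters = [64, 128, 256, 512, 512, 512, 512, 512]

    inputs = keras.Input(shape=input_shape)
    x = inputs
    for index, n_filters in enumerate(filters):
        # Stride 2 with 'same' padding halves the height and width.
        x = layers.Conv2D(n_filters, kernel_size=4, strides=2,
                          padding="same")(x)
        # Per the table, the first block has no batch normalization.
        if index > 0:
            x = layers.BatchNormalization()(x)
        # Leaky ReLU with a negative slope (alpha) of 0.2, passed
        # positionally so it works across Keras versions.
        x = layers.LeakyReLU(0.2)(x)

    # (1, 1, 512) -> (512,)
    x = layers.Flatten()(x)
    # Two softmax outputs, as in the last row of the table.
    outputs = layers.Dense(2, activation="softmax")(x)
    return keras.Model(inputs, outputs, name="discriminator_sketch")


if __name__ == "__main__":
    build_discriminator().summary()
```

Calling summary() on the returned model prints the per-layer output shapes, which can be compared against the table row by row.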

We have now explored the architecture and configuration of both networks. Next, we will look at the objective function that is used to train pix2pix.