The discriminator network in pix2pix is inspired by the PatchGAN architecture. The PatchGAN network contains eight convolutional blocks, as follows:
| Layer Name | Hyperparameters | Input Shape | Output Shape |
| --- | --- | --- | --- |
| 1st 2D Convolution Layer | filters=64, kernel_size=4, strides=2, padding='same' | (256, 256, 1) | (128, 128, 64) |
| Activation Layer | activation='leakyrelu', alpha=0.2 | (128, 128, 64) | (128, 128, 64) |
| 2nd 2D Convolution Layer | filters=128, kernel_size=4, strides=2, padding='same' | (128, 128, 64) | (64, 64, 128) |
| Batch Normalization Layer | None | (64, 64, 128) | (64, 64, 128) |
| Activation Layer | activation='leakyrelu', alpha=0.2 | (64, 64, 128) | (64, 64, 128) |
| 3rd 2D Convolution Layer | filters=256, kernel_size=4, strides=2, padding='same' | (64, 64, 128) | (32, 32, 256) |
| Batch Normalization Layer | None | (32, 32, 256) | (32, 32, 256) |
| Activation Layer | activation='leakyrelu', alpha=0.2 | (32, 32, 256) | (32, 32, 256) |
| 4th 2D Convolution Layer | filters=512, kernel_size=4, strides=2, padding='same' | (32, 32, 256) | (16, 16, 512) |
| Batch Normalization Layer | None | (16, 16, 512) | (16, 16, 512) |
| Activation Layer | activation='leakyrelu', alpha=0.2 | (16, 16, 512) | (16, 16, 512) |
| 5th 2D Convolution Layer | filters=512, kernel_size=4, strides=2, padding='same' | (16, 16, 512) | (8, 8, 512) |
| Batch Normalization Layer | None | (8, 8, 512) | (8, 8, 512) |
| Activation Layer | activation='leakyrelu', alpha=0.2 | (8, 8, 512) | (8, 8, 512) |
| 6th 2D Convolution Layer | filters=512, kernel_size=4, strides=2, padding='same' | (8, 8, 512) | (4, 4, 512) |
| Batch Normalization Layer | None | (4, 4, 512) | (4, 4, 512) |
| Activation Layer | activation='leakyrelu', alpha=0.2 | (4, 4, 512) | (4, 4, 512) |
| 7th 2D Convolution Layer | filters=512, kernel_size=4, strides=2, padding='same' | (4, 4, 512) | (2, 2, 512) |
| Batch Normalization Layer | None | (2, 2, 512) | (2, 2, 512) |
| Activation Layer | activation='leakyrelu', alpha=0.2 | (2, 2, 512) | (2, 2, 512) |
| 8th 2D Convolution Layer | filters=512, kernel_size=4, strides=2, padding='same' | (2, 2, 512) | (1, 1, 512) |
| Batch Normalization Layer | None | (1, 1, 512) | (1, 1, 512) |
| Activation Layer | activation='leakyrelu', alpha=0.2 | (1, 1, 512) | (1, 1, 512) |
| Flatten Layer | None | (1, 1, 512) | (512, ) |
| Dense Layer | units=2, activation='softmax' | (512, ) | (2, ) |
This table summarizes the architecture and configuration of the discriminator network. The flatten layer reshapes the (1, 1, 512) tensor into a one-dimensional array of 512 values, which the final dense layer maps to two softmax outputs.
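The shape progression through the discriminator can be checked with a few lines of arithmetic. The following is a minimal sketch, not the book's implementation: it assumes that a convolution with padding='same' and stride s produces a spatial size of ceil(n / s), which is how Keras computes it, and the helper names `conv_same_output` and `trace_discriminator_shapes` are hypothetical, introduced only for illustration.

```python
import math


def conv_same_output(size, stride):
    """Spatial size after a convolution with padding='same'."""
    return math.ceil(size / stride)


def trace_discriminator_shapes(input_shape=(256, 256, 1)):
    """Return the output shape of each conv block, then flatten and dense."""
    # Filter counts of the eight convolutional blocks, taken from the table.
    filters_per_block = [64, 128, 256, 512, 512, 512, 512, 512]
    height, width, _ = input_shape
    shapes = []
    for filters in filters_per_block:
        # Every block uses strides=2, so each spatial dimension is halved.
        height = conv_same_output(height, 2)
        width = conv_same_output(width, 2)
        # Batch normalization and LeakyReLU leave the shape unchanged.
        shapes.append((height, width, filters))
    # The flatten layer collapses (height, width, channels) into one axis.
    shapes.append((height * width * filters_per_block[-1],))
    # The dense layer has units=2.
    shapes.append((2,))
    return shapes


shapes = trace_discriminator_shapes()
for index, shape in enumerate(shapes[:8], start=1):
    print(f"conv block {index}: {shape}")
print(f"flatten: {shapes[8]}")
print(f"dense: {shapes[9]}")
```

Because each of the eight stride-2 convolutions halves the height and width, a 256 x 256 input is reduced to 1 x 1 after the last block, which is why the flatten layer yields exactly 512 values.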
We have now covered the architecture and configuration of both networks. Next, we will look at the objective function used to train pix2pix.