Stage-II

The Stage-II StackGAN differs slightly from the Stage-I StackGAN. Its generator takes two inputs: the conditioning variable and the low-resolution image produced by the Stage-I generator network.

It has five components:

The text encoder
The conditioning augmentation network
Downsampling blocks
Residual blocks
Upsampling blocks

The text encoder and the conditioning augmentation (CA) network are similar to those described in the Stage-I section. We will now go through the three components of the generator network: the downsampling blocks, the residual blocks, and the upsampling blocks.
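Before looking at each block in detail, it may help to see how tensor shapes could change as data moves through the generator. The following is a minimal shape-tracing sketch in plain Python, not an implementation of the network. The sizes are assumptions chosen for illustration: a 64x64x3 Stage-I image, a 128-dimensional conditioning variable, a 16x16 bottleneck, four residual blocks, and a 256x256x3 output. The channel counts and the helper names (downsample, concat_conditioning, residual, upsample, stage2_generator_shapes) are hypothetical.

```python
# Shape-only sketch of a Stage-II generator. All sizes are illustrative
# assumptions; no actual convolutions are performed.

def downsample(shape, out_channels):
    """Model a stride-2 convolution: halves height and width."""
    height, width, _ = shape
    return (height // 2, width // 2, out_channels)


def concat_conditioning(shape, cond_dim):
    """Model replicating the conditioning variable over every spatial
    position and concatenating it along the channel axis."""
    height, width, channels = shape
    return (height, width, channels + cond_dim)


def residual(shape):
    """A residual block leaves the tensor shape unchanged."""
    return shape


def upsample(shape, out_channels):
    """Model 2x upsampling followed by a convolution."""
    height, width, _ = shape
    return (height * 2, width * 2, out_channels)


def stage2_generator_shapes(image_shape=(64, 64, 3), cond_dim=128,
                            n_residual=4):
    """Return a list of (step name, shape) pairs for the assumed layout."""
    trace = [("Stage-I image", image_shape)]
    x = image_shape

    # Downsampling blocks: 64x64 -> 32x32 -> 16x16 (assumed channel counts).
    for channels in (256, 512):
        x = downsample(x, channels)
        trace.append(("downsample", x))

    # Join the image features with the conditioning variable.
    x = concat_conditioning(x, cond_dim)
    trace.append(("concat conditioning", x))

    # Residual blocks refine the joint features at constant resolution.
    for _ in range(n_residual):
        x = residual(x)
    trace.append(("residual blocks", x))

    # Upsampling blocks: 16x16 -> 32 -> 64 -> 128 -> 256.
    for channels in (256, 128, 64, 32):
        x = upsample(x, channels)
        trace.append(("upsample", x))

    # A final convolution maps the features to a 3-channel image.
    x = (x[0], x[1], 3)
    trace.append(("output image", x))
    return trace


if __name__ == "__main__":
    for name, shape in stage2_generator_shapes():
        print(f"{name:22s} {shape}")
```

Under these assumptions, the trace starts at (64, 64, 3), narrows to a 16x16 feature map where the conditioning variable is joined in, and ends at (256, 256, 3). Changing image_shape or cond_dim shows how the bottleneck and output sizes would shift with a different configuration.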