The Stage-II StackGAN is slightly different from the Stage-I StackGAN. The inputs to the Stage-II generator are the conditioning variable and the low-resolution images produced by the Stage-I generator.
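To make the two inputs concrete, here is a minimal NumPy sketch of one way to combine them, following the approach described in the StackGAN paper. The low-resolution image is first downsampled to a feature map. The conditioning vector is then replicated at every spatial position so that it can be concatenated with that map along the channel axis. The NHWC layout, the shapes (a 16 × 16 × 512 feature map and a 128-dimensional conditioning vector) and the function name `join_condition` are assumptions for illustration.

```python
import numpy as np

def join_condition(features: np.ndarray, c_hat: np.ndarray) -> np.ndarray:
    """Tile a conditioning vector over a feature map and concatenate on channels.

    features: (batch, height, width, channels) image features (assumed NHWC layout)
    c_hat:    (batch, cond_dim) conditioning vector from the CA network
    """
    _, height, width, _ = features.shape
    # Add two singleton spatial axes: (batch, 1, 1, cond_dim)
    c_map = c_hat[:, np.newaxis, np.newaxis, :]
    # Repeat the same vector at every spatial position: (batch, height, width, cond_dim)
    c_map = np.tile(c_map, (1, height, width, 1))
    # Stack image features and the text condition along the channel axis
    return np.concatenate([features, c_map], axis=-1)

# Hypothetical shapes: 16x16x512 downsampled image features, 128-d conditioning vector
features = np.zeros((2, 16, 16, 512), dtype=np.float32)
c_hat = np.ones((2, 128), dtype=np.float32)
joined = join_condition(features, c_hat)
print(joined.shape)  # (2, 16, 16, 640)
```

Because the same vector is copied to every position, each spatial location of the joined tensor sees the full text condition. This lets the later convolutional layers use it locally.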
The Stage-II generator has five components:
- The text encoder
- The conditioning augmentation network
- Downsampling blocks
- Residual blocks
- Upsampling blocks
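One way to see how the five components fit together is to trace tensor shapes through them. The plain-Python sketch below performs no computation and only tracks shapes. The specific numbers are assumptions loosely based on the StackGAN paper, not values taken from this section: a 1,024-dimensional text embedding, a 128-dimensional conditioning variable, a 64 × 64 input image, two stride-2 downsampling steps, four residual blocks, and four 2× upsampling steps to a 256 × 256 output. The function `trace_stage2_shapes` is hypothetical.

```python
def trace_stage2_shapes(embedding_dim=1024, cond_dim=128, image_size=64,
                        num_down=2, num_res=4, num_up=4, feat_channels=512):
    """Return (component, shape) pairs for one sample (batch axis omitted).

    All default sizes are illustrative assumptions.
    """
    steps = []
    # 1. Text encoder: assumed to yield a fixed-size sentence embedding
    steps.append(("text embedding", (embedding_dim,)))
    # 2. Conditioning augmentation: embedding -> sampled conditioning variable
    steps.append(("conditioning variable", (cond_dim,)))
    # 3. Downsampling: each stride-2 step halves height and width
    size = image_size
    for _ in range(num_down):
        size //= 2
    steps.append(("downsampled image", (size, size, feat_channels)))
    # The conditioning variable is tiled to size x size and concatenated on channels
    steps.append(("joined features", (size, size, feat_channels + cond_dim)))
    # 4. Residual blocks preserve the shape (assumes a conv first maps back to feat_channels)
    for i in range(num_res):
        steps.append((f"residual block {i + 1}", (size, size, feat_channels)))
    # 5. Upsampling: each step doubles height and width, ending in an RGB image
    for _ in range(num_up):
        size *= 2
    steps.append(("output image", (size, size, 3)))
    return steps

for name, shape in trace_stage2_shapes():
    print(f"{name:>22}: {shape}")
```

Under these assumptions, the 64 × 64 Stage-I image is reduced to a 16 × 16 grid, enriched with the text condition, refined by shape-preserving residual blocks, and finally upsampled four times to 256 × 256.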
The text encoder and the conditioning augmentation (CA) network are similar to those described in the Stage-I section. We will now go through the remaining three components of the generator network: the downsampling blocks, the residual blocks, and the upsampling blocks.
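The defining property of each of these three block types can be shown with a toy NumPy sketch. Real StackGAN blocks use learned convolutions with batch normalization. Here, purely as stand-ins:

- downsampling uses 2 × 2 average pooling in place of a stride-2 convolution,
- the residual branch uses a per-pixel channel mixing (equivalent to a 1 × 1 convolution) followed by ReLU,
- upsampling uses nearest-neighbour repetition, similar to Keras's `UpSampling2D` with its default interpolation.

The function names and the random weights are hypothetical.

```python
import numpy as np

def downsample(x: np.ndarray) -> np.ndarray:
    """Halve H and W with 2x2 average pooling (stand-in for a stride-2 conv)."""
    b, h, w, c = x.shape
    return x.reshape(b, h // 2, 2, w // 2, 2, c).mean(axis=(2, 4))

def residual(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Add a transformed copy of x back to x; the shape is unchanged.

    weight: (channels, channels) matrix applied at every pixel, i.e. a 1x1 conv.
    """
    branch = np.maximum(x @ weight, 0.0)  # 1x1 "conv" followed by ReLU
    return x + branch                      # skip connection

def upsample(x: np.ndarray) -> np.ndarray:
    """Double H and W by repeating each pixel (nearest-neighbour upsampling)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 64, 64, 8))
w = rng.standard_normal((8, 8)) * 0.1

d = downsample(downsample(x))   # 64 -> 32 -> 16
r = residual(d, w)              # stays 16 x 16
u = upsample(upsample(r))       # 16 -> 32 -> 64
print(d.shape, r.shape, u.shape)  # (1, 16, 16, 8) (1, 16, 16, 8) (1, 64, 64, 8)
```

The skip connection in `residual` is why stacking several residual blocks keeps the feature map's shape fixed. If the branch contributes nothing (for example, with all-zero weights), the block simply passes its input through.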