Although Generative Adversarial Networks (GANs) have made significant progress in synthesizing visually realistic face images, effective approaches that provide fine-grained control over the generation process for semantic facial attribute editing are still lacking. In this work, we propose a novel cross-channel self-attention based generative adversarial network (CCA-GAN), which weights the importance of multiple feature channels and achieves pixel-level feature alignment and conversion, reducing the impact on irrelevant attributes while editing the target attributes. Evaluation results show that CCA-GAN outperforms state-of-the-art models on the CelebA dataset, reducing Fréchet Inception Distance (FID) and Kernel Inception Distance (KID) by 15∼28% and 25∼100%, respectively. Furthermore, visualization of generated samples confirms the disentanglement effect of the proposed model.
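To give a concrete sense of what "weighting the importance of multiple channels of features" means, the following is a minimal, framework-free sketch of channel-wise attention: each channel is summarized by global average pooling and a softmax over those descriptors yields importance weights used to rescale the channels. This is an illustrative simplification, not the paper's exact CCA-GAN formulation; the function name and data layout are our own.

```python
import math

def channel_attention(features):
    """Illustrative channel-wise attention (not the exact CCA-GAN module).

    `features` is a list of channels, each a flat list of float activations.
    Returns the channels rescaled by softmax-normalized importance weights.
    """
    # Per-channel descriptor via global average pooling.
    means = [sum(ch) / len(ch) for ch in features]
    # Numerically stable softmax over channels -> importance weights.
    m = max(means)
    exps = [math.exp(v - m) for v in means]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Re-weight each channel by its attention weight.
    return [[w * x for x in ch] for ch, w in zip(features, weights)]

# Two toy channels; the one with higher mean activation gets more weight.
feats = [[1.0, 2.0], [3.0, 4.0]]
out = channel_attention(feats)
```

In a full model, such weights would typically be produced by learned projections (as in self-attention) rather than raw channel means, but the rescaling step shown here captures the core idea of suppressing channels tied to irrelevant attributes.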
This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant 61806142, and in part by the Tianjin Science and Technology Program under Grants 18JCYBJC44000 and 19PTZWHZ00020.