DivCo: Diverse Conditional Image Synthesis via Contrastive Generative
Adversarial Network
- URL: http://arxiv.org/abs/2103.07893v1
- Date: Sun, 14 Mar 2021 11:11:15 GMT
- Title: DivCo: Diverse Conditional Image Synthesis via Contrastive Generative
Adversarial Network
- Authors: Rui Liu, Yixiao Ge, Ching Lam Choi, Xiaogang Wang, Hongsheng Li
- Abstract summary: Conditional generative adversarial networks (cGANs) aim to synthesize diverse images given the input conditions and latent codes.
The recent MSGAN tried to encourage the diversity of the generated images but considers only "negative" relations between image pairs.
We propose a novel DivCo framework to properly constrain both "positive" and "negative" relations between the generated images specified in the latent space.
- Score: 70.12848483302915
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conditional generative adversarial networks (cGANs) aim to
synthesize diverse images given the input conditions and latent codes, but unfortunately,
they usually suffer from the issue of mode collapse. To solve this issue,
previous works mainly focused on encouraging the correlation between the latent
codes and their generated images, while ignoring the relations between images
generated from various latent codes. The recent MSGAN tried to encourage the
diversity of the generated images but considers only "negative" relations
between the image pairs. In this paper, we propose a novel DivCo framework to
properly constrain both "positive" and "negative" relations between the
generated images specified in the latent space. To the best of our knowledge,
this is the first attempt to use contrastive learning for diverse conditional
image synthesis. A novel latent-augmented contrastive loss is introduced, which
encourages images generated from adjacent latent codes to be similar and those
generated from distinct latent codes to be dissimilar. The proposed
latent-augmented contrastive loss is compatible with various cGAN
architectures. Extensive experiments demonstrate that the proposed DivCo can
produce more diverse images than state-of-the-art methods without sacrificing
visual quality in multiple unpaired and paired image generation tasks.
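The abstract names the latent-augmented contrastive loss but does not spell it out. A minimal PyTorch sketch of an InfoNCE-style objective consistent with the description above might look as follows; the feature extractor, the perturbation radius eps, and the temperature tau are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def latent_augmented_contrastive_loss(f_query, f_pos, f_negs, tau=0.07):
    """InfoNCE-style objective: images generated from adjacent latent codes
    attract, images generated from distant latent codes repel.

    f_query: (B, D)    features of G(c, z)
    f_pos:   (B, D)    features of G(c, z + small perturbation)
    f_negs:  (B, N, D) features of G(c, z') for independently drawn z'
    """
    q = F.normalize(f_query, dim=-1)
    p = F.normalize(f_pos, dim=-1)
    n = F.normalize(f_negs, dim=-1)
    l_pos = (q * p).sum(dim=-1, keepdim=True)            # (B, 1)
    l_neg = torch.einsum("bd,bnd->bn", q, n)             # (B, N)
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    # the positive pair sits at index 0 of every row
    target = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, target)

# Illustrative latent sampling (eps, the "adjacency" radius, is assumed):
B, n_neg, dz, eps = 4, 5, 8, 0.01
z = torch.randn(B, dz)
z_pos = z + eps * torch.randn_like(z)    # adjacent latent code -> positive
z_negs = torch.randn(B, n_neg, dz)       # distinct latent codes -> negatives
```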
Related papers
- Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis [65.7968515029306] (2024-02-28)
We propose a novel Coarse-to-Fine Latent Diffusion (CFLD) method for Pose-Guided Person Image Synthesis (PGPIS).
A perception-refined decoder is designed to progressively refine a set of learnable queries and extract semantic understanding of person images as a coarse-grained prompt.
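As a loose illustration of the "learnable queries" idea in this summary, a set of queries can cross-attend to image features to produce prompt tokens. The module name, dimensions, and single-layer design below are assumptions, not CFLD's actual decoder.

```python
import torch
import torch.nn as nn

class QueryRefiner(nn.Module):
    """Generic sketch: learnable queries cross-attend to image features to
    distill a coarse-grained prompt (not CFLD's exact decoder)."""
    def __init__(self, n_queries=16, dim=256, n_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_queries, dim))
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, img_feats):                 # img_feats: (B, HW, dim)
        q = self.queries.expand(img_feats.size(0), -1, -1)
        out, _ = self.attn(q, img_feats, img_feats)
        return self.norm(out + q)                 # (B, n_queries, dim) prompt

prompt = QueryRefiner()(torch.randn(2, 64, 256))
```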
- Improving Diffusion-Based Image Synthesis with Context Prediction [49.186366441954846] (2024-01-04)
Existing diffusion models mainly try to reconstruct input image from a corrupted one with a pixel-wise or feature-wise constraint along spatial axes.
We propose ConPreDiff to improve diffusion-based image synthesis with context prediction.
Our ConPreDiff consistently outperforms previous methods and achieves new state-of-the-art text-to-image generation results on MS-COCO, with a zero-shot FID score of 6.21.
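The context-prediction idea can be illustrated with a hedged sketch: each spatial feature predicts its own neighborhood, and the prediction error is penalized. The head design and MSE distance below are placeholder assumptions, not ConPreDiff's exact decoder or distance measure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def neighborhood_context_loss(feat, pred_head, k=3):
    """Sketch of a context-prediction term: from each location's feature,
    predict the features of its k x k neighborhood.
    feat: (B, C, H, W); pred_head maps C -> C * k * k channels."""
    B, C, H, W = feat.shape
    pred = pred_head(feat)                            # (B, C*k*k, H, W)
    target = F.unfold(feat, k, padding=k // 2)        # (B, C*k*k, H*W)
    target = target.reshape(B, C * k * k, H, W)
    return F.mse_loss(pred, target.detach())

head = nn.Conv2d(64, 64 * 9, kernel_size=1)
loss = neighborhood_context_loss(torch.randn(2, 64, 16, 16), head)
```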
- Unlocking Pre-trained Image Backbones for Semantic Image Synthesis [29.688029979801577] (2023-12-20)
We propose a new class of GAN discriminators for semantic image synthesis that enables generating highly realistic images.
Our model, which we dub DP-SIMS, achieves state-of-the-art results in terms of image quality and consistency with the input label maps on ADE-20K, COCO-Stuff, and Cityscapes.
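A rough sketch of a discriminator built on a frozen pre-trained feature backbone follows; the per-pixel "N semantic classes plus one fake class" head is an assumption inspired by segmentation-style discriminators, not necessarily DP-SIMS's exact design.

```python
import torch
import torch.nn as nn

class FeatureBackboneDiscriminator(nn.Module):
    """Sketch: a segmentation-style discriminator on top of a frozen
    pre-trained backbone, emitting per-pixel class/fake logits."""
    def __init__(self, backbone, feat_dim, n_classes):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad_(False)            # keep pre-trained weights fixed
        self.head = nn.Conv2d(feat_dim, n_classes + 1, kernel_size=1)

    def forward(self, img):
        with torch.no_grad():
            feats = self.backbone(img)         # (B, feat_dim, h, w)
        return self.head(feats)                # (B, n_classes + 1, h, w)

# Stand-in backbone for illustration; a real one would be e.g. a pre-trained
# classification network truncated to a feature map.
toy_backbone = nn.Conv2d(3, 128, kernel_size=4, stride=4)
disc = FeatureBackboneDiscriminator(toy_backbone, feat_dim=128, n_classes=150)
logits = disc(torch.randn(2, 3, 64, 64))
```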
- High-Fidelity Image Inpainting with GAN Inversion [23.49170140410603] (2022-08-25)
In this paper, we propose a novel GAN inversion model for image inpainting, dubbed InvertFill.
Within the encoder, the pre-modulation network leverages multi-scale structures to encode more discriminative semantics into style vectors.
To reconstruct faithful and photorealistic images, a simple yet effective Soft-update Mean Latent module is designed to capture more diverse in-domain patterns for synthesizing high-fidelity textures over large corruptions.
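A "soft update" of a mean latent is commonly realized as an exponential moving average. A minimal sketch under that assumption follows; the momentum value and exact rule are illustrative, not necessarily InvertFill's module.

```python
import torch

def soft_update_mean_latent(mean_latent, w_batch, momentum=0.999):
    """EMA-style soft update of a running mean latent code."""
    return momentum * mean_latent + (1.0 - momentum) * w_batch.mean(dim=0)

mean_w = torch.zeros(512)                      # running mean in W space
mean_w = soft_update_mean_latent(mean_w, torch.randn(8, 512))
```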
- Spatially Multi-conditional Image Generation [80.04130168156792] (2022-03-25)
We propose a novel neural architecture to address the problem of multi-conditional image generation.
The proposed method uses a transformer-like architecture operating pixel-wise, which receives the available labels as input tokens.
Our experiments on three benchmark datasets demonstrate the clear superiority of our method over the state-of-the-art and the compared baselines.
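As a loose sketch of the "labels as input tokens" idea, each pixel embedding can cross-attend to the available condition tokens. The module below is a simplification with assumed shapes, not the paper's architecture.

```python
import torch
import torch.nn as nn

class PixelwiseConditionMixer(nn.Module):
    """Sketch: every pixel embedding cross-attends to condition labels
    supplied as input tokens."""
    def __init__(self, dim=128, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, pixels, label_tokens):
        # pixels: (B, H*W, dim); label_tokens: (B, n_labels, dim)
        out, _ = self.attn(pixels, label_tokens, label_tokens)
        return pixels + out                    # residual conditioning

mix = PixelwiseConditionMixer()
y = mix(torch.randn(2, 256, 128), torch.randn(2, 3, 128))
```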
- Modulated Contrast for Versatile Image Synthesis [60.304183493234376] (2022-03-17)
MoNCE is a versatile loss that introduces image contrast to learn a calibrated metric for perceiving multifaceted inter-image distances.
We introduce optimal transport in MoNCE to modulate the pushing force of negative samples collaboratively across multiple contrastive objectives.
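A hedged sketch of this idea: negatives inside a contrastive loss are reweighted by an entropic optimal-transport plan. The Sinkhorn iterations, rescaling, and uniform marginals below are simplifying assumptions, not MoNCE's exact objective.

```python
import torch
import torch.nn.functional as F

def modulated_contrastive_loss(q, k, tau=0.07, ot_iters=5, eps=0.05):
    """Sketch: contrastive loss whose negatives are reweighted by a
    Sinkhorn-style transport plan instead of being weighted equally."""
    q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)
    sim = q @ k.t()                                # (B, B), diagonal = positives
    off_diag = ~torch.eye(len(q), dtype=torch.bool, device=q.device)
    plan = torch.exp(sim / eps) * off_diag         # transport over negatives
    for _ in range(ot_iters):                      # Sinkhorn normalizations
        plan = plan / plan.sum(dim=1, keepdim=True).clamp_min(1e-8)
        plan = plan / plan.sum(dim=0, keepdim=True).clamp_min(1e-8)
    w = plan.detach() * (len(q) - 1)               # keep mean weight near 1
    exp_pos = torch.exp(sim.diag() / tau)
    exp_neg = (w * torch.exp(sim / tau)).sum(dim=1)  # plan is zero on diagonal
    return -torch.log(exp_pos / (exp_pos + exp_neg)).mean()

loss = modulated_contrastive_loss(torch.randn(6, 64), torch.randn(6, 64))
```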
- Multimodal Image-to-Image Translation via Mutual Information Estimation and Maximization [16.54980086211836] (2020-08-08)
Multimodal image-to-image translation (I2IT) aims to learn a conditional distribution that explores multiple possible images in the target domain given an input image in the source domain.
Conditional generative adversarial networks (cGANs) are often adopted for modeling such a conditional distribution.
We propose a method that explicitly estimates and maximizes the mutual information between the latent code and the output image in cGANs.
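One standard way to maximize such mutual information is an InfoGAN-style variational lower bound, where an auxiliary encoder reconstructs the latent code from the output image. A minimal sketch under that assumption follows; the paper's estimator may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mi_lower_bound_loss(encoder, fake_img, z):
    """Sketch of a variational MI lower bound: minimizing the latent
    reconstruction error maximizes a Gaussian log-likelihood log q(z|x),
    which lower-bounds I(z; x)."""
    z_pred = encoder(fake_img)                  # predicted mean of q(z|x)
    return F.mse_loss(z_pred, z)

enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 8))
loss = mi_lower_bound_loss(enc, torch.randn(4, 3, 32, 32), torch.randn(4, 8))
```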
- ContraGAN: Contrastive Learning for Conditional Image Generation [14.077997868828177] (2020-06-23)
Conditional Generative Adversarial Networks (cGANs) are used to generate diverse images using class label information.
We propose ContraGAN that considers relations between multiple image embeddings in the same batch (data-to-data relations) as well as the data-to-class relations by using a conditional contrastive loss.
The experimental results show that ContraGAN outperforms state-of-the-art models by 7.3% and 7.7% on the Tiny ImageNet and ImageNet datasets, respectively.
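A conditional contrastive loss along the lines described above can be sketched as follows: data-to-class terms (a feature against its class embedding) and data-to-data terms (features of same-class batch samples) form the numerator, while all other batch samples form the denominator. The projection, temperature, and normalization are assumptions based on this summary.

```python
import torch
import torch.nn.functional as F

def conditional_contrastive_loss(feats, class_embs, labels, tau=0.1):
    """Sketch of a 2C-style loss combining data-to-class and
    data-to-data relations within one batch."""
    f = F.normalize(feats, dim=-1)                  # (B, D)
    e = F.normalize(class_embs[labels], dim=-1)     # (B, D) class embeddings
    d2c = torch.exp((f * e).sum(-1) / tau)          # (B,) data-to-class
    sim = torch.exp(f @ f.t() / tau)                # (B, B) data-to-data
    off_diag = ~torch.eye(len(f), dtype=torch.bool, device=f.device)
    same = (labels[:, None] == labels[None, :]) & off_diag
    num = d2c + (sim * same).sum(dim=1)             # attract same-class pairs
    den = d2c + (sim * off_diag).sum(dim=1)         # repel everything else
    return -torch.log(num / den).mean()

embs = torch.randn(10, 64)                          # one embedding per class
loss = conditional_contrastive_loss(torch.randn(6, 64), embs,
                                    torch.randint(0, 10, (6,)))
```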
- Image Fine-grained Inpainting [89.17316318927621] (2020-02-07)
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields.
To better train this efficient generator, in addition to the frequently-used VGG feature matching loss, we design a novel self-guided regression loss.
We also employ a discriminator with local and global branches to ensure local-global contents consistency.
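A dense combination of dilated convolutions might look like the following sketch; the dilation rates, residual fusion, and channel counts are assumptions, not the paper's exact block.

```python
import torch
import torch.nn as nn

class DenseDilatedBlock(nn.Module):
    """Sketch: chained dilated-convolution branches whose outputs are
    concatenated, giving large and varied receptive fields."""
    def __init__(self, ch=64, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=d, dilation=d) for d in dilations
        )
        self.fuse = nn.Conv2d(ch * len(dilations), ch, 1)

    def forward(self, x):
        outs, h = [], x
        for conv in self.branches:             # each branch refines the last
            h = torch.relu(conv(h))
            outs.append(h)
        return x + self.fuse(torch.cat(outs, dim=1))   # residual fusion

y = DenseDilatedBlock()(torch.randn(1, 64, 32, 32))
```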