Edge Guided GANs with Contrastive Learning for Semantic Image Synthesis
- URL: http://arxiv.org/abs/2003.13898v3
- Date: Tue, 28 Mar 2023 00:15:58 GMT
- Title: Edge Guided GANs with Contrastive Learning for Semantic Image Synthesis
- Authors: Hao Tang, Xiaojuan Qi, Guolei Sun, Dan Xu, Nicu Sebe, Radu Timofte,
Luc Van Gool
- Abstract summary: We propose a novel ECGAN for the challenging semantic image synthesis task.
Our ECGAN achieves significantly better results than state-of-the-art methods.
- Score: 194.1452124186117
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel ECGAN for the challenging semantic image synthesis task.
Although considerable improvement has been achieved, the quality of synthesized
images is far from satisfactory due to three largely unresolved challenges. 1)
The semantic labels do not provide detailed structural information, making it
difficult to synthesize local details and structures. 2) The widely adopted CNN
operations such as convolution, down-sampling, and normalization usually cause
spatial resolution loss and thus cannot fully preserve the original semantic
information, leading to semantically inconsistent results. 3) Existing semantic
image synthesis methods focus on modeling local semantic information from a
single input semantic layout. However, they ignore global semantic information
of multiple input semantic layouts, i.e., semantic cross-relations between
pixels across different input layouts. To tackle 1), we propose to use edges as
an intermediate representation, which then guides image generation via a
proposed attention-guided edge transfer module. The edge information, produced
by a convolutional generator, introduces detailed structural information. To
tackle 2), we design an effective module to
selectively highlight class-dependent feature maps according to the original
semantic layout to preserve the semantic information. To tackle 3), inspired by
current methods in contrastive learning, we propose a novel contrastive
learning method, which aims to enforce pixel embeddings belonging to the same
semantic class to generate more similar image content than those from different
classes. Doing so can capture more semantic relations by explicitly exploring
the structures of labeled pixels from multiple input semantic layouts.
Experiments on three challenging datasets show that our ECGAN achieves
significantly better results than state-of-the-art methods.
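The pixel-wise contrastive objective described in the abstract can be sketched as a supervised, InfoNCE-style loss over sampled pixel embeddings, where pixels of the same semantic class act as positives and all others as negatives. The following is a minimal NumPy illustration of that general idea, not the paper's implementation; the function name, temperature value, and sampling scheme are assumptions:

```python
import numpy as np

def pixel_contrastive_loss(embeddings, labels, temperature=0.1):
    """Supervised InfoNCE-style loss over pixel embeddings (illustrative sketch).

    embeddings: (N, D) L2-normalized pixel embeddings sampled from feature maps
                of one or more semantic layouts.
    labels:     (N,) semantic class id per pixel.
    Pixels sharing a class are pulled together; all other pixels are pushed away.
    """
    # Cosine-similarity logits scaled by temperature.
    sim = embeddings @ embeddings.T / temperature
    n = len(labels)
    # Exclude each pixel's similarity with itself.
    logits_mask = ~np.eye(n, dtype=bool)
    # Positives: other pixels with the same semantic class.
    pos_mask = (labels[:, None] == labels[None, :]) & logits_mask
    # Numerically stable log-softmax over all other pixels.
    sim_max = sim.max(axis=1, keepdims=True)
    exp_sim = np.exp(sim - sim_max) * logits_mask
    log_prob = sim - sim_max - np.log(exp_sim.sum(axis=1, keepdims=True))
    # Average log-probability over positives, for anchors that have positives.
    pos_counts = pos_mask.sum(axis=1)
    valid = pos_counts > 0
    mean_log_prob_pos = (pos_mask * log_prob).sum(axis=1)[valid] / pos_counts[valid]
    return -mean_log_prob_pos.mean()
```

When same-class embeddings are already clustered, the loss is small; mixing classes drives it up, which is exactly the gradient signal used to make same-class pixels generate more similar content.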
Related papers
- GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding [101.32590239809113]
Generalized Perception NeRF (GP-NeRF) is a novel pipeline that makes the widely used segmentation model and NeRF work compatibly under a unified framework.
We propose two self-distillation mechanisms, i.e., the Semantic Distill Loss and the Depth-Guided Semantic Distill Loss, to enhance the discrimination and quality of the semantic field.
arXiv Detail & Related papers (2023-11-20T15:59:41Z)
- SCONE-GAN: Semantic Contrastive learning-based Generative Adversarial Network for an end-to-end image translation [18.93434486338439]
SCONE-GAN is shown to be effective for learning to generate realistic and diverse scenery images.
To generate more realistic and diverse images, we introduce a style reference image.
We validate the proposed algorithm for image-to-image translation and stylizing outdoor images.
arXiv Detail & Related papers (2023-11-07T10:29:16Z)
- Edge Guided GANs with Multi-Scale Contrastive Learning for Semantic Image Synthesis [139.2216271759332]
We propose a novel ECGAN for the challenging semantic image synthesis task.
The semantic labels do not provide detailed structural information, making it challenging to synthesize local details and structures.
The widely adopted CNN operations such as convolution, down-sampling, and normalization usually cause spatial resolution loss.
We propose a novel contrastive learning method, which aims to enforce pixel embeddings belonging to the same semantic class to generate more similar image content.
arXiv Detail & Related papers (2023-07-22T14:17:19Z)
- Wavelet-based Unsupervised Label-to-Image Translation [9.339522647331334]
We propose a new Unsupervised paradigm for SIS (USIS) that makes use of a self-supervised segmentation loss and whole-image, wavelet-based discrimination.
We test our methodology on three challenging datasets and demonstrate its ability to bridge the performance gap between paired and unpaired models.
arXiv Detail & Related papers (2023-05-16T17:48:44Z)
- Few-shot Semantic Image Synthesis with Class Affinity Transfer [23.471210664024067]
We propose a transfer method that leverages a model trained on a large source dataset to improve the learning ability on small target datasets.
The class affinity matrix is introduced as a first layer to the source model to make it compatible with the target label maps.
We apply our approach to GAN-based and diffusion-based architectures for semantic synthesis.
arXiv Detail & Related papers (2023-04-05T09:24:45Z)
- Dual Pyramid Generative Adversarial Networks for Semantic Image Synthesis [94.76988562653845]
The goal of semantic image synthesis is to generate photo-realistic images from semantic label maps.
Current state-of-the-art approaches, however, still struggle to generate realistic objects in images at various scales.
We propose a Dual Pyramid Generative Adversarial Network (DP-GAN) that learns the conditioning of spatially-adaptive normalization blocks at all scales jointly.
arXiv Detail & Related papers (2022-10-08T18:45:44Z)
- CRIS: CLIP-Driven Referring Image Segmentation [71.56466057776086]
We propose an end-to-end CLIP-Driven Referring Image Segmentation framework (CRIS).
CRIS resorts to vision-language decoding and contrastive learning to achieve text-to-pixel alignment.
Our proposed framework significantly outperforms the state of the art without any post-processing.
arXiv Detail & Related papers (2021-11-30T07:29:08Z)
- Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation [128.03739769844736]
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences.
In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference.
Our algorithm sets a new state of the art in all these settings, demonstrating its efficacy and generalizability.
arXiv Detail & Related papers (2020-07-03T21:53:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.