Edge Guided GANs with Multi-Scale Contrastive Learning for Semantic
Image Synthesis
- URL: http://arxiv.org/abs/2307.12084v1
- Date: Sat, 22 Jul 2023 14:17:19 GMT
- Title: Edge Guided GANs with Multi-Scale Contrastive Learning for Semantic
Image Synthesis
- Authors: Hao Tang, Guolei Sun, Nicu Sebe, Luc Van Gool
- Abstract summary: We propose a novel ECGAN for the challenging semantic image synthesis task.
The semantic labels do not provide detailed structural information, making it challenging to synthesize local details and structures.
The widely adopted CNN operations such as convolution, down-sampling, and normalization usually cause spatial resolution loss.
We propose a novel contrastive learning method, which aims to enforce pixel embeddings belonging to the same semantic class to generate more similar image content.
- Score: 139.2216271759332
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel ECGAN for the challenging semantic image synthesis task.
Although considerable improvements have been achieved by the community in the
recent period, the quality of synthesized images is far from satisfactory due
to three largely unresolved challenges. 1) The semantic labels do not provide
detailed structural information, making it challenging to synthesize local
details and structures; 2) The widely adopted CNN operations such as
convolution, down-sampling, and normalization usually cause spatial resolution
loss and thus cannot fully preserve the original semantic information, leading
to semantically inconsistent results (e.g., missing small objects); 3) Existing
semantic image synthesis methods focus on modeling 'local' semantic information
from a single input semantic layout. However, they ignore 'global' semantic
information of multiple input semantic layouts, i.e., semantic cross-relations
between pixels across different input layouts. To tackle 1), we propose to use
edges as an intermediate representation, which is then used to guide
image generation via a proposed attention-guided edge transfer module. To
tackle 2), we design an effective module to selectively highlight
class-dependent feature maps according to the original semantic layout to
preserve the semantic information. To tackle 3), inspired by current methods in
contrastive learning, we propose a novel contrastive learning method, which
aims to enforce pixel embeddings belonging to the same semantic class to
generate more similar image content than those from different classes. We
further propose a novel multi-scale contrastive learning method that aims to
push same-class features from different scales closer together, thereby
capturing more semantic relations by explicitly exploring the structures of
labeled pixels from multiple input semantic layouts at different scales.
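The contrastive objective described in the abstract, pulling same-class pixel embeddings together and pushing different-class ones apart, can be illustrated with a small sketch. This is not the authors' implementation: it assumes a generic supervised InfoNCE-style loss, and the function name `pixel_contrastive_loss`, the `temperature` parameter, and the flat `(N, D)` input layout are all hypothetical choices made for illustration.

```python
import numpy as np

def pixel_contrastive_loss(embeddings, labels, temperature=0.1):
    """Illustrative supervised InfoNCE-style loss over pixel embeddings.

    embeddings: (N, D) float array, one non-zero row per sampled pixel.
    labels:     (N,) int array with the semantic class id of each pixel.
    (Names and defaults are assumptions, not taken from the paper.)
    """
    # L2-normalize rows so dot products are cosine similarities.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    logits = z @ z.T / temperature

    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    # Positives: other pixels that share the anchor's semantic class.
    positives = (labels[:, None] == labels[None, :]) & not_self

    # Log-softmax over all other pixels (self-similarity is excluded).
    logits = np.where(not_self, logits, -np.inf)
    row_max = logits.max(axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Mean log-probability of positives, over anchors that have any.
    pos_count = positives.sum(axis=1)
    valid = pos_count > 0
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[valid] / pos_count[valid]).mean())

# Toy example: four pixels, two classes, 2-D embeddings.
labels = np.array([0, 0, 1, 1])
clustered = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
print(pixel_contrastive_loss(clustered, labels))
```

Under this sketch, a multi-scale or multi-layout variant would amount to concatenating pixel embeddings sampled from several feature scales and several input layouts into the same `embeddings` array before calling the function, so that same-class pixels act as positives regardless of where they came from. The input must contain at least two pixels and at least one class with two or more pixels.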
Related papers
- Label-free Neural Semantic Image Synthesis [12.194020204848492]
We introduce the concept of neural semantic image synthesis, which uses neural layouts extracted from pre-trained foundation models as conditioning.
We experimentally show that images synthesized via neural semantic image synthesis achieve similar or superior pixel-level alignment of semantic classes.
We show that images generated by neural layout conditioning can effectively augment real data for training various perception tasks.
arXiv Detail & Related papers (2024-07-01T20:30:23Z)
- SCONE-GAN: Semantic Contrastive learning-based Generative Adversarial Network for an end-to-end image translation [18.93434486338439]
SCONE-GAN is shown to be effective for learning to generate realistic and diverse scenery images.
For more realistic and diverse image generation, we introduce a style reference image.
We validate the proposed algorithm for image-to-image translation and stylizing outdoor images.
arXiv Detail & Related papers (2023-11-07T10:29:16Z)
- Few-shot Semantic Image Synthesis with Class Affinity Transfer [23.471210664024067]
We propose a transfer method that leverages a model trained on a large source dataset to improve the learning ability on small target datasets.
The class affinity matrix is introduced as a first layer to the source model to make it compatible with the target label maps.
We apply our approach to GAN-based and diffusion-based architectures for semantic synthesis.
arXiv Detail & Related papers (2023-04-05T09:24:45Z)
- Learning to Model Multimodal Semantic Alignment for Story Visualization [58.16484259508973]
Story visualization aims to generate a sequence of images to narrate each sentence in a multi-sentence story.
Current works face the problem of semantic misalignment because of their fixed architecture and diversity of input modalities.
We explore the semantic alignment between text and image representations by learning to match their semantic levels in the GAN-based generative model.
arXiv Detail & Related papers (2022-11-14T11:41:44Z)
- Dual Pyramid Generative Adversarial Networks for Semantic Image Synthesis [94.76988562653845]
The goal of semantic image synthesis is to generate photo-realistic images from semantic label maps.
Current state-of-the-art approaches, however, still struggle to generate realistic objects in images at various scales.
We propose a Dual Pyramid Generative Adversarial Network (DP-GAN) that learns the conditioning of spatially-adaptive normalization blocks at all scales jointly.
arXiv Detail & Related papers (2022-10-08T18:45:44Z)
- Semantic Disentangling Generalized Zero-Shot Learning [50.259058462272435]
Generalized Zero-Shot Learning (GZSL) aims to recognize images from both seen and unseen categories.
In this paper, we propose a novel feature disentangling approach based on an encoder-decoder architecture.
The proposed model aims to distill quality semantic-consistent representations that capture intrinsic features of seen images.
arXiv Detail & Related papers (2021-01-20T05:46:21Z)
- Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation [128.03739769844736]
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences.
In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference.
Our algorithm sets a new state of the art in all these settings, demonstrating its efficacy and generalizability.
arXiv Detail & Related papers (2020-07-03T21:53:46Z)
- Edge Guided GANs with Contrastive Learning for Semantic Image Synthesis [194.1452124186117]
We propose a novel ECGAN for the challenging semantic image synthesis task.
Our ECGAN achieves significantly better results than state-of-the-art methods.
arXiv Detail & Related papers (2020-03-31T01:23:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.