Dual Pyramid Generative Adversarial Networks for Semantic Image
Synthesis
- URL: http://arxiv.org/abs/2210.04085v1
- Date: Sat, 8 Oct 2022 18:45:44 GMT
- Title: Dual Pyramid Generative Adversarial Networks for Semantic Image
Synthesis
- Authors: Shijie Li, Ming-Ming Cheng, Juergen Gall
- Abstract summary: The goal of semantic image synthesis is to generate photo-realistic images from semantic label maps.
Current state-of-the-art approaches, however, still struggle to generate realistic objects in images at various scales.
We propose a Dual Pyramid Generative Adversarial Network (DP-GAN) that learns the conditioning of spatially-adaptive normalization blocks at all scales jointly.
- Score: 94.76988562653845
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The goal of semantic image synthesis is to generate photo-realistic images
from semantic label maps. It is highly relevant for tasks like content
generation and image editing. Current state-of-the-art approaches, however,
still struggle to generate realistic objects in images at various scales. In
particular, small objects tend to fade away and large objects are often
generated as collages of patches. In order to address this issue, we propose a
Dual Pyramid Generative Adversarial Network (DP-GAN) that learns the
conditioning of spatially-adaptive normalization blocks at all scales jointly,
such that scale information is bi-directionally used, and it unifies
supervision at different scales. Our qualitative and quantitative results show
that the proposed approach generates images where small and large objects look
more realistic compared to images generated by state-of-the-art methods.
Related papers
- Unlocking Pre-trained Image Backbones for Semantic Image Synthesis [29.688029979801577]
We propose a new class of GAN discriminators for semantic image synthesis that generates highly realistic images.
Our model, which we dub DP-SIMS, achieves state-of-the-art results in terms of image quality and consistency with the input label maps on ADE-20K, COCO-Stuff, and Cityscapes.
arXiv Detail & Related papers (2023-12-20T09:39:19Z) - Learned representation-guided diffusion models for large-image generation [58.192263311786824]
We introduce a novel approach that trains diffusion models conditioned on embeddings from self-supervised learning (SSL)
Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images.
Augmenting real data by generating variations of real images improves downstream accuracy for patch-level and larger, image-scale classification tasks.
arXiv Detail & Related papers (2023-12-12T14:45:45Z) - Edge Guided GANs with Multi-Scale Contrastive Learning for Semantic
Image Synthesis [139.2216271759332]
We propose a novel ECGAN for the challenging semantic image synthesis task.
The semantic labels do not provide detailed structural information, making it challenging to synthesize local details and structures.
The widely adopted CNN operations such as convolution, down-sampling, and normalization usually cause spatial resolution loss.
We propose a novel contrastive learning method, which aims to enforce pixel embeddings belonging to the same semantic class to generate more similar image content.
arXiv Detail & Related papers (2023-07-22T14:17:19Z) - Object-Centric Relational Representations for Image Generation [18.069747511100132]
This paper explores a novel method to condition image generation, based on object-centric relational representations.
We show that such architectural biases entail properties that facilitate the manipulation and conditioning of the generative process.
We also propose a novel benchmark for image generation consisting of a synthetic dataset of images paired with their relational representation.
arXiv Detail & Related papers (2023-03-26T11:17:17Z) - ObjectStitch: Generative Object Compositing [43.206123360578665]
We propose a self-supervised framework for object compositing using conditional diffusion models.
Our framework can transform the viewpoint, geometry, color and shadow of the generated object while requiring no manual labeling.
Our method outperforms relevant baselines in both realism and faithfulness of the synthesized result images in a user study on various real-world images.
arXiv Detail & Related papers (2022-12-02T02:15:13Z) - Local and Global GANs with Semantic-Aware Upsampling for Image
Generation [201.39323496042527]
We consider generating images using local context.
We propose a class-specific generative network using semantic maps as guidance.
Lastly, we propose a novel semantic-aware upsampling method.
arXiv Detail & Related papers (2022-02-28T19:24:25Z) - Cross-View Image Synthesis with Deformable Convolution and Attention
Mechanism [29.528402825356398]
We propose to use Generative Adversarial Networks(GANs) based on a deformable convolution and attention mechanism to solve the problem of cross-view image synthesis.
It is difficult to understand and transform scenes appearance and semantic information from another view, thus we use deformed convolution in the U-net network to improve the network's ability to extract features of objects at different scales.
arXiv Detail & Related papers (2020-07-20T03:08:36Z) - Panoptic-based Image Synthesis [32.82903428124024]
Conditional image synthesis serves various applications for content editing to content generation.
We propose a panoptic aware image synthesis network to generate high fidelity and photorealistic images conditioned on panoptic maps.
arXiv Detail & Related papers (2020-04-21T20:40:53Z) - Edge Guided GANs with Contrastive Learning for Semantic Image Synthesis [194.1452124186117]
We propose a novel ECGAN for the challenging semantic image synthesis task.
Our ECGAN achieves significantly better results than state-of-the-art methods.
arXiv Detail & Related papers (2020-03-31T01:23:21Z) - Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image-generation method to generate complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which lead to our model's improved layout-fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fr'echet Inception Distance metric, that is better suited for multi-object images.
arXiv Detail & Related papers (2020-03-16T21:40:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.