Diversifying Semantic Image Synthesis and Editing via Class- and Layer-wise VAEs
- URL: http://arxiv.org/abs/2106.13416v2
- Date: Tue, 29 Jun 2021 06:56:09 GMT
- Title: Diversifying Semantic Image Synthesis and Editing via Class- and Layer-wise VAEs
- Authors: Yuki Endo, Yoshihiro Kanamori
- Abstract summary: We propose a class- and layer-wise extension to the variational autoencoder framework that allows flexible control over each object class at the local to global levels.
We demonstrate that our method generates images that are both plausible and more diverse compared to state-of-the-art methods.
- Score: 8.528384027684192
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic image synthesis is a process for generating photorealistic images
from a single semantic mask. To enrich the diversity of multimodal image
synthesis, previous methods have controlled the global appearance of an output
image by learning a single latent space. However, a single latent code is often
insufficient for capturing various object styles because object appearance
depends on multiple factors. To handle individual factors that determine object
styles, we propose a class- and layer-wise extension to the variational
autoencoder (VAE) framework that allows flexible control over each object class
at the local to global levels by learning multiple latent spaces. Furthermore,
we demonstrate that our method generates images that are both plausible and
more diverse compared to state-of-the-art methods via extensive experiments
with real and synthetic datasets in three different domains. We also show that
our method enables a wide range of applications in image synthesis and editing
tasks.
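The abstract's central idea is to replace a single global latent code with multiple latent spaces, one per object class and per generator layer. The sketch below shows only that bookkeeping under stated assumptions. It is not the authors' architecture. Each hypothetical `(class, layer)` pair owns a diagonal Gaussian posterior `(mu, logvar)`. Codes are drawn with the standard VAE reparameterization trick, the KL term is summed over all pairs, and each class's code is broadcast to the pixels of the semantic mask that carry that label. The helper names (`sample_codes`, `broadcast_to_mask`, `resample_class`) are illustrative inventions.

```python
import math
import random
from typing import Dict, List, Tuple

# Hypothetical layout (not the paper's code): one diagonal Gaussian per
# (class_id, layer_index) pair, stored as (mu, logvar) lists.
Key = Tuple[int, int]
Gaussian = Tuple[List[float], List[float]]


def reparameterize(mu: List[float], logvar: List[float], rng: random.Random) -> List[float]:
    """Standard VAE reparameterization: z = mu + sigma * eps, eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu: List[float], logvar: List[float]) -> float:
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def sample_codes(posteriors: Dict[Key, Gaussian], rng: random.Random):
    """Draw one code per (class, layer) pair and sum the KL terms over all pairs."""
    codes: Dict[Key, List[float]] = {}
    total_kl = 0.0
    for key, (mu, logvar) in posteriors.items():
        codes[key] = reparameterize(mu, logvar, rng)
        total_kl += kl_to_standard_normal(mu, logvar)
    return codes, total_kl


def broadcast_to_mask(mask: List[List[int]], codes: Dict[Key, List[float]], layer: int):
    """Build a per-pixel latent map for one layer: each pixel gets its class's code."""
    return [[codes[(label, layer)] for label in row] for row in mask]


def resample_class(codes: Dict[Key, List[float]], posteriors: Dict[Key, Gaussian],
                   target_class: int, layers: List[int], rng: random.Random):
    """Redraw codes for one class at the chosen layers; all other codes stay fixed."""
    edited = dict(codes)
    for layer in layers:
        mu, logvar = posteriors[(target_class, layer)]
        edited[(target_class, layer)] = reparameterize(mu, logvar, rng)
    return edited


if __name__ == "__main__":
    rng = random.Random(0)
    # Two classes (0 = background, 1 = object), two layers, 2-D latents.
    posteriors = {(c, l): ([0.0, 0.0], [0.0, 0.0]) for c in (0, 1) for l in (0, 1)}
    codes, kl = sample_codes(posteriors, rng)
    mask = [[0, 1], [1, 1]]
    print(broadcast_to_mask(mask, codes, layer=0))
    # Edit only class 1 at layer 0; class 0 and layer 1 are untouched.
    print(resample_class(codes, posteriors, target_class=1, layers=[0], rng=rng))
```

The design choice this illustrates is independence: because every `(class, layer)` pair has its own code, resampling one object class at one layer changes only that entry. Assigning coarse generator layers to global appearance and fine layers to local detail is our reading of "local to global levels", not a detail confirmed by this summary.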
Related papers
- Generative Powers of Ten [60.6740997942711]
We present a method that uses a text-to-image model to generate consistent content across multiple image scales.
We achieve this through a joint multi-scale diffusion sampling approach.
Our method enables deeper levels of zoom than traditional super-resolution methods.
arXiv Detail & Related papers (2023-12-04T18:59:25Z)
- Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
- Painting 3D Nature in 2D: View Synthesis of Natural Scenes from a Single Semantic Mask [29.38152100352871]
We introduce a novel approach that takes a single semantic mask as input to synthesize multi-view consistent color images of natural scenes.
Our method outperforms baseline methods and produces photorealistic, multi-view consistent videos of a variety of natural scenes.
arXiv Detail & Related papers (2023-02-14T17:57:58Z)
- Variation-Aware Semantic Image Synthesis [5.232306238197685]
We introduce two simple methods, semantic noise and position code, to achieve variation-aware semantic image synthesis (VASIS) with higher intra-class variation.
Our models generate more natural images and achieve slightly better FIDs and/or mIoUs than their counterparts.
arXiv Detail & Related papers (2023-01-25T12:35:17Z)
- Learning to Model Multimodal Semantic Alignment for Story Visualization [58.16484259508973]
Story visualization aims to generate a sequence of images to narrate each sentence in a multi-sentence story.
Current works face the problem of semantic misalignment because of their fixed architecture and diversity of input modalities.
We explore the semantic alignment between text and image representations by learning to match their semantic levels in the GAN-based generative model.
arXiv Detail & Related papers (2022-11-14T11:41:44Z)
- Dual Pyramid Generative Adversarial Networks for Semantic Image Synthesis [94.76988562653845]
The goal of semantic image synthesis is to generate photo-realistic images from semantic label maps.
Current state-of-the-art approaches, however, still struggle to generate realistic objects in images at various scales.
We propose a Dual Pyramid Generative Adversarial Network (DP-GAN) that learns the conditioning of spatially-adaptive normalization blocks at all scales jointly.
arXiv Detail & Related papers (2022-10-08T18:45:44Z)
- Multimodal Face Synthesis from Visual Attributes [85.87796260802223]
We propose a novel generative adversarial network that simultaneously synthesizes identity-preserving multimodal face images.
Multimodal stretch-in modules are introduced in the discriminator, which discriminates between real and fake images.
arXiv Detail & Related papers (2021-04-09T13:47:23Z)
- Diverse Semantic Image Synthesis via Probability Distribution Modeling [103.88931623488088]
We propose a novel diverse semantic image synthesis framework.
Our method can achieve superior diversity and comparable quality compared to state-of-the-art methods.
arXiv Detail & Related papers (2021-03-11T18:59:25Z)
- Generating Annotated High-Fidelity Images Containing Multiple Coherent Objects [10.783993190686132]
We propose a multi-object generation framework that can synthesize images with multiple objects without explicitly requiring contextual information.
We demonstrate how coherency and fidelity are preserved with our method through experiments on the Multi-MNIST and CLEVR datasets.
arXiv Detail & Related papers (2020-06-22T11:33:55Z)
- Panoptic-based Image Synthesis [32.82903428124024]
Conditional image synthesis serves various applications, from content editing to content generation.
We propose a panoptic-aware image synthesis network to generate high-fidelity and photorealistic images conditioned on panoptic maps.
arXiv Detail & Related papers (2020-04-21T20:40:53Z)