Semantic Image Synthesis via Diffusion Models
- URL: http://arxiv.org/abs/2207.00050v1
- Date: Thu, 30 Jun 2022 18:31:51 GMT
- Title: Semantic Image Synthesis via Diffusion Models
- Authors: Weilun Wang, Jianmin Bao, Wengang Zhou, Dongdong Chen, Dong Chen, Lu
Yuan and Houqiang Li
- Abstract summary: Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows the de facto GAN-based approaches.
- Score: 159.4285444680301
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable
success in various image generation tasks compared with Generative Adversarial
Nets (GANs). Recent work on semantic image synthesis mainly follows the
de facto GAN-based approaches, which may lead to unsatisfactory quality
or diversity of generated images. In this paper, we propose a novel framework
based on DDPM for semantic image synthesis. Unlike previous conditional
diffusion models, which directly feed the semantic layout and noisy image
as input to a U-Net structure and may therefore not fully leverage the
information in the input semantic mask, our framework processes the
semantic layout and the noisy image differently: it feeds the noisy image
to the encoder of the U-Net while injecting the semantic layout into the
decoder through multi-layer spatially-adaptive normalization operators. To
further improve generation quality and semantic interpretability in
semantic image synthesis, we introduce the classifier-free guidance
sampling strategy, which leverages the score of an unconditional model
during the sampling process. Extensive experiments on three benchmark
datasets demonstrate the effectiveness of our proposed method, achieving
state-of-the-art performance in terms of fidelity (FID) and diversity (LPIPS).
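The architectural split described in the abstract, noisy image into the U-Net encoder and semantic layout into the decoder, hinges on spatially-adaptive normalization. Below is a minimal, hypothetical PyTorch sketch of such a layer in the spirit of SPADE; the class and parameter names are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatiallyAdaptiveNorm(nn.Module):
    """Normalize decoder features, then modulate them with per-pixel scale
    (gamma) and shift (beta) maps predicted from the semantic layout.
    A sketch in the spirit of SPADE, not the paper's exact implementation."""

    def __init__(self, feat_channels: int, num_classes: int, hidden: int = 128):
        super().__init__()
        # Parameter-free normalization of the incoming decoder activations.
        self.norm = nn.BatchNorm2d(feat_channels, affine=False)
        # Small conv net mapping the one-hot layout to modulation features.
        self.shared = nn.Sequential(
            nn.Conv2d(num_classes, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.to_gamma = nn.Conv2d(hidden, feat_channels, kernel_size=3, padding=1)
        self.to_beta = nn.Conv2d(hidden, feat_channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor, segmap: torch.Tensor) -> torch.Tensor:
        # Resize the layout (N, num_classes, H, W) to the decoder resolution.
        segmap = F.interpolate(segmap, size=x.shape[2:], mode="nearest")
        h = self.shared(segmap)
        # Per-pixel, per-channel modulation of the normalized features.
        return self.norm(x) * (1 + self.to_gamma(h)) + self.to_beta(h)
```

Stacking such layers through the decoder lets the layout steer generation at every resolution, while the encoder sees only the noisy image.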
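Classifier-free guidance, the sampling strategy mentioned above, mixes conditional and unconditional noise predictions at each denoising step. A hedged sketch of one guided prediction follows; the model signature and guidance scale `s` are assumptions for illustration.

```python
import torch

@torch.no_grad()
def guided_eps(model, x_t, t, segmap, s: float = 1.5):
    """Classifier-free guidance for one denoising step.
    Assumes `model(x, t, cond)` returns the predicted noise and that
    passing cond=None yields the unconditional prediction."""
    eps_cond = model(x_t, t, segmap)   # conditioned on the semantic layout
    eps_uncond = model(x_t, t, None)   # condition dropped
    # s = 0 recovers the unconditional model, s = 1 the conditional one,
    # and s > 1 amplifies the influence of the layout.
    return eps_uncond + s * (eps_cond - eps_uncond)
```

The guided estimate replaces the plain conditional prediction in the usual DDPM update; larger `s` typically trades diversity for fidelity to the layout.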
Related papers
- Unlocking Pre-trained Image Backbones for Semantic Image Synthesis [29.688029979801577]
We propose a new class of GAN discriminators for semantic image synthesis, built on pre-trained image backbones, that enables the generation of highly realistic images.
Our model, which we dub DP-SIMS, achieves state-of-the-art results in terms of image quality and consistency with the input label maps on ADE-20K, COCO-Stuff, and Cityscapes.
arXiv Detail & Related papers (2023-12-20T09:39:19Z)
- Diversified in-domain synthesis with efficient fine-tuning for few-shot classification [64.86872227580866]
Few-shot image classification aims to learn an image classifier using only a small set of labeled examples per class.
We propose DISEF, a novel approach which addresses the generalization challenge in few-shot learning using synthetic data.
We validate our method on ten different benchmarks, consistently outperforming baselines and establishing a new state-of-the-art for few-shot classification.
arXiv Detail & Related papers (2023-12-05T17:18:09Z)
- Improving Few-shot Image Generation by Structural Discrimination and Textural Modulation [10.389698647141296]
Few-shot image generation aims to produce plausible and diverse images for one category given a few images from this category.
Existing approaches either globally interpolate different images or fuse local representations with pre-defined coefficients.
This paper proposes a novel mechanism to inject external semantic signals into internal local representations.
arXiv Detail & Related papers (2023-08-30T16:10:21Z)
- UniDiff: Advancing Vision-Language Models with Generative and Discriminative Learning [86.91893533388628]
This paper presents UniDiff, a unified multi-modal model that integrates image-text contrastive learning (ITC), text-conditioned image synthesis learning (IS), and reciprocal semantic consistency modeling (RSC).
UniDiff demonstrates versatility in both multi-modal understanding and generative tasks.
arXiv Detail & Related papers (2023-06-01T15:39:38Z)
- Wavelet-based Unsupervised Label-to-Image Translation [9.339522647331334]
We propose a new unsupervised paradigm for semantic image synthesis (USIS) that makes use of a self-supervised segmentation loss and whole-image wavelet-based discrimination.
We test our methodology on three challenging datasets and demonstrate its ability to bridge the performance gap between paired and unpaired models.
arXiv Detail & Related papers (2023-05-16T17:48:44Z)
- High-Quality Pluralistic Image Completion via Code Shared VQGAN [51.7805154545948]
We present a novel framework for pluralistic image completion that can achieve both high quality and diversity at much faster inference speed.
Our framework is able to learn semantically-rich discrete codes efficiently and robustly, resulting in much better image reconstruction quality.
arXiv Detail & Related papers (2022-04-05T01:47:35Z)
- Diverse Semantic Image Synthesis via Probability Distribution Modeling [103.88931623488088]
We propose a novel diverse semantic image synthesis framework.
Our method achieves superior diversity and comparable quality relative to state-of-the-art methods.
arXiv Detail & Related papers (2021-03-11T18:59:25Z)
- You Only Need Adversarial Supervision for Semantic Image Synthesis [84.83711654797342]
We propose a novel, simplified GAN model that needs only adversarial supervision to achieve high-quality results.
We show that images synthesized by our model are more diverse and follow the color and texture of real images more closely.
arXiv Detail & Related papers (2020-12-08T23:00:48Z)
- Improving Augmentation and Evaluation Schemes for Semantic Image Synthesis [16.097324852253912]
We introduce a novel augmentation scheme designed specifically for generative adversarial networks (GANs): we propose to randomly warp object shapes in the semantic label maps used as input to the generator (a sketch of this idea follows the list).
The local shape discrepancies between the warped and non-warped label maps and images enable the GAN to better learn the structural and geometric details of the scene.
arXiv Detail & Related papers (2020-11-25T10:55:26Z)
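The label-warping augmentation in the last entry above can be pictured as a smooth random deformation applied to the label map alone, leaving the paired image untouched. The sketch below is a hypothetical illustration of that idea; the coarse-noise offset field and `strength` parameter are assumptions, not the paper's method.

```python
import torch
import torch.nn.functional as F

def warp_label_map(segmap: torch.Tensor, strength: float = 0.05) -> torch.Tensor:
    """Randomly warp object shapes in a float one-hot label map (N, C, H, W).
    Hypothetical sketch: a smooth random offset field perturbs an identity
    sampling grid, and nearest sampling keeps the labels discrete."""
    n, _, h, w = segmap.shape
    # Identity sampling grid over [-1, 1] x [-1, 1].
    theta = torch.eye(2, 3, device=segmap.device).unsqueeze(0).repeat(n, 1, 1)
    grid = F.affine_grid(theta, list(segmap.shape), align_corners=False)
    # Smooth offsets: coarse Gaussian noise upsampled to full resolution.
    offsets = torch.randn(n, 2, h // 8, w // 8, device=segmap.device) * strength
    offsets = F.interpolate(offsets, size=(h, w), mode="bilinear",
                            align_corners=False)
    grid = grid + offsets.permute(0, 2, 3, 1)
    # Nearest-neighbor sampling preserves discrete label identities.
    return F.grid_sample(segmap, grid, mode="nearest", align_corners=False)
```

The warped map no longer matches the real image exactly, and that local mismatch is what pushes the discriminator, and hence the generator, to attend to structural and geometric detail.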
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the information provided and is not responsible for any consequences of its use.