Person Image Synthesis via Denoising Diffusion Model
- URL: http://arxiv.org/abs/2211.12500v1
- Date: Tue, 22 Nov 2022 18:59:50 GMT
- Title: Person Image Synthesis via Denoising Diffusion Model
- Authors: Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer,
Jorma Laaksonen, Mubarak Shah, Fahad Shahbaz Khan
- Abstract summary: We show how denoising diffusion models can be applied for high-fidelity person image synthesis.
Our results on two large-scale benchmarks and a user study demonstrate the photorealism of our proposed approach under challenging scenarios.
- Score: 116.34633988927429
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The pose-guided person image generation task requires synthesizing
photorealistic images of humans in arbitrary poses. Existing approaches use
generative adversarial networks that either fail to maintain realistic
textures or rely on dense correspondences that struggle to handle complex
deformations and severe occlusions. In this work, we show how denoising
diffusion models can be applied for high-fidelity person image synthesis with
strong sample diversity and enhanced mode coverage of the learnt data
distribution. Our proposed Person Image Diffusion Model (PIDM) decomposes
the complex transfer problem into a series of simpler forward-backward
denoising steps. This helps in learning plausible source-to-target
transformation trajectories that result in faithful textures and undistorted
appearance details. We introduce a 'texture diffusion module' based on
cross-attention to accurately model the correspondences between appearance and
pose information available in source and target images. Further, we propose
'disentangled classifier-free guidance' to ensure close resemblance between the
conditional inputs and the synthesized output in terms of both pose and
appearance information. Our extensive results on two large-scale benchmarks and
a user study demonstrate the photorealism of our proposed approach under
challenging scenarios. We also show how our generated images can help in
downstream tasks. Our code and models will be publicly released.
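To make the two key components above more concrete, here is a minimal PyTorch sketch of a cross-attention block in the spirit of the texture diffusion module: queries come from the noisy target-branch features, while keys and values come from the source-image appearance features. All layer sizes and names are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class TextureCrossAttention(nn.Module):
    """Illustrative cross-attention block: noisy target features attend
    to source-image appearance features (queries from the target branch,
    keys/values from the source), in the spirit of the abstract's
    texture diffusion module. Names and sizes are assumptions, not the
    released PIDM code."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)

    def forward(self, target_feats: torch.Tensor,
                source_feats: torch.Tensor) -> torch.Tensor:
        # target_feats: (B, N_t, C) features of the noisy target branch
        # source_feats: (B, N_s, C) appearance features of the source image
        q = self.norm_q(target_feats)
        kv = self.norm_kv(source_feats)
        attended, _ = self.attn(q, kv, kv)
        return target_feats + attended  # residual connection
```

Similarly, one plausible reading of "disentangled classifier-free guidance" is to form separate guidance terms for pose and appearance, each with its own scale, by querying the denoiser with the conditions dropped in turn. The `denoiser` signature and the scales `w_pose`/`w_style` below are hypothetical, not the paper's exact formulation.

```python
def disentangled_cfg(denoiser, x_t, t, pose, style,
                     w_pose: float = 2.0, w_style: float = 2.0):
    """One guided noise estimate with separate pose and appearance terms.

    A sketch under assumed conventions: `denoiser(x_t, t, pose, style)`
    returns the predicted noise, and passing None drops that condition
    (the unconditional branch used by classifier-free guidance).
    """
    eps_uncond = denoiser(x_t, t, None, None)   # no conditioning
    eps_pose = denoiser(x_t, t, pose, None)     # pose only
    eps_full = denoiser(x_t, t, pose, style)    # pose + appearance
    # Scale the pose term and the appearance term independently, so the
    # output can be steered toward each conditional input separately.
    return (eps_uncond
            + w_pose * (eps_pose - eps_uncond)
            + w_style * (eps_full - eps_pose))
```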
Related papers
- Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis [65.7968515029306]
We propose a novel Coarse-to-Fine Latent Diffusion (CFLD) method for Pose-Guided Person Image Synthesis (PGPIS).
A perception-refined decoder is designed to progressively refine a set of learnable queries and extract semantic understanding of person images as a coarse-grained prompt.
arXiv Detail & Related papers (2024-02-28T06:07:07Z)
- Unlocking Pre-trained Image Backbones for Semantic Image Synthesis [29.688029979801577]
We propose a new class of GAN discriminators for semantic image synthesis that yields highly realistic images.
Our model, which we dub DP-SIMS, achieves state-of-the-art results in terms of image quality and consistency with the input label maps on ADE-20K, COCO-Stuff, and Cityscapes.
arXiv Detail & Related papers (2023-12-20T09:39:19Z)
- Learned representation-guided diffusion models for large-image generation [58.192263311786824]
We introduce a novel approach that trains diffusion models conditioned on embeddings from self-supervised learning (SSL).
Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images.
Augmenting real data by generating variations of real images improves downstream accuracy for patch-level and larger, image-scale classification tasks.
arXiv Detail & Related papers (2023-12-12T14:45:45Z)
- Multi-View Unsupervised Image Generation with Cross Attention Guidance [23.07929124170851]
This paper introduces a novel pipeline for unsupervised training of a pose-conditioned diffusion model on single-category datasets.
We identify object poses by clustering the dataset based on the visibility and locations of specific object parts.
Our model, MIRAGE, surpasses prior work in novel view synthesis on real images.
arXiv Detail & Related papers (2023-12-07T14:55:13Z)
- Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
arXiv Detail & Related papers (2023-09-30T02:03:22Z)
- Training-free Diffusion Model Adaptation for Variable-Sized Text-to-Image Synthesis [45.19847146506007]
Diffusion models (DMs) have recently gained attention with state-of-the-art performance in text-to-image synthesis.
This paper focuses on adapting text-to-image diffusion models to handle varying image sizes while maintaining visual fidelity.
arXiv Detail & Related papers (2023-06-14T17:23:07Z)
- Image Generation with Multimodal Priors using Denoising Diffusion Probabilistic Models [54.1843419649895]
A major challenge in using generative models to accomplish this task is the lack of paired data containing all modalities and corresponding outputs.
We propose a solution based on denoising diffusion probabilistic models to generate images under multimodal priors.
arXiv Detail & Related papers (2022-06-10T12:23:05Z)
- DiVAE: Photorealistic Images Synthesis with Denoising Diffusion Decoder [73.1010640692609]
We propose a VQ-VAE architecture model with a diffusion decoder (DiVAE) to work as the reconstructing component in image synthesis.
Our model achieves state-of-the-art results and, in particular, generates more photorealistic images.
arXiv Detail & Related papers (2022-06-01T10:39:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.