Advancing Pose-Guided Image Synthesis with Progressive Conditional Diffusion Models
- URL: http://arxiv.org/abs/2310.06313v3
- Date: Wed, 13 Mar 2024 07:32:06 GMT
- Title: Advancing Pose-Guided Image Synthesis with Progressive Conditional Diffusion Models
- Authors: Fei Shen, Hu Ye, Jun Zhang, Cong Wang, Xiao Han, Wei Yang
- Abstract summary: This paper presents Progressive Conditional Diffusion Models (PCDMs) that incrementally bridge the gap between person images under the target and source poses through three stages.
Both qualitative and quantitative results demonstrate the consistency and photorealism of our proposed PCDMs under challenging scenarios.
- Score: 13.795706255966259
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work has showcased the significant potential of diffusion models in
pose-guided person image synthesis. However, owing to the inconsistency in pose
between the source and target images, synthesizing an image with a distinct
pose, relying exclusively on the source image and target pose information,
remains a formidable challenge. This paper presents Progressive Conditional
Diffusion Models (PCDMs) that incrementally bridge the gap between person
images under the target and source poses through three stages. Specifically, in
the first stage, we design a simple prior conditional diffusion model that
predicts the global features of the target image by mining the global alignment
relationship between pose coordinates and image appearance. Then, the second
stage establishes a dense correspondence between the source and target images
using the global features from the previous stage, and an inpainting
conditional diffusion model is proposed to further align and enhance the
contextual features, generating a coarse-grained person image. In the third
stage, we propose a refining conditional diffusion model to utilize the
coarsely generated image from the previous stage as a condition, achieving
texture restoration and enhancing fine-detail consistency. The three-stage
PCDMs work progressively to generate the final high-quality and high-fidelity
synthesized image. Both qualitative and quantitative results demonstrate the
consistency and photorealism of our proposed PCDMs under challenging
scenarios. The code and model will be available at
https://github.com/tencent-ailab/PCDMs.
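To make the staged design concrete, the sketch below wires the three stages together as plain PyTorch modules. It is a minimal illustration only: the class names, feature sizes, and placeholder forward passes are assumptions, and the released implementation at https://github.com/tencent-ailab/PCDMs is the authoritative reference.

```python
# Minimal sketch of the three-stage PCDM inference flow described in the
# abstract. All class names, shapes, and forward bodies are illustrative
# assumptions; see https://github.com/tencent-ailab/PCDMs for the real code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PriorDiffusion(nn.Module):
    """Stage 1: predict a global feature of the target image by aligning
    pose coordinates with source-image appearance."""
    def forward(self, src_feat, src_pose, tgt_pose):
        cond = torch.cat([src_feat, src_pose, tgt_pose], dim=-1)
        # Stand-in for the denoising loop over a global feature embedding.
        return torch.tanh(F.linear(cond, torch.randn(768, cond.shape[-1])))


class InpaintingDiffusion(nn.Module):
    """Stage 2: build dense source-target correspondence from the global
    feature and inpaint a coarse-grained person image."""
    def forward(self, src_img, tgt_pose, global_feat):
        return torch.zeros_like(src_img)  # stand-in for the coarse result


class RefiningDiffusion(nn.Module):
    """Stage 3: condition on the coarse image to restore texture and
    improve fine-detail consistency."""
    def forward(self, coarse_img, src_img):
        return coarse_img  # stand-in for the refinement denoising loop


def pcdm_inference(src_img, src_feat, src_pose, tgt_pose):
    """Run the three conditional diffusion models progressively."""
    global_feat = PriorDiffusion()(src_feat, src_pose, tgt_pose)    # stage 1
    coarse = InpaintingDiffusion()(src_img, tgt_pose, global_feat)  # stage 2
    return RefiningDiffusion()(coarse, src_img)                     # stage 3


if __name__ == "__main__":
    out = pcdm_inference(
        src_img=torch.rand(1, 3, 256, 176),  # a common PGPIS resolution
        src_feat=torch.rand(1, 768),         # e.g. a CLIP image embedding
        src_pose=torch.rand(1, 54),          # flattened keypoint coordinates
        tgt_pose=torch.rand(1, 54),
    )
    print(out.shape)  # torch.Size([1, 3, 256, 176])
```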
Related papers
- Human-Object Interaction Detection Collaborated with Large Relation-driven Diffusion Models [65.82564074712836]
We introduce DIFfusionHOI, a new HOI detector shedding light on text-to-image diffusion models.
We first devise an inversion-based strategy to learn the expression of relation patterns between humans and objects in embedding space.
These learned relation embeddings then serve as textual prompts to steer diffusion models to generate images that depict specific interactions.
arXiv Detail & Related papers (2024-10-26T12:00:33Z)
- DiffHarmony: Latent Diffusion Model Meets Image Harmonization [11.500358677234939]
Diffusion models have promoted the rapid development of image-to-image translation tasks.
Training latent diffusion models from scratch is computationally intensive.
In this paper, we adapt a pre-trained latent diffusion model to the image harmonization task to generate harmonious but potentially blurry initial images.
arXiv Detail & Related papers (2024-04-09T09:05:23Z)
- Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis [65.7968515029306]
We propose a novel Coarse-to-Fine Latent Diffusion (CFLD) method for Pose-Guided Person Image Synthesis (PGPIS).
A perception-refined decoder is designed to progressively refine a set of learnable queries and extract semantic understanding of person images as a coarse-grained prompt.
arXiv Detail & Related papers (2024-02-28T06:07:07Z)
- JoReS-Diff: Joint Retinex and Semantic Priors in Diffusion Model for Low-light Image Enhancement [69.6035373784027]
Low-light image enhancement (LLIE) has achieved promising performance by employing conditional diffusion models.
Previous methods may neglect the importance of a sufficiently formulated task-specific conditioning strategy.
We propose JoReS-Diff, a novel approach that incorporates Retinex- and semantic-based priors as the additional pre-processing condition.
arXiv Detail & Related papers (2023-12-20T08:05:57Z)
- Multi-View Unsupervised Image Generation with Cross Attention Guidance [23.07929124170851]
This paper introduces a novel pipeline for unsupervised training of a pose-conditioned diffusion model on single-category datasets.
We identify object poses by clustering the dataset on the visibility and locations of specific object parts.
Our model, MIRAGE, surpasses prior work in novel view synthesis on real images.
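One plausible reading of that clustering step is sketched below; the feature construction and cluster count are pure assumptions, not MIRAGE's actual recipe.

```python
# Hypothetical sketch: group images into pose clusters from per-part
# visibility and normalized part locations. The feature design and the
# number of clusters are assumptions made for illustration only.
import numpy as np
from sklearn.cluster import KMeans

def cluster_poses(part_locs, part_vis, n_poses=8, seed=0):
    """part_locs: (N, P, 2) normalized locations; part_vis: (N, P) in {0, 1}."""
    # Zero out invisible parts so visibility itself shapes the feature vector.
    loc_feats = (part_locs * part_vis[..., None]).reshape(len(part_locs), -1)
    feats = np.concatenate([loc_feats, part_vis], axis=1)
    return KMeans(n_clusters=n_poses, random_state=seed, n_init=10).fit_predict(feats)
```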
arXiv Detail & Related papers (2023-12-07T14:55:13Z)
- High-fidelity Person-centric Subject-to-Image Synthesis [13.785484396436367]
Face-diffuser is an effective collaborative generation pipeline that eliminates the training imbalance and quality compromise seen in person-centric subject-to-image synthesis.
The sampling process is divided into three sequential stages, i.e., semantic scene construction, subject-scene fusion, and subject enhancement.
The subject-scene fusion stage achieves this collaboration through a novel and highly effective mechanism, Saliency-adaptive Noise Fusion.
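The exact formulation of Saliency-adaptive Noise Fusion is not given in this summary; a hypothetical sketch of blending two denoisers' noise predictions with a per-pixel saliency weight might look like the following.

```python
# Hypothetical sketch: fuse noise predictions from a subject-specialized and
# a scene-specialized diffusion model with a per-pixel saliency weight.
# The saliency heuristic and blending rule are assumptions for illustration.
import torch

def saliency_noise_fusion(eps_subject, eps_scene):
    """eps_*: (B, C, H, W) noise predictions at the same diffusion step."""
    # Treat regions where the two experts disagree most as subject-salient.
    diff = (eps_subject - eps_scene).abs().mean(dim=1, keepdim=True)
    saliency = diff / (diff.amax(dim=(-2, -1), keepdim=True) + 1e-8)
    return saliency * eps_subject + (1.0 - saliency) * eps_scene
```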
arXiv Detail & Related papers (2023-11-17T05:03:53Z)
- Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion [50.59261592343479]
We present Kandinsky, a novel exploration of latent diffusion architecture.
The proposed model is trained separately to map text embeddings to image embeddings of CLIP.
We also deployed a user-friendly demo system that supports diverse generative modes such as text-to-image generation, image fusion, text and image fusion, image variations generation, and text-guided inpainting/outpainting.
arXiv Detail & Related papers (2023-10-05T12:29:41Z)
- Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
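As a general illustration of the idea, the sketch below steers an unconditionally trained denoiser at sampling time with the gradient of a task loss. It is a standard loss-guided DDIM update, not necessarily the paper's exact procedure; the denoiser, noise schedule, and loss function are assumed inputs.

```python
# Generic sketch of loss-gradient steering during sampling with an
# unconditionally trained denoiser; hyper-parameters are assumptions.
import torch

def steered_ddim_step(x_t, t_now, t_next, eps_model, task_loss, alpha_bar, scale=1.0):
    """One DDIM reverse step nudged down the gradient of a scalar task loss."""
    x_t = x_t.detach().requires_grad_(True)
    a_now, a_next = alpha_bar[t_now], alpha_bar[t_next]
    eps = eps_model(x_t, t_now)
    x0 = (x_t - (1 - a_now).sqrt() * eps) / a_now.sqrt()  # predicted clean image
    grad = torch.autograd.grad(task_loss(x0), x_t)[0]     # e.g. masked L2 for inpainting
    eps = eps + scale * (1 - a_now).sqrt() * grad         # steer the noise estimate
    x0 = (x_t - (1 - a_now).sqrt() * eps) / a_now.sqrt()
    return (a_next.sqrt() * x0 + (1 - a_next).sqrt() * eps).detach()
```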
arXiv Detail & Related papers (2023-09-30T02:03:22Z)
- Person Image Synthesis via Denoising Diffusion Model [116.34633988927429]
We show how denoising diffusion models can be applied for high-fidelity person image synthesis.
Our results on two large-scale benchmarks and a user study demonstrate the photorealism of our proposed approach under challenging scenarios.
arXiv Detail & Related papers (2022-11-22T18:59:50Z)