Multi-focal Conditioned Latent Diffusion for Person Image Synthesis
- URL: http://arxiv.org/abs/2503.15686v2
- Date: Sun, 23 Mar 2025 23:10:16 GMT
- Title: Multi-focal Conditioned Latent Diffusion for Person Image Synthesis
- Authors: Jiaqi Liu, Jichao Zhang, Paolo Rota, Nicu Sebe
- Abstract summary: The Latent Diffusion Model (LDM) has demonstrated strong capabilities in high-resolution image generation. We propose a Multi-focal Conditioned Latent Diffusion (MCLD) method to address the loss of detail caused by LDM's compression. Our approach utilizes a multi-focal condition aggregation module, which effectively integrates facial identity and texture-specific information.
- Score: 59.113899155476005
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Latent Diffusion Model (LDM) has demonstrated strong capabilities in high-resolution image generation and has been widely employed for Pose-Guided Person Image Synthesis (PGPIS), yielding promising results. However, the compression process of LDM often results in the deterioration of details, particularly in sensitive areas such as facial features and clothing textures. In this paper, we propose a Multi-focal Conditioned Latent Diffusion (MCLD) method to address these limitations by conditioning the model on disentangled, pose-invariant features from these sensitive regions. Our approach utilizes a multi-focal condition aggregation module, which effectively integrates facial identity and texture-specific information, enhancing the model's ability to produce appearance-realistic and identity-consistent images. Our method demonstrates consistent identity and appearance generation on the DeepFashion dataset and enables flexible person image editing due to its generation consistency. The code is available at https://github.com/jqliu09/mcld.
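As a concrete illustration of the conditioning idea, here is a minimal PyTorch sketch of a multi-focal condition aggregation module: facial-identity and texture features are projected to a common width and concatenated into one token sequence that a latent diffusion U-Net could attend to via cross-attention. All names, dimensions, and design choices below are assumptions for illustration, not the authors' implementation (which is in the linked repository).

```python
# Hypothetical sketch of a multi-focal condition aggregation module.
# Facial-identity and texture features are projected to a shared width
# and concatenated into one token sequence that a latent diffusion
# U-Net could attend to via cross-attention. All names, dimensions,
# and design choices are assumptions, not the released MCLD code.
import torch
import torch.nn as nn

class MultiFocalAggregator(nn.Module):
    def __init__(self, face_dim=512, texture_dim=768, cond_dim=768):
        super().__init__()
        self.face_proj = nn.Linear(face_dim, cond_dim)        # e.g. a face-identity embedding
        self.texture_proj = nn.Linear(texture_dim, cond_dim)  # e.g. patch features from clothing crops
        self.norm = nn.LayerNorm(cond_dim)

    def forward(self, face_feat, texture_feat):
        # face_feat: (B, 1, face_dim); texture_feat: (B, N, texture_dim)
        tokens = torch.cat([self.face_proj(face_feat),
                            self.texture_proj(texture_feat)], dim=1)
        return self.norm(tokens)  # (B, 1 + N, cond_dim) cross-attention context

agg = MultiFocalAggregator()
ctx = agg(torch.randn(2, 1, 512), torch.randn(2, 16, 768))
print(ctx.shape)  # torch.Size([2, 17, 768])
```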
Related papers
- ID-Booth: Identity-consistent Face Generation with Diffusion Models [10.042492056152232]
We present a novel generative diffusion-based framework called ID-Booth.
The framework enables identity-consistent image generation while retaining the synthesis capabilities of pretrained diffusion models.
Our method achieves better intra-identity consistency and inter-identity separability than competing methods, while attaining higher image diversity.
arXiv Detail & Related papers (2025-04-10T02:20:18Z)
- DRDM: A Disentangled Representations Diffusion Model for Synthesizing Realistic Person Images [9.768951663960257]
We propose a Disentangled Representations Diffusion Model (DRDM) to generate photo-realistic images from source portraits. First, a pose encoder encodes pose features into a high-dimensional space to guide the generation of person images. Second, a body-part subspace decoupling block (BSDB) disentangles features from the different body parts of a source figure and feeds them to the various layers of the noise prediction block.
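The body-part routing described above can be pictured with a short sketch: per-part features are projected and injected at different depths of a noise-prediction network. This is an assumed reading of the BSDB, not the DRDM code; all names and shapes are hypothetical.

```python
# Illustrative sketch (not the DRDM release): separate features per body
# part are routed, via per-layer projections, to different depths of a
# noise-prediction network, mirroring the described BSDB idea.
import torch
import torch.nn as nn

PARTS = ["head", "torso", "legs"]

class PartRouter(nn.Module):
    def __init__(self, in_dim=256, dims=(128, 256, 512)):
        super().__init__()
        # one projection per (body part, target layer) pair -- assumed design
        self.proj = nn.ModuleDict({
            p: nn.ModuleList([nn.Linear(in_dim, d) for d in dims]) for p in PARTS
        })

    def forward(self, part_feats):
        # part_feats: dict part -> (B, in_dim); returns one conditioning
        # tensor per target layer, summed over body parts
        return [sum(self.proj[p][i](part_feats[p]) for p in PARTS)
                for i in range(len(self.proj[PARTS[0]]))]

router = PartRouter()
feats = {p: torch.randn(2, 256) for p in PARTS}
per_layer = router(feats)
print([t.shape for t in per_layer])  # [(2, 128), (2, 256), (2, 512)]
```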
arXiv Detail & Related papers (2024-12-25T06:36:24Z)
- Realistic and Efficient Face Swapping: A Unified Approach with Diffusion Models [69.50286698375386]
We propose a novel approach that better harnesses diffusion models for face-swapping.
We introduce a mask shuffling technique during inpainting training, which allows us to create a so-called universal model for swapping.
Our approach is relatively unified, which makes it resilient to errors in other off-the-shelf models.
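Since only the name of the technique is given here, the following sketch shows one plausible reading of mask shuffling: face-region masks are randomly reassigned across a training batch before inpainting, so the model cannot rely on a fixed image-mask pairing. This is purely illustrative and may differ from the paper's actual procedure.

```python
# Hedged illustration only: one plausible reading of "mask shuffling"
# is to randomly reassign face-region masks across a training batch so
# an inpainting model cannot overfit to a fixed source/target pairing.
import torch

def shuffle_masks(images, masks):
    # images: (B, 3, H, W); masks: (B, 1, H, W) with 1 = region to inpaint
    perm = torch.randperm(masks.size(0))
    shuffled = masks[perm]                    # each image gets another sample's mask
    return images * (1 - shuffled), shuffled  # masked input + its mask

imgs = torch.rand(4, 3, 64, 64)
msks = (torch.rand(4, 1, 64, 64) > 0.5).float()
masked, m = shuffle_masks(imgs, msks)
print(masked.shape, m.shape)  # torch.Size([4, 3, 64, 64]) torch.Size([4, 1, 64, 64])
```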
arXiv Detail & Related papers (2024-09-11T13:43:53Z)
- DiffFAE: Advancing High-fidelity One-shot Facial Appearance Editing with Space-sensitive Customization and Semantic Preservation [84.0586749616249]
This paper presents DiffFAE, a one-stage, highly efficient diffusion-based framework tailored for high-fidelity Facial Appearance Editing.
For high-fidelity transfer of query attributes, we adopt Space-sensitive Physical Customization (SPC), which ensures fidelity and generalization ability.
To preserve source attributes, we introduce the Region-responsive Semantic Composition (RSC).
This module is guided to learn decoupled, source-related features, thereby better preserving the identity and alleviating artifacts from non-facial attributes such as hair, clothes, and background.
arXiv Detail & Related papers (2024-03-26T12:53:10Z)
- Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis [65.7968515029306]
We propose a novel Coarse-to-Fine Latent Diffusion (CFLD) method for Pose-Guided Person Image Synthesis (PGPIS).
A perception-refined decoder is designed to progressively refine a set of learnable queries and extract semantic understanding of person images as a coarse-grained prompt.
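A rough sketch of the learnable-query idea, under assumptions rather than the CFLD release: query tokens are refined over several cross-attention rounds against person-image features, producing a coarse-grained prompt sequence.

```python
# Rough sketch (assumed design, not the CFLD code): learnable query
# tokens are progressively refined by cross-attending to image
# features, yielding a coarse-grained prompt sequence.
import torch
import torch.nn as nn

class QueryRefiner(nn.Module):
    def __init__(self, n_queries=8, dim=256, depth=3, heads=4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_queries, dim))
        self.layers = nn.ModuleList(
            nn.MultiheadAttention(dim, heads, batch_first=True) for _ in range(depth)
        )

    def forward(self, img_feats):
        # img_feats: (B, L, dim) from a frozen image encoder
        q = self.queries.unsqueeze(0).expand(img_feats.size(0), -1, -1)
        for attn in self.layers:
            out, _ = attn(q, img_feats, img_feats)
            q = q + out  # progressive refinement with residual updates
        return q  # (B, n_queries, dim) coarse-grained prompt

refiner = QueryRefiner()
prompt = refiner(torch.randn(2, 196, 256))
print(prompt.shape)  # torch.Size([2, 8, 256])
```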
arXiv Detail & Related papers (2024-02-28T06:07:07Z)
- DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing [94.24479528298252]
DragGAN is an interactive point-based image editing framework that achieves impressive editing results with pixel-level precision.
By harnessing large-scale pretrained diffusion models, we greatly enhance the applicability of interactive point-based editing on both real and diffusion-generated images.
We present a challenging benchmark dataset called DragBench to evaluate the performance of interactive point-based image editing methods.
arXiv Detail & Related papers (2023-06-26T06:04:09Z)
- Conditioning Diffusion Models via Attributes and Semantic Masks for Face Generation [1.104121146441257]
Deep generative models have shown impressive results in generating realistic images of faces.
GANs can generate high-quality, high-fidelity images when conditioned on semantic masks, but they still lack the ability to diversify their output.
We propose a multi-conditioning approach for diffusion models via cross-attention exploiting both attributes and semantic masks to generate high-quality and controllable face images.
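A minimal sketch of such multi-conditioning, with hypothetical names and dimensions: attribute vectors and a one-hot semantic mask are each embedded as tokens and concatenated into a single context for a diffusion U-Net's cross-attention layers.

```python
# Minimal sketch under assumptions (not the paper's code): attribute
# vectors and a one-hot semantic mask are embedded as token sequences
# and concatenated so a diffusion U-Net can cross-attend to both.
import torch
import torch.nn as nn

class FaceConditioner(nn.Module):
    def __init__(self, n_attrs=40, n_classes=19, dim=320):
        super().__init__()
        self.attr_embed = nn.Linear(n_attrs, dim)          # binary attribute vector
        self.mask_embed = nn.Conv2d(n_classes, dim, 8, 8)  # patchify one-hot semantic mask

    def forward(self, attrs, mask):
        # attrs: (B, n_attrs); mask: (B, n_classes, H, W) one-hot
        a = self.attr_embed(attrs).unsqueeze(1)               # (B, 1, dim)
        m = self.mask_embed(mask).flatten(2).transpose(1, 2)  # (B, HW/64, dim)
        return torch.cat([a, m], dim=1)                       # joint cross-attention context

cond = FaceConditioner()
ctx = cond(torch.rand(2, 40), torch.rand(2, 19, 64, 64))
print(ctx.shape)  # torch.Size([2, 65, 320])
```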
arXiv Detail & Related papers (2023-06-01T17:16:37Z)
- Person Image Synthesis via Denoising Diffusion Model [116.34633988927429]
We show how denoising diffusion models can be applied for high-fidelity person image synthesis.
Our results on two large-scale benchmarks and a user study demonstrate the photorealism of our proposed approach under challenging scenarios.
arXiv Detail & Related papers (2022-11-22T18:59:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.