Human Pose Transfer with Augmented Disentangled Feature Consistency
- URL: http://arxiv.org/abs/2107.10984v4
- Date: Fri, 15 Dec 2023 03:45:14 GMT
- Title: Human Pose Transfer with Augmented Disentangled Feature Consistency
- Authors: Kun Wu, Chengxiang Yin, Zhengping Che, Bo Jiang, Jian Tang, Zheng Guan, and Gangyi Ding
- Abstract summary: We propose a pose transfer network with augmented Disentangled Feature Consistency (DFC-Net) to facilitate human pose transfer.
Given a pair of images containing the source and target person, DFC-Net extracts pose and static information from the source and target respectively, then synthesizes an image of the target person with the desired pose from the source.
- Score: 28.744108771350078
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep generative models have made great progress in synthesizing images with
arbitrary human poses and transferring poses of one person to others. Though
many different methods have been proposed to generate images with high visual
fidelity, two fundamental challenges remain:
pose ambiguity and appearance inconsistency. To alleviate the current
limitations and improve the quality of the synthesized images, we propose a
pose transfer network with augmented Disentangled Feature Consistency (DFC-Net)
to facilitate human pose transfer. Given a pair of images containing the source
and target person, DFC-Net extracts pose and static information from the source
and target respectively, then synthesizes an image of the target person with
the desired pose from the source. Moreover, DFC-Net leverages disentangled
feature consistency losses in the adversarial training to strengthen the
transfer coherence and integrates a keypoint amplifier to enhance the pose
feature extraction. With the help of the disentangled feature consistency
losses, we further propose a novel data augmentation scheme that introduces
unpaired support data with the augmented consistency constraints to improve the
generality and robustness of DFC-Net. Extensive experimental results on
Mixamo-Pose and EDN-10k demonstrate that DFC-Net achieves state-of-the-art
performance on pose transfer.
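To make the disentangled feature consistency idea concrete, below is a minimal PyTorch-style sketch. It is a hypothetical rendering based only on the abstract: the encoder and generator names (pose_enc, static_enc, G), the L1 form of the losses, and the detaching of the target features are illustrative assumptions, not the authors' released implementation.

```python
import torch.nn.functional as F

def dfc_consistency_losses(pose_enc, static_enc, G, src, tgt):
    """Sketch of disentangled feature consistency losses (hypothetical).

    pose_enc   -- encoder assumed to extract pose features only
    static_enc -- encoder assumed to extract static/appearance features only
    G          -- generator combining a pose code with a static code
    src, tgt   -- source image (provides the pose) and target image
                  (provides the person's appearance)
    """
    p_src = pose_enc(src)      # pose features of the source person
    s_tgt = static_enc(tgt)    # appearance features of the target person
    out = G(p_src, s_tgt)      # target person synthesized in the source pose

    # Re-encode the synthesized image: its pose features should agree with
    # the source pose, and its static features with the target appearance.
    # L1 distance and detached targets are illustrative design choices.
    pose_cons = F.l1_loss(pose_enc(out), p_src.detach())
    static_cons = F.l1_loss(static_enc(out), s_tgt.detach())
    return out, pose_cons + static_cons
```

Because these constraints compare features of the synthesized image against the inputs rather than against a ground-truth paired image, the same losses can in principle be applied to unpaired support images, which is consistent with the augmented data scheme the abstract describes.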
Related papers
- Noise Consistency Regularization for Improved Subject-Driven Image Synthesis [55.75426086791612]
Fine-tuning Stable Diffusion enables subject-driven image synthesis by adapting the model to generate images containing specific subjects.
Existing fine-tuning methods suffer from two key issues: underfitting, where the model fails to reliably capture subject identity, and overfitting, where it memorizes the subject image and reduces background diversity.
We propose two auxiliary consistency losses for diffusion fine-tuning.
First, a prior consistency regularization loss ensures that the predicted diffusion noise for prior (non-subject) images remains consistent with that of the pretrained model, improving fidelity.
arXiv Detail & Related papers (2025-06-06T19:17:37Z) - DRDM: A Disentangled Representations Diffusion Model for Synthesizing Realistic Person Images [9.768951663960257]
We propose a Disentangled Representations Diffusion Model (DRDM) to generate photo-realistic images from source portraits.
First, a pose encoder is responsible for encoding pose features into a high-dimensional space to guide the generation of person images.
Second, a body-part subspace decoupling block (BSDB) disentangles features from the different body parts of a source figure and feeds them to the various layers of the noise prediction block.
arXiv Detail & Related papers (2024-12-25T06:36:24Z) - Consistent Human Image and Video Generation with Spatially Conditioned Diffusion [82.4097906779699]
Consistent human-centric image and video synthesis aims to generate images with new poses while preserving appearance consistency with a given reference image.
We frame the task as a spatially-conditioned inpainting problem, where the target image is in-painted to maintain appearance consistency with the reference.
This approach enables the reference features to guide the generation of pose-compliant targets within a unified denoising network.
arXiv Detail & Related papers (2024-12-19T05:02:30Z) - Realistic and Efficient Face Swapping: A Unified Approach with Diffusion Models [69.50286698375386]
We propose a novel approach that better harnesses diffusion models for face-swapping.
We introduce a mask shuffling technique during inpainting training, which allows us to create a so-called universal model for swapping.
Our approach is relatively unified, which makes it resilient to errors in other off-the-shelf models.
arXiv Detail & Related papers (2024-09-11T13:43:53Z) - DPoser: Diffusion Model as Robust 3D Human Pose Prior [51.75784816929666]
We introduce DPoser, a robust and versatile human pose prior built upon diffusion models.
DPoser regards various pose-centric tasks as inverse problems and employs variational diffusion sampling for efficient solving.
Our approach demonstrates considerable enhancements over common uniform scheduling used in image domains, boasting improvements of 5.4%, 17.2%, and 3.8% across human mesh recovery, pose completion, and motion denoising, respectively.
arXiv Detail & Related papers (2023-12-09T11:18:45Z) - Advancing Pose-Guided Image Synthesis with Progressive Conditional Diffusion Models [13.019535928387702]
This paper presents Progressive Conditional Diffusion Models (PCDMs) that incrementally bridge the gap between person images under the target and source poses through three stages.
Both qualitative and quantitative results demonstrate the consistency and photorealism of our proposed PCDMs under challenging scenarios.
arXiv Detail & Related papers (2023-10-10T05:13:17Z) - Person Image Synthesis via Denoising Diffusion Model [116.34633988927429]
We show how denoising diffusion models can be applied for high-fidelity person image synthesis.
Our results on two large-scale benchmarks and a user study demonstrate the photorealism of our proposed approach under challenging scenarios.
arXiv Detail & Related papers (2022-11-22T18:59:50Z) - Robust Single Image Dehazing Based on Consistent and Contrast-Assisted Reconstruction [95.5735805072852]
We propose a novel density-variational learning framework to improve the robustness of the image dehazing model.
Specifically, the dehazing network is optimized under the consistency-regularized framework.
Our method significantly surpasses the state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-29T08:11:04Z) - FDA-GAN: Flow-based Dual Attention GAN for Human Pose Transfer [3.08426078422188]
We propose a Flow-based Dual Attention GAN (FDA-GAN) to apply occlusion- and deformation-aware feature fusion for higher generation quality.
To maintain the pose and global position consistency in transferring, we design a pose normalization network for learning adaptive normalization from the target pose to the source person.
Both qualitative and quantitative results show that our method outperforms state-of-the-art models on the public iPER and DeepFashion datasets.
arXiv Detail & Related papers (2021-12-01T05:10:37Z) - Structure-aware Person Image Generation with Pose Decomposition and Semantic Correlation [29.727033198797518]
We propose a structure-aware flow based method for high-quality person image generation.
We decompose the human body into different semantic parts and apply different networks to predict the flow fields for these parts separately.
Our method can generate high-quality results under large pose discrepancy and outperforms state-of-the-art methods in both qualitative and quantitative comparisons.
arXiv Detail & Related papers (2021-02-05T03:07:57Z) - PoNA: Pose-guided Non-local Attention for Human Pose Transfer [105.14398322129024]
We propose a new human pose transfer method using a generative adversarial network (GAN) with simplified cascaded blocks.
Our model generates sharper and more realistic images with rich details, while having fewer parameters and faster speed.
arXiv Detail & Related papers (2020-12-13T12:38:29Z) - Adversarial Semantic Data Augmentation for Human Pose Estimation [96.75411357541438]
We propose Semantic Data Augmentation (SDA), a method that augments images by pasting segmented body parts at various levels of semantic granularity.
We also propose Adversarial Semantic Data Augmentation (ASDA), which exploits a generative network to dynamically predict tailored pasting configurations.
State-of-the-art results are achieved on challenging benchmarks.
arXiv Detail & Related papers (2020-08-03T07:56:04Z)