MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion
- URL: http://arxiv.org/abs/2311.12052v3
- Date: Sun, 5 May 2024 05:07:34 GMT
- Title: MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion
- Authors: Di Chang, Yichun Shi, Quankai Gao, Jessica Fu, Hongyi Xu, Guoxian Song, Qing Yan, Yizhe Zhu, Xiao Yang, Mohammad Soleymani
- Abstract summary: We propose MagicPose, a diffusion-based model for 2D human pose and facial expression retargeting.
By leveraging the prior knowledge of image diffusion models, MagicPose generalizes well to unseen human identities and complex poses.
The proposed model is easy to use and can be considered as a plug-in module/extension to Stable Diffusion.
- Score: 22.62170098534097
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we propose MagicPose, a diffusion-based model for 2D human pose and facial expression retargeting. Specifically, given a reference image, we aim to generate a person's new images by controlling the poses and facial expressions while keeping the identity unchanged. To this end, we propose a two-stage training strategy to disentangle human motions and appearance (e.g., facial expressions, skin tone and dressing), consisting of (1) the pre-training of an appearance-control block and (2) learning appearance-disentangled pose control. Our novel design enables robust appearance control over generated human images, including body, facial attributes, and even background. By leveraging the prior knowledge of image diffusion models, MagicPose generalizes well to unseen human identities and complex poses without the need for additional fine-tuning. Moreover, the proposed model is easy to use and can be considered as a plug-in module/extension to Stable Diffusion. The code is available at: https://github.com/Boese0601/MagicDance
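The two-stage recipe described in the abstract can be illustrated with a minimal PyTorch sketch. Everything below is a toy stand-in: the module names, tensor shapes, and the single-conv "UNet" are assumptions for illustration, not the authors' implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AppearanceControlBlock(nn.Module):
    """Toy stand-in for the appearance-control branch fed with the reference image."""
    def __init__(self, dim=64):
        super().__init__()
        self.encoder = nn.Conv2d(3, dim, 3, padding=1)

    def forward(self, ref_image):
        return self.encoder(ref_image)

class PoseControlBlock(nn.Module):
    """Toy stand-in for the pose/expression branch (e.g., fed rendered pose maps)."""
    def __init__(self, dim=64):
        super().__init__()
        self.encoder = nn.Conv2d(3, dim, 3, padding=1)

    def forward(self, pose_map):
        return self.encoder(pose_map)

class ToyUNet(nn.Module):
    """Single conv standing in for the Stable Diffusion denoising UNet."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Conv2d(3 + 2 * dim, 3, 3, padding=1)

    def forward(self, noisy_latent, app_feats, pose_feats):
        return self.net(torch.cat([noisy_latent, app_feats, pose_feats], dim=1))

unet, appearance, pose_ctrl = ToyUNet(), AppearanceControlBlock(), PoseControlBlock()

# Stage 1: pre-train the appearance-control block (pose branch held at zero here).
opt = torch.optim.AdamW(appearance.parameters(), lr=1e-4)
for _ in range(2):  # tiny demo loop on random tensors
    ref = torch.randn(1, 3, 64, 64)
    noisy, noise = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)
    pred = unet(noisy, appearance(ref), torch.zeros(1, 64, 64, 64))
    loss = F.mse_loss(pred, noise)
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: freeze appearance control, then train pose control, so pose guidance
# cannot leak appearance information (the disentanglement idea).
for p in appearance.parameters():
    p.requires_grad_(False)
opt = torch.optim.AdamW(pose_ctrl.parameters(), lr=1e-4)
for _ in range(2):
    ref, pose_map = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)
    noisy, noise = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)
    pred = unet(noisy, appearance(ref), pose_ctrl(pose_map))
    loss = F.mse_loss(pred, noise)
    opt.zero_grad(); loss.backward(); opt.step()
```

At inference, the same reference image plus a new pose map would drive generation, which is consistent with the abstract's framing of the controls as plug-in modules around a frozen Stable Diffusion backbone.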
Related papers
- TALK-Act: Enhance Textural-Awareness for 2D Speaking Avatar Reenactment with Diffusion Model [100.35665852159785]
We propose the Motion-Enhanced Textural-Aware ModeLing for SpeaKing Avatar Reenactment (TALK-Act) framework.
Our key idea is to enhance the textural awareness with explicit motion guidance in diffusion modeling.
Our model can achieve high-fidelity 2D avatar reenactment with only 30 seconds of person-specific data.
arXiv Detail & Related papers (2024-10-14T16:38:10Z)
- Towards Localized Fine-Grained Control for Facial Expression Generation [54.82883891478555]
Humans, particularly their faces, are central to content generation due to their ability to convey rich expressions and intent.
Current generative models mostly produce flat neutral expressions or characterless smiles that lack authenticity.
We propose the use of AUs (action units) for facial expression control in face generation; a toy conditioning sketch follows this entry.
arXiv Detail & Related papers (2024-07-25T18:29:48Z)
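As a hedged sketch of how AU-based control might be wired in: an AU intensity vector is embedded and handed to the generator's conditioning pathway. The AU count, index layout, and MLP design below are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

NUM_AUS = 17  # assumption: one intensity in [0, 1] per tracked facial action unit

class AUConditioner(nn.Module):
    """Maps an AU intensity vector to a conditioning embedding (hypothetical design)."""
    def __init__(self, num_aus=NUM_AUS, cond_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_aus, cond_dim),
            nn.SiLU(),
            nn.Linear(cond_dim, cond_dim),
        )

    def forward(self, au_intensities):
        return self.mlp(au_intensities)

# Example: raise only the lip-corner puller (AU12) to request a localized smile.
aus = torch.zeros(1, NUM_AUS)
aus[0, 11] = 0.8  # index 11 stands in for AU12 in this toy layout
cond = AUConditioner()(aus)  # would feed a generator's conditioning pathway
print(cond.shape)  # torch.Size([1, 128])
```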
- Synthesizing Moving People with 3D Control [88.68284137105654]
We present a diffusion model-based framework for animating people from a single image for a given target 3D motion sequence.
For the first part, we learn an in-filling diffusion model to hallucinate unseen parts of a person given a single image.
Second, we develop a diffusion-based rendering pipeline controlled by 3D human poses; a toy two-stage sketch follows this entry.
arXiv Detail & Related papers (2024-01-19T18:59:11Z)
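The two-part structure reads naturally as a pipeline; here is a minimal sketch with identity placeholders standing in for the learned in-filling and rendering diffusion models. The shapes and the rasterized 3D pose inputs are assumptions.

```python
import torch

def infill_unseen(reference_image, infill_model):
    """Stage 1 (sketch): hallucinate the person's unseen regions from one image."""
    return infill_model(reference_image)

def render_sequence(completed_person, pose_sequence, renderer):
    """Stage 2 (sketch): render one frame per 3D pose, conditioned on stage 1."""
    return [renderer(completed_person, pose) for pose in pose_sequence]

# Toy stand-ins so the sketch runs; the real models are learned diffusion networks.
infill_model = lambda img: img
renderer = lambda person, pose: person + 0.0 * pose

ref = torch.randn(1, 3, 64, 64)                        # single input image
poses = [torch.randn(1, 3, 64, 64) for _ in range(4)]  # e.g., rasterized 3D pose renders
frames = render_sequence(infill_unseen(ref, infill_model), poses, renderer)
print(len(frames))  # 4 animated frames
```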
- AdaMesh: Personalized Facial Expressions and Head Poses for Adaptive Speech-Driven 3D Facial Animation [49.4220768835379]
AdaMesh is a novel adaptive speech-driven facial animation approach.
It learns the personalized talking style from a reference video of about 10 seconds.
It generates vivid facial expressions and head poses.
arXiv Detail & Related papers (2023-10-11T06:56:08Z)
- POCE: Pose-Controllable Expression Editing [75.7701103792032]
This paper presents POCE, an innovative pose-controllable expression editing network.
It can generate realistic facial expressions and head poses simultaneously with just unpaired training images.
The learned model can generate realistic and high-fidelity facial expressions under various new poses.
arXiv Detail & Related papers (2023-04-18T12:26:19Z)
- UPGPT: Universal Diffusion Model for Person Image Generation, Editing and Pose Transfer [15.15576618501609]
Text-to-image models (T2I) have been used to generate high quality images of people.
However, because the generation process is stochastic, the same person's appearance varies across generated images.
We propose a multimodal diffusion model that accepts text, pose, and visual prompting; a toy conditioning sketch follows this entry.
arXiv Detail & Related papers (2023-04-18T10:05:37Z)
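One plausible, purely illustrative way to accept text, pose, and visual prompts in a single diffusion model is to project each modality to a shared width and let the denoiser cross-attend to the resulting tokens. The feature sizes and keypoint layout below are assumptions, not UPGPT's actual design.

```python
import torch
import torch.nn as nn

class MultimodalConditioner(nn.Module):
    """Fuses text, pose, and visual prompts into one token sequence (toy design)."""
    def __init__(self, dim=128):
        super().__init__()
        self.text_proj = nn.Linear(512, dim)     # assumed CLIP-like text feature size
        self.pose_proj = nn.Linear(18 * 3, dim)  # assumed 18 keypoints x (x, y, conf)
        self.image_proj = nn.Linear(512, dim)    # assumed visual-prompt feature size

    def forward(self, text_feat, pose_vec, image_feat):
        tokens = torch.stack(
            [self.text_proj(text_feat),
             self.pose_proj(pose_vec),
             self.image_proj(image_feat)],
            dim=1,
        )
        return tokens  # (batch, 3, dim): one token per modality for cross-attention

cond = MultimodalConditioner()(
    torch.randn(1, 512), torch.randn(1, 54), torch.randn(1, 512)
)
print(cond.shape)  # torch.Size([1, 3, 128])
```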
- StyleMask: Disentangling the Style Space of StyleGAN2 for Neural Face Reenactment [47.27033282706179]
We propose a framework that learns to disentangle the identity characteristics of the face from its pose.
We show that the proposed method produces higher-quality results even under extreme pose variations; a toy style-mixing sketch follows this entry.
arXiv Detail & Related papers (2022-09-27T13:22:35Z)
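A heavily simplified picture of pose/identity disentanglement in StyleGAN2's style space is style mixing: keep identity-bearing codes from the source face and take pose-bearing codes from the driving face. StyleMask learns which coordinates to transfer; the fixed layer split below is only a toy assumption.

```python
import torch

NUM_LAYERS, W_DIM = 18, 512  # typical StyleGAN2 setup at 1024x1024

def mix_styles(w_identity, w_pose, pose_layers):
    """Toy style mixing: copy pose-bearing layers from the driving code into
    the source code; the real method learns this selection instead."""
    w = w_identity.clone()
    w[:, pose_layers] = w_pose[:, pose_layers]
    return w

w_src = torch.randn(1, NUM_LAYERS, W_DIM)  # W+ code encoding the source identity
w_drv = torch.randn(1, NUM_LAYERS, W_DIM)  # W+ code encoding the driving pose
w_out = mix_styles(w_src, w_drv, pose_layers=list(range(4)))  # assume coarse layers carry pose
print(w_out.shape)  # torch.Size([1, 18, 512])
```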
- Hallucinating Pose-Compatible Scenes [55.064949607528405]
We present a large-scale generative adversarial network for pose-conditioned scene generation.
We curate a massive meta-dataset containing over 19 million frames of humans in everyday environments.
We leverage our trained model for various applications: hallucinating pose-compatible scenes with or without humans, visualizing incompatible scenes and poses, placing a person from one generated image into another scene, and animating pose; a toy conditional-generator sketch follows this entry.
arXiv Detail & Related papers (2021-12-13T18:59:26Z)
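For pose-conditioned scene generation, the essential interface is a generator that consumes a latent code plus a pose representation. The heatmap encoding and single-layer generator below are toy assumptions for illustration, not the paper's network.

```python
import torch
import torch.nn as nn

class PoseConditionedGenerator(nn.Module):
    """Toy conditional generator: latent code + pose heatmaps -> RGB scene."""
    def __init__(self, z_dim=64, num_joints=18):
        super().__init__()
        self.z_to_map = nn.Linear(z_dim, 16 * 16)
        self.decoder = nn.Conv2d(1 + num_joints, 3, 3, padding=1)

    def forward(self, z, pose_heatmaps):
        z_map = self.z_to_map(z).view(-1, 1, 16, 16)
        return torch.tanh(self.decoder(torch.cat([z_map, pose_heatmaps], dim=1)))

gen = PoseConditionedGenerator()
z = torch.randn(1, 64)
heatmaps = torch.rand(1, 18, 16, 16)  # assumed one Gaussian heatmap per joint
scene = gen(z, heatmaps)
print(scene.shape)  # torch.Size([1, 3, 16, 16])
```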
This list is automatically generated from the titles and abstracts of the papers on this site.