HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation
- URL: http://arxiv.org/abs/2503.18860v2
- Date: Tue, 25 Mar 2025 10:59:23 GMT
- Title: HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation
- Authors: Zunnan Xu, Zhentao Yu, Zixiang Zhou, Jun Zhou, Xiaoyu Jin, Fa-Ting Hong, Xiaozhong Ji, Junwei Zhu, Chengfei Cai, Shiyu Tang, Qin Lin, Xiu Li, Qinglin Lu
- Abstract summary: HunyuanPortrait is a diffusion-based condition control method for portrait animation. It animates the character in the reference image with the facial expressions and head poses of the driving videos. Our framework outperforms existing methods, demonstrating superior temporal consistency and controllability.
- Score: 30.030540407121325
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce HunyuanPortrait, a diffusion-based condition control method that employs implicit representations for highly controllable and lifelike portrait animation. Given a single portrait image as an appearance reference and video clips as driving templates, HunyuanPortrait animates the character in the reference image with the facial expressions and head poses of the driving videos. In our framework, we use pre-trained encoders to decouple portrait motion information from identity in the driving videos. An implicit representation is adopted to encode the motion information and is employed as the control signal in the animation phase. Building on Stable Video Diffusion as the main backbone, we carefully design adapter layers that inject the control signals into the denoising UNet through attention mechanisms, bringing spatial richness of detail and temporal consistency. HunyuanPortrait also exhibits strong generalization, effectively disentangling appearance and motion across different image styles. Our framework outperforms existing methods, demonstrating superior temporal consistency and controllability. Our project is available at https://kkakkkka.github.io/HunyuanPortrait.
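The abstract describes injecting implicit motion representations into the denoising UNet through adapter layers with attention. Below is a minimal, hypothetical sketch of such a cross-attention adapter; the class name, tensor shapes, and zero-initialized gating are illustrative assumptions, not HunyuanPortrait's actual implementation.

```python
# Illustrative sketch only: a cross-attention adapter that injects implicit
# motion tokens (from a pre-trained motion encoder) into one UNet block's
# features. Names and shapes are assumptions, not the released code.
import torch
import torch.nn as nn


class MotionCrossAttentionAdapter(nn.Module):
    """Injects motion control into UNet features via cross-attention."""

    def __init__(self, feature_dim: int, motion_dim: int, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(feature_dim)
        self.attn = nn.MultiheadAttention(
            embed_dim=feature_dim, num_heads=num_heads,
            kdim=motion_dim, vdim=motion_dim, batch_first=True)
        # Zero-initialized gate so the adapter starts as an identity mapping
        # and does not disturb the pre-trained video diffusion backbone.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, unet_tokens, motion_tokens):
        # unet_tokens:   (B, N, feature_dim) flattened spatial features
        # motion_tokens: (B, M, motion_dim) implicit motion representation
        attended, _ = self.attn(
            self.norm(unet_tokens), motion_tokens, motion_tokens)
        return unet_tokens + self.gate * attended


# Usage on dummy tensors:
adapter = MotionCrossAttentionAdapter(feature_dim=320, motion_dim=768)
feats = torch.randn(2, 64 * 64, 320)   # one UNet block's spatial tokens
motion = torch.randn(2, 16, 768)       # implicit motion tokens
out = adapter(feats, motion)           # same shape as feats
```

Zero-initializing the gate is a common practice for control adapters attached to a pre-trained backbone: the adapter contributes nothing at the start of training and gradually learns to inject the motion signal.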
Related papers
- FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis [12.987186425491242]
We propose a novel framework to generate high-fidelity, coherent talking portraits with controllable motion dynamics.
In the first stage, we employ a clip-level training scheme to establish coherent global motion.
In the second stage, we refine lip movements at the frame level using a lip-tracing mask, ensuring precise synchronization with audio signals.
arXiv Detail & Related papers (2025-04-07T08:56:01Z) - FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors [64.54220123913154]
We introduce FramePainter as an efficient instantiation of image-to-video generation problem.
It only uses a lightweight sparse control encoder to inject editing signals.
It dominantly outperforms previous state-of-the-art methods with far less training data.
arXiv Detail & Related papers (2025-01-14T16:09:16Z) - VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control [66.66226299852559]
VideoAnydoor is a zero-shot video object insertion framework with high-fidelity detail preservation and precise motion control.
To preserve detailed appearance while supporting fine-grained motion control, we design a pixel warper.
arXiv Detail & Related papers (2025-01-02T18:59:54Z) - DisPose: Disentangling Pose Guidance for Controllable Human Image Animation [13.366879755548636]
DisPose aims to disentangle the sparse skeleton pose in human image animation into motion field guidance and keypoint correspondence.
To seamlessly integrate into existing models, we propose a plug-and-play hybrid ControlNet.
arXiv Detail & Related papers (2024-12-12T15:15:59Z) - LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control [13.552097853323207]
Portrait Animation aims to synthesize a video from a single source image, using it as an appearance reference, with motion derived from a driving video, audio, text, or generation.
We develop a video-driven portrait animation framework named LivePortrait with a focus on better generalization, controllability, and efficiency for practical usage.
arXiv Detail & Related papers (2024-07-03T14:41:39Z) - Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation [53.767090490974745]
Follow-Your-Emoji is a diffusion-based framework for portrait animation.
It animates a reference portrait with target landmark sequences.
Our method demonstrates significant performance in controlling the expression of freestyle portraits.
arXiv Detail & Related papers (2024-06-04T02:05:57Z) - X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention [18.211762995744337]
We propose X-Portrait, an innovative conditional diffusion model tailored for generating expressive and temporally coherent portrait animation.
Given a single portrait as appearance reference, we aim to animate it with motion derived from a driving video, capturing both highly dynamic and subtle facial expressions.
Experimental results demonstrate the universal effectiveness of X-Portrait across a diverse range of facial portraits and expressive driving sequences.
arXiv Detail & Related papers (2024-03-23T20:30:28Z) - AnimateZero: Video Diffusion Models are Zero-Shot Image Animators [63.938509879469024]
We propose AnimateZero to unveil the pre-trained text-to-video diffusion model, i.e., AnimateDiff.
For appearance control, we borrow intermediate latents and their features from the text-to-image (T2I) generation.
For temporal control, we replace the global temporal attention of the original T2V model with our proposed positional-corrected window attention.
arXiv Detail & Related papers (2023-12-06T13:39:35Z) - Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation [27.700371215886683]
Diffusion models have become mainstream in visual generation research, owing to their robust generative capabilities.
In this paper, we propose a novel framework tailored for character animation.
By expanding the training data, our approach can animate arbitrary characters, yielding superior results in character animation compared to other image-to-video methods.
arXiv Detail & Related papers (2023-11-28T12:27:15Z) - MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model [74.84435399451573]
This paper studies the human image animation task, which aims to generate a video of a certain reference identity following a particular motion sequence.
Existing animation works typically employ the frame-warping technique to animate the reference image towards the target motion.
We introduce MagicAnimate, a diffusion-based framework that aims at enhancing temporal consistency, preserving reference image faithfully, and improving animation fidelity.
arXiv Detail & Related papers (2023-11-27T18:32:31Z) - First Order Motion Model for Image Animation [90.712718329677]
Image animation consists of generating a video sequence so that an object in a source image is animated according to the motion of a driving video.
Our framework addresses this problem without using any annotation or prior information about the specific object to animate.
arXiv Detail & Related papers (2020-02-29T07:08:56Z)
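For readers unfamiliar with the warping-based approach summarized in the First Order Motion Model entry above, here is a minimal, illustrative sketch of keypoint-driven first-order warping (not the authors' code). Keypoints and local Jacobians are assumed to come from a learned detector; the original method additionally predicts a dense motion field and an occlusion mask, which are omitted here.

```python
# Illustrative sketch of first-order, keypoint-driven warping in the spirit of
# the First Order Motion Model. Inputs (keypoints, Jacobians) are assumed given.
import torch
import torch.nn.functional as F


def first_order_warp(source, kp_src, kp_drv, jac, sigma=0.05):
    """Warp `source` so that its keypoints move from kp_src toward kp_drv.

    source: (B, C, H, W) source image.
    kp_src, kp_drv: (B, K, 2) keypoints in normalized [-1, 1] (x, y) coords.
    jac: (B, K, 2, 2) local affine Jacobians around each driving keypoint.
    """
    B, C, H, W = source.shape

    # Identity sampling grid in normalized coordinates, shape (H, W, 2).
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, H, device=source.device),
        torch.linspace(-1, 1, W, device=source.device),
        indexing="ij")
    grid = torch.stack([xs, ys], dim=-1)

    # First-order approximation around each driving keypoint:
    #   T_{S<-D}(z) ~= p_S^k + J_k (z - p_D^k)
    diff = grid[None, None] - kp_drv[:, :, None, None]        # (B, K, H, W, 2)
    warped = kp_src[:, :, None, None] + torch.einsum(
        "bkij,bkhwj->bkhwi", jac, diff)                        # (B, K, H, W, 2)

    # Soft-assign each pixel to the nearest driving keypoint (a crude stand-in
    # for the learned dense motion network) and blend the per-keypoint flows.
    weights = torch.softmax(-(diff ** 2).sum(-1) / (2 * sigma ** 2), dim=1)
    flow = (weights[..., None] * warped).sum(dim=1)            # (B, H, W, 2)

    # Backward-warp: sample source pixels at the predicted source locations.
    return F.grid_sample(source, flow, align_corners=True)


# Example with random inputs: 10 keypoints, identity local Jacobians.
img = torch.rand(1, 3, 256, 256)
kp_s = torch.rand(1, 10, 2) * 2 - 1
kp_d = torch.rand(1, 10, 2) * 2 - 1
J = torch.eye(2).repeat(1, 10, 1, 1)
out = first_order_warp(img, kp_s, kp_d, J)   # (1, 3, 256, 256)
```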