DeX-Portrait: Disentangled and Expressive Portrait Animation via Explicit and Latent Motion Representations
- URL: http://arxiv.org/abs/2512.15524v1
- Date: Wed, 17 Dec 2025 15:23:57 GMT
- Title: DeX-Portrait: Disentangled and Expressive Portrait Animation via Explicit and Latent Motion Representations
- Authors: Yuxiang Shi, Zhe Li, Yanwen Wang, Hao Zhu, Xun Cao, Ligang Liu
- Abstract summary: We propose DeX-Portrait, a novel approach capable of generating expressive portrait animation driven by disentangled pose and expression signals. First, we design a powerful motion trainer to learn both pose and expression encoders for extracting precise and decomposed driving signals. Experiments show that our method outperforms state-of-the-art baselines on both animation quality and disentangled controllability.
- Score: 31.845995837468536
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Portrait animation from a single source image and a driving video is a long-standing problem. Recent approaches tend to adopt diffusion-based image/video generation models for realistic and expressive animation. However, none of these diffusion models realizes high-fidelity disentangled control between the head pose and facial expression, hindering applications like expression-only or pose-only editing and animation. To address this, we propose DeX-Portrait, a novel approach capable of generating expressive portrait animation driven by disentangled pose and expression signals. Specifically, we represent the pose as an explicit global transformation and the expression as an implicit latent code. First, we design a powerful motion trainer to learn both pose and expression encoders for extracting precise and decomposed driving signals. Then we propose to inject the pose transformation into the diffusion model through a dual-branch conditioning mechanism, and the expression latent through cross attention. Finally, we design a progressive hybrid classifier-free guidance for more faithful identity consistency. Experiments show that our method outperforms state-of-the-art baselines on both animation quality and disentangled controllability.
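The abstract names three concrete mechanisms: an explicit pose transform injected through a dual-branch conditioning path, a latent expression code injected through cross-attention, and a progressive hybrid classifier-free guidance. Below is a minimal PyTorch sketch of how such a conditioning scheme could be wired; all module names, tensor shapes, and the FiLM-style second branch are illustrative assumptions inferred from the abstract, not the authors' released implementation.

```python
# Illustrative sketch only: shapes, names, and the FiLM-style pose branch
# are assumptions inferred from the abstract, not DeX-Portrait's code.
import torch
import torch.nn as nn

class DisentangledConditioning(nn.Module):
    """Inject an explicit pose transform and a latent expression code
    into a denoiser's hidden states: pose via a dual-branch path
    (additive embedding + feature modulation), expression via
    cross-attention, loosely following the abstract's description."""

    def __init__(self, hidden_dim: int = 320, expr_dim: int = 512,
                 num_heads: int = 8):
        super().__init__()
        # Branch A: embed the flattened 3x4 global rigid transform (12 values).
        self.pose_embed = nn.Linear(12, hidden_dim)
        # Branch B: predict per-channel scale/shift (FiLM-style modulation).
        self.pose_film = nn.Linear(12, 2 * hidden_dim)
        # Expression latents attend into the hidden states.
        self.expr_proj = nn.Linear(expr_dim, hidden_dim)
        self.cross_attn = nn.MultiheadAttention(hidden_dim, num_heads,
                                                batch_first=True)
        self.norm = nn.LayerNorm(hidden_dim)

    def forward(self, h: torch.Tensor, pose: torch.Tensor,
                expr: torch.Tensor) -> torch.Tensor:
        # h:    (B, N, hidden_dim) denoiser hidden states (spatial tokens)
        # pose: (B, 12) flattened global transform [R | t]
        # expr: (B, T, expr_dim) latent expression codes
        h = h + self.pose_embed(pose).unsqueeze(1)               # branch A
        scale, shift = self.pose_film(pose).chunk(2, dim=-1)
        h = h * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)    # branch B
        ctx = self.expr_proj(expr)
        attn_out, _ = self.cross_attn(self.norm(h), ctx, ctx)
        return h + attn_out

def hybrid_cfg(eps_uncond, eps_id, eps_full, w_id, w_motion):
    """One plausible reading of 'hybrid' classifier-free guidance:
    separate weights for the identity-only and full (identity + motion)
    conditions. A 'progressive' variant would ramp w_id / w_motion
    across denoising steps (hypothetical schedule, not from the paper)."""
    return (eps_uncond
            + w_id * (eps_id - eps_uncond)
            + w_motion * (eps_full - eps_id))
```

The two-branch pose path mirrors a common pattern in conditional diffusion models (additive embedding plus FiLM-style modulation); whether DeX-Portrait uses exactly this split, or these guidance terms, is not stated in the abstract.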
Related papers
- FactorPortrait: Controllable Portrait Animation via Disentangled Expression, Pose, and Viewpoint [49.80464592726769]
We introduce FactorPortrait, a video diffusion method for controllable portrait animation.
Our method animates the portrait by transferring facial expressions and head movements from the driving video.
Our method outperforms existing approaches in realism, expressiveness, control accuracy, and view consistency.
arXiv Detail & Related papers (2025-12-12T15:22:52Z)
- PersonaLive! Expressive Portrait Image Animation for Live Streaming [53.63615310186964]
PersonaLive is a novel diffusion-based framework for streaming real-time portrait animation.
We first adopt hybrid implicit signals, namely implicit facial representations and 3D implicit keypoints, to achieve expressive image-level motion control.
Experiments demonstrate that PersonaLive achieves state-of-the-art performance with up to 7-22x speedup over prior diffusion-based portrait animation models.
arXiv Detail & Related papers (2025-12-12T03:24:40Z)
- DirectSwap: Mask-Free Cross-Identity Training and Benchmarking for Expression-Consistent Video Head Swapping [58.2549561389375]
Video head swapping aims to replace the entire head of a video subject, including facial identity, head shape, and hairstyle, with that of a reference image.
Due to the lack of ground-truth paired swapping data, prior methods typically train on cross-frame pairs of the same person within a video.
We propose DirectSwap, a mask-free, direct video head-swapping framework that extends an image U-Net into a video diffusion model.
arXiv Detail & Related papers (2025-12-10T08:31:28Z)
- Stable Video-Driven Portraits [52.008400639227034]
Portrait animation aims to generate photo-realistic videos from a single source image by reenacting the expression and pose from a driving video.
Recent advances using diffusion models have demonstrated improved quality but remain constrained by weak control signals and architectural limitations.
We propose a novel diffusion-based framework that leverages masked facial regions, specifically the eyes, nose, and mouth, from the driving video as strong motion control cues.
arXiv Detail & Related papers (2025-09-22T08:11:08Z)
- X-NeMo: Expressive Neural Motion Reenactment via Disentangled Latent Attention [52.94097577075215]
X-NeMo is a zero-shot diffusion-based portrait animation pipeline.
It animates a static portrait using facial movements from a driving video of a different individual.
arXiv Detail & Related papers (2025-07-30T22:46:52Z)
- Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation [53.767090490974745]
Follow-Your-Emoji is a diffusion-based framework for portrait animation.
It animates a reference portrait with target landmark sequences.
Our method demonstrates significant performance in controlling the expression of freestyle portraits.
arXiv Detail & Related papers (2024-06-04T02:05:57Z)
- X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention [18.211762995744337]
We propose X-Portrait, an innovative conditional diffusion model tailored for generating expressive and temporally coherent portrait animation.
Given a single portrait as appearance reference, we aim to animate it with motion derived from a driving video, capturing both highly dynamic and subtle facial expressions.
Experimental results demonstrate the universal effectiveness of X-Portrait across a diverse range of facial portraits and expressive driving sequences.
arXiv Detail & Related papers (2024-03-23T20:30:28Z)