FactorPortrait: Controllable Portrait Animation via Disentangled Expression, Pose, and Viewpoint
- URL: http://arxiv.org/abs/2512.11645v1
- Date: Fri, 12 Dec 2025 15:22:52 GMT
- Title: FactorPortrait: Controllable Portrait Animation via Disentangled Expression, Pose, and Viewpoint
- Authors: Jiapeng Tang, Kai Li, Chengxiang Yin, Liuhao Ge, Fei Jiang, Jiu Xu, Matthias Nießner, Christian Häne, Timur Bagautdinov, Egor Zakharov, Peihong Guo
- Abstract summary: We introduce FactorPortrait, a video diffusion method for controllable portrait animation. Our method animates the portrait by transferring facial expressions and head movements from the driving video. Our method outperforms existing approaches in realism, expressiveness, control accuracy, and view consistency.
- Score: 49.80464592726769
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We introduce FactorPortrait, a video diffusion method for controllable portrait animation that enables lifelike synthesis from disentangled control signals of facial expressions, head movement, and camera viewpoints. Given a single portrait image, a driving video, and camera trajectories, our method animates the portrait by transferring facial expressions and head movements from the driving video while simultaneously enabling novel view synthesis from arbitrary viewpoints. We utilize a pre-trained image encoder to extract facial expression latents from the driving video as control signals for animation generation. Such latents implicitly capture nuanced facial expression dynamics with identity and pose information disentangled, and they are efficiently injected into the video diffusion transformer through our proposed expression controller. For camera and head pose control, we employ Plücker ray maps and normal maps rendered from 3D body mesh tracking. To train our model, we curate a large-scale synthetic dataset containing diverse combinations of camera viewpoints, head poses, and facial expression dynamics. Extensive experiments demonstrate that our method outperforms existing approaches in realism, expressiveness, control accuracy, and view consistency.
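The abstract conditions the diffusion transformer on Plücker ray maps for camera control. As an illustration only (the paper's exact conditioning pipeline is not specified here), a Plücker ray map assigns each pixel the Plücker coordinates (d, m) of its viewing ray: the normalized world-space direction d and the moment m = o × d, where o is the camera center. A minimal sketch, assuming a standard pinhole camera with intrinsics K and world-to-camera extrinsics (R, t):

```python
import numpy as np

def plucker_ray_map(K, R, t, H, W):
    """Per-pixel Plücker coordinates for a pinhole camera.

    K: 3x3 intrinsics; R, t: world-to-camera extrinsics (x_cam = R @ x_world + t).
    Returns an (H, W, 6) map holding the normalized world-space ray
    direction d and the moment m = o x d, with o the camera center.
    """
    # Homogeneous pixel coordinates, sampled at pixel centers.
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)            # (H, W, 3)
    # Back-project pixels to camera-space ray directions.
    dirs_cam = pix @ np.linalg.inv(K).T                         # (H, W, 3)
    # Rotate into world space (right-multiplying by R applies R^T) and normalize.
    dirs_world = dirs_cam @ R
    dirs_world /= np.linalg.norm(dirs_world, axis=-1, keepdims=True)
    # Camera center in world coordinates.
    o = -R.T @ t                                                # (3,)
    moments = np.cross(np.broadcast_to(o, dirs_world.shape), dirs_world)
    return np.concatenate([dirs_world, moments], axis=-1)       # (H, W, 6)
```

The resulting 6-channel map is resolution-aligned with the video frames, which is what makes it convenient as a dense conditioning signal; by construction d · m = 0 for every pixel.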
Related papers
- DeX-Portrait: Disentangled and Expressive Portrait Animation via Explicit and Latent Motion Representations [31.845995837468536]
We propose DeX-Portrait, a novel approach capable of generating expressive portrait animation driven by disentangled pose and expression signals. First, we design a powerful motion trainer to learn both pose and expression encoders for extracting precise and decomposed driving signals. Experiments show that our method outperforms state-of-the-art baselines on both animation quality and disentangled controllability.
arXiv Detail & Related papers (2025-12-17T15:23:57Z) - DirectSwap: Mask-Free Cross-Identity Training and Benchmarking for Expression-Consistent Video Head Swapping [58.2549561389375]
Video head swapping aims to replace the entire head of a video subject, including facial identity, head shape, and hairstyle, with that of a reference image. Due to the lack of ground-truth paired swapping data, prior methods typically train on cross-frame pairs of the same person within a video. We propose DirectSwap, a mask-free, direct video head-swapping framework that extends an image U-Net into a video diffusion model.
arXiv Detail & Related papers (2025-12-10T08:31:28Z) - X-NeMo: Expressive Neural Motion Reenactment via Disentangled Latent Attention [52.94097577075215]
X-NeMo is a zero-shot diffusion-based portrait animation pipeline. It animates a static portrait using facial movements from a driving video of a different individual.
arXiv Detail & Related papers (2025-07-30T22:46:52Z) - HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation [30.030540407121325]
HunyuanPortrait is a diffusion-based condition control method for portrait animation. It can animate the character in the reference image by the facial expression and head pose of the driving videos. Our framework outperforms existing methods, demonstrating superior temporal consistency and controllability.
arXiv Detail & Related papers (2025-03-24T16:35:41Z) - Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer [25.39030226963548]
We introduce the first application of a pretrained transformer-based video generative model for portrait animation. Our method is validated through experiments on benchmark and newly proposed wild datasets.
arXiv Detail & Related papers (2024-12-01T08:54:30Z) - GaussianHeads: End-to-End Learning of Drivable Gaussian Head Avatars from Coarse-to-fine Representations [54.94362657501809]
We propose a new method to generate highly dynamic and deformable human head avatars from multi-view imagery in real-time.
At the core of our method is a hierarchical representation of head models that allows us to capture the complex dynamics of facial expressions and head movements.
We train this coarse-to-fine facial avatar model along with the head pose as a learnable parameter in an end-to-end framework.
arXiv Detail & Related papers (2024-09-18T13:05:43Z) - VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation [79.99551055245071]
We propose VividPose, an end-to-end pipeline that ensures superior temporal stability.
An identity-aware appearance controller integrates additional facial information without compromising other appearance details.
A geometry-aware pose controller utilizes both dense rendering maps from SMPL-X and sparse skeleton maps.
VividPose exhibits superior generalization capabilities on our proposed in-the-wild dataset.
arXiv Detail & Related papers (2024-05-28T13:18:32Z) - X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention [18.211762995744337]
We propose X-Portrait, an innovative conditional diffusion model tailored for generating expressive and temporally coherent portrait animation.
Given a single portrait as appearance reference, we aim to animate it with motion derived from a driving video, capturing both highly dynamic and subtle facial expressions.
Experimental results demonstrate the universal effectiveness of X-Portrait across a diverse range of facial portraits and expressive driving sequences.
arXiv Detail & Related papers (2024-03-23T20:30:28Z) - PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering [56.762094966235566]
A Portrait Image Neural Renderer is proposed to control the face motions with the parameters of three-dimensional morphable face models.
The proposed model can generate photo-realistic portrait images with accurate movements according to intuitive modifications.
Our model can generate coherent videos with convincing movements from only a single reference image and a driving audio stream.
arXiv Detail & Related papers (2021-09-17T07:24:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.