ExpPortrait: Expressive Portrait Generation via Personalized Representation
- URL: http://arxiv.org/abs/2602.19900v1
- Date: Mon, 23 Feb 2026 14:41:35 GMT
- Authors: Junyi Wang, Yudong Guo, Boyang Guo, Shengming Yang, Juyong Zhang
- Abstract summary: We propose a high-fidelity personalized head representation that more effectively disentangles expression and identity. This representation captures both static, subject-specific global geometry and dynamic, expression-related details. We use this sophisticated and highly expressive head model as a conditional signal to train a diffusion transformer (DiT)-based generator to synthesize richly detailed portrait videos.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While diffusion models have shown great potential in portrait generation, generating expressive, coherent, and controllable cinematic portrait videos remains a significant challenge. Existing intermediate signals for portrait generation, such as 2D landmarks and parametric models, have limited disentanglement capabilities and cannot express personalized details due to their sparse or low-rank representation. Therefore, existing methods based on these models struggle to accurately preserve subject identity and expressions, hindering the generation of highly expressive portrait videos. To overcome these limitations, we propose a high-fidelity personalized head representation that more effectively disentangles expression and identity. This representation captures both static, subject-specific global geometry and dynamic, expression-related details. Furthermore, we introduce an expression transfer module to achieve personalized transfer of head pose and expression details between different identities. We use this sophisticated and highly expressive head model as a conditional signal to train a diffusion transformer (DiT)-based generator to synthesize richly detailed portrait videos. Extensive experiments on self- and cross-reenactment tasks demonstrate that our method outperforms previous models in terms of identity preservation, expression accuracy, and temporal stability, particularly in capturing fine-grained details of complex motion.
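The abstract outlines the pipeline at a high level: renderings of the personalized head representation serve as the conditional signal for a DiT-based video generator. No code accompanies this listing, so the following is only a minimal sketch of one common conditioning pattern, channel-wise concatenation of rendered condition frames with the noised latents before patchification; all module names, dimensions, and the omitted timestep embedding are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class DiTBlock(nn.Module):
    """One pre-norm transformer block (timestep conditioning omitted for brevity)."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))

class HeadConditionedDiT(nn.Module):
    """Noised video latents and rendered head-representation frames are
    concatenated channel-wise before patchification, so every token carries
    both appearance and control information."""
    def __init__(self, latent_ch=4, cond_ch=4, dim=256, patch=2, depth=4):
        super().__init__()
        self.patchify = nn.Conv2d(latent_ch + cond_ch, dim, patch, stride=patch)
        self.blocks = nn.ModuleList(DiTBlock(dim) for _ in range(depth))
        self.head = nn.Linear(dim, latent_ch * patch * patch)
        self.patch, self.latent_ch = patch, latent_ch

    def forward(self, z_t, cond):
        # z_t, cond: (B*T, C, H, W) -- video frames flattened into the batch axis
        x = self.patchify(torch.cat([z_t, cond], dim=1))   # (B*T, dim, H/p, W/p)
        bt, _, hp, wp = x.shape
        x = x.flatten(2).transpose(1, 2)                   # (B*T, N, dim)
        for blk in self.blocks:
            x = blk(x)
        x = self.head(x).transpose(1, 2)                   # (B*T, C*p*p, N)
        x = x.reshape(bt, self.latent_ch, self.patch, self.patch, hp, wp)
        x = x.permute(0, 1, 4, 2, 5, 3)                    # (B*T, C, hp, p, wp, p)
        return x.reshape(bt, self.latent_ch, hp * self.patch, wp * self.patch)

model = HeadConditionedDiT()
z = torch.randn(2, 4, 32, 32)    # noised latent frames
c = torch.randn(2, 4, 32, 32)    # rendered head-representation frames
print(model(z, c).shape)         # torch.Size([2, 4, 32, 32])
```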
Related papers
- FactorPortrait: Controllable Portrait Animation via Disentangled Expression, Pose, and Viewpoint [49.80464592726769]
We introduce FactorPortrait, a video diffusion method for controllable portrait animation. It animates the portrait by transferring facial expressions and head movements from the driving video, and outperforms existing approaches in realism, expressiveness, control accuracy, and view consistency. A minimal factor-recombination sketch follows this entry.
arXiv Detail & Related papers (2025-12-12T15:22:52Z)
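FactorPortrait drives animation with disentangled expression, pose, and viewpoint factors. The snippet below is only a hypothetical sketch of the recombination step in cross-reenactment (reference identity plus driving motion factors); real disentanglement would be learned with dedicated losses, and every name here is illustrative.

```python
import torch
import torch.nn as nn

class FactorEncoder(nn.Module):
    """Hypothetical encoder that splits a frame embedding into identity,
    expression, pose, and viewpoint factors (here simply by slicing a
    projected feature vector)."""
    def __init__(self, in_dim=512, factor_dim=128):
        super().__init__()
        self.proj = nn.Linear(in_dim, 4 * factor_dim)

    def forward(self, feat):
        ident, expr, pose, view = self.proj(feat).chunk(4, dim=-1)
        return {"identity": ident, "expression": expr, "pose": pose, "view": view}

def recombine(ref_factors, drv_factors):
    """Cross-reenactment: keep the reference identity, take the motion
    factors (expression / pose / viewpoint) from the driving frame."""
    return torch.cat([ref_factors["identity"],
                      drv_factors["expression"],
                      drv_factors["pose"],
                      drv_factors["view"]], dim=-1)

enc = FactorEncoder()
ref = enc(torch.randn(1, 512))   # reference portrait embedding
drv = enc(torch.randn(1, 512))   # driving-frame embedding
cond = recombine(ref, drv)       # conditioning vector for the video diffusion model
print(cond.shape)                # torch.Size([1, 512])
```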
- ID-Consistent, Precise Expression Generation with Blendshape-Guided Diffusion [40.50436862878818]
We present a diffusion-based framework that faithfully reimagines any subject under any particular facial expression. Our adapter generalizes beyond basic emotions to subtle micro-expressions and expressive transitions overlooked by prior works. In addition, a pluggable Reference Adapter enables expression editing in real images by transferring the appearance from a reference frame during synthesis. A conditioning sketch follows this entry.
arXiv Detail & Related papers (2025-10-06T11:20:56Z)
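A minimal sketch of blendshape-guided conditioning, under the assumption (common in adapter-style methods, not confirmed by this listing) that expression coefficients are mapped to extra tokens consumed by the diffusion model's cross-attention; all names and sizes are hypothetical.

```python
import torch
import torch.nn as nn

class BlendshapeAdapter(nn.Module):
    """Hypothetical adapter: maps a vector of blendshape coefficients
    (e.g. 52 ARKit-style weights) to a few extra conditioning tokens that
    are appended to the prompt tokens consumed by cross-attention."""
    def __init__(self, n_coeffs=52, token_dim=768, n_tokens=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(n_coeffs, token_dim),
            nn.SiLU(),
            nn.Linear(token_dim, n_tokens * token_dim),
        )
        self.n_tokens, self.token_dim = n_tokens, token_dim

    def forward(self, coeffs, text_tokens):
        # coeffs: (B, n_coeffs); text_tokens: (B, L, token_dim)
        expr = self.mlp(coeffs).view(-1, self.n_tokens, self.token_dim)
        return torch.cat([text_tokens, expr], dim=1)   # (B, L + n_tokens, D)

adapter = BlendshapeAdapter()
tokens = adapter(torch.rand(2, 52), torch.randn(2, 77, 768))
print(tokens.shape)  # torch.Size([2, 81, 768])
```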
- Multi-focal Conditioned Latent Diffusion for Person Image Synthesis [59.113899155476005]
The Latent Diffusion Model (LDM) has demonstrated strong capabilities in high-resolution image generation. We propose a Multi-focal Conditioned Latent Diffusion (MCLD) method to address its limitations in person image synthesis. Our approach utilizes a multi-focal condition aggregation module, which effectively integrates facial identity and texture-specific information. An aggregation sketch follows this entry.
arXiv Detail & Related papers (2025-03-19T20:50:10Z)
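The entry above names a multi-focal condition aggregation module that fuses facial identity and texture information. The sketch below shows one plausible aggregation pattern (projection to a shared width plus a single self-attention layer); it is an assumption, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MultiFocalAggregator(nn.Module):
    """Hypothetical aggregation module: projects a face-identity embedding
    and patch-level texture features into a shared width, then lets them
    interact through one self-attention layer before conditioning the LDM."""
    def __init__(self, id_dim=512, tex_dim=256, dim=768, heads=8):
        super().__init__()
        self.id_proj = nn.Linear(id_dim, dim)
        self.tex_proj = nn.Linear(tex_dim, dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, id_emb, tex_feats):
        # id_emb: (B, id_dim); tex_feats: (B, N, tex_dim)
        tokens = torch.cat([self.id_proj(id_emb).unsqueeze(1),
                            self.tex_proj(tex_feats)], dim=1)
        h = self.norm(tokens)
        return tokens + self.attn(h, h, h, need_weights=False)[0]

agg = MultiFocalAggregator()
out = agg(torch.randn(2, 512), torch.randn(2, 16, 256))
print(out.shape)  # torch.Size([2, 17, 768])
```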
- EmojiDiff: Advanced Facial Expression Control with High Identity Preservation in Portrait Generation [8.314556078632412]
We introduce EmojiDiff, the first end-to-end solution that enables simultaneous control of extremely detailed (RGB-level) expression and high-fidelity identity in portrait generation. For decoupled training, we innovate ID-irrelevant Data Iteration (IDI) to synthesize cross-identity expression pairs. We also present ID-enhanced Contrast Alignment (ICA) for further fine-tuning. A contrastive-alignment sketch follows this entry.
arXiv Detail & Related papers (2024-12-02T08:24:11Z)
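ID-enhanced Contrast Alignment is described only at a high level above, so the snippet below sketches a generic InfoNCE-style identity-contrastive loss over a batch of (generated, reference) face-embedding pairs; the paper's actual ICA objective may differ.

```python
import torch
import torch.nn.functional as F

def id_contrastive_loss(gen_emb, ref_emb, temperature=0.07):
    """InfoNCE-style sketch: each generated face embedding should match the
    reference embedding of the same identity (the diagonal) and repel the
    other identities in the batch. Not the paper's exact formulation."""
    gen = F.normalize(gen_emb, dim=-1)
    ref = F.normalize(ref_emb, dim=-1)
    logits = gen @ ref.t() / temperature             # (B, B) cosine similarities
    targets = torch.arange(gen.size(0), device=gen.device)
    return F.cross_entropy(logits, targets)

loss = id_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
print(loss.item())
```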
- Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation [53.767090490974745]
Follow-Your-Emoji is a diffusion-based framework for portrait animation that animates a reference portrait with target landmark sequences. Our method demonstrates strong performance in controlling the expressions of freestyle portraits. A landmark-conditioning sketch follows this entry.
arXiv Detail & Related papers (2024-06-04T02:05:57Z)
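Landmark sequences are the driving signal above. A common way to feed them to a diffusion animator is to rasterize each frame's landmarks into Gaussian heatmaps used as spatial conditioning; the helper below sketches that step (the heatmap route is an assumption, not confirmed by this listing).

```python
import torch

def landmarks_to_heatmaps(landmarks, size=64, sigma=1.5):
    """Rasterize a landmark sequence into Gaussian heatmaps, a common
    spatial conditioning signal for animation diffusion models.
    landmarks: (T, K, 2), coordinates normalized to [0, 1]."""
    T, K, _ = landmarks.shape
    ys = torch.arange(size).view(1, 1, size, 1).float()
    xs = torch.arange(size).view(1, 1, 1, size).float()
    cx = (landmarks[..., 0] * (size - 1)).view(T, K, 1, 1)
    cy = (landmarks[..., 1] * (size - 1)).view(T, K, 1, 1)
    d2 = (xs - cx) ** 2 + (ys - cy) ** 2
    return torch.exp(-d2 / (2 * sigma ** 2))         # (T, K, size, size)

maps = landmarks_to_heatmaps(torch.rand(16, 68, 2))
print(maps.shape)  # torch.Size([16, 68, 64, 64])
```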
- EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars [36.96390906514729]
The MegaPortraits model has demonstrated state-of-the-art results in this domain. We introduce our EMOPortraits model, which enhances the model's capability to faithfully support intense, asymmetric facial expressions. We also propose a novel multi-view video dataset featuring a wide range of intense and asymmetric facial expressions.
arXiv Detail & Related papers (2024-04-29T21:23:29Z)
- Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation [34.72612800373437]
In human-centric content generation, pre-trained text-to-image models struggle to produce the portrait images users want. We propose a novel multi-modal face generation framework capable of simultaneous identity-expression control and more fine-grained expression synthesis.
arXiv Detail & Related papers (2024-01-02T13:28:39Z)
- PortraitBooth: A Versatile Portrait Model for Fast Identity-preserved Personalization [92.90392834835751]
PortraitBooth is designed for high efficiency, robust identity preservation, and expression-editable text-to-image generation. It eliminates computational overhead and mitigates identity distortion, and it incorporates emotion-aware cross-attention control for diverse facial expressions in generated images. A cross-attention sketch follows this entry.
arXiv Detail & Related papers (2023-12-11T13:03:29Z)
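Emotion-aware cross-attention control is named above without detail, so the sketch below shows one plausible reading: an emotion embedding projected to an extra key/value token alongside the text context. Module names and dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class EmotionAwareCrossAttention(nn.Module):
    """Sketch: standard cross-attention over text tokens, with a learned
    projection of an emotion embedding appended as one extra key/value
    token so expression can be steered independently of the prompt."""
    def __init__(self, dim=320, ctx_dim=768, emo_dim=64, heads=8):
        super().__init__()
        self.emo_proj = nn.Linear(emo_dim, ctx_dim)
        self.attn = nn.MultiheadAttention(dim, heads, kdim=ctx_dim,
                                          vdim=ctx_dim, batch_first=True)

    def forward(self, x, text_ctx, emo_emb):
        # x: (B, N, dim) image tokens; text_ctx: (B, L, ctx_dim); emo_emb: (B, emo_dim)
        ctx = torch.cat([text_ctx, self.emo_proj(emo_emb).unsqueeze(1)], dim=1)
        out, _ = self.attn(x, ctx, ctx)
        return x + out

layer = EmotionAwareCrossAttention()
y = layer(torch.randn(2, 256, 320), torch.randn(2, 77, 768), torch.randn(2, 64))
print(y.shape)  # torch.Size([2, 256, 320])
```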
- One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field [81.07651217942679]
Talking-head generation aims to generate faces that maintain the identity information of the source image and imitate the motion of the driving image. We propose HiDe-NeRF, which achieves high-fidelity and free-view talking-head synthesis. A deformation-field sketch follows this entry.
arXiv Detail & Related papers (2023-04-11T09:47:35Z)
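The deformable-NeRF idea behind HiDe-NeRF can be illustrated with a small sketch: a deformation MLP, conditioned on a per-frame motion code, bends sample points into a canonical space before a shared radiance MLP is queried. Positional encoding and volume rendering are omitted, and all names are illustrative.

```python
import torch
import torch.nn as nn

class DeformableNeRFSketch(nn.Module):
    """Sketch of the deformable-field idea: a deformation MLP, conditioned
    on a motion code from the driving frame, maps sample points back into
    a canonical space, where a shared MLP predicts density and color."""
    def __init__(self, motion_dim=64, hidden=128):
        super().__init__()
        self.deform = nn.Sequential(
            nn.Linear(3 + motion_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),            # per-point offset
        )
        self.canonical = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),            # (density, r, g, b)
        )

    def forward(self, pts, motion_code):
        # pts: (N, 3) sample points; motion_code: (motion_dim,) per driving frame
        m = motion_code.expand(pts.size(0), -1)
        canonical_pts = pts + self.deform(torch.cat([pts, m], dim=-1))
        out = self.canonical(canonical_pts)
        density, rgb = out[..., :1], torch.sigmoid(out[..., 1:])
        return density, rgb

model = DeformableNeRFSketch()
density, rgb = model(torch.randn(1024, 3), torch.randn(64))
print(density.shape, rgb.shape)  # torch.Size([1024, 1]) torch.Size([1024, 3])
```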
- Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation [54.68893964373141]
Talking-face generation has historically struggled to produce head movements and natural facial expressions without guidance from additional reference videos. Recent developments in diffusion-based generative models allow for more realistic and stable data synthesis. We present an autoregressive diffusion model that requires only a single identity image and an audio sequence to generate a video of a realistic talking human head. An autoregressive sampling sketch follows this entry.
arXiv Detail & Related papers (2023-01-06T14:16:54Z)
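The entry above describes an autoregressive diffusion model driven by one identity image and an audio sequence. The loop below sketches that sampling structure with a stand-in denoiser interface and a placeholder update rule (a real implementation would use a proper DDPM/DDIM step); everything here is hypothetical.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def autoregressive_sample(denoiser, id_frame, audio_embs, steps=50):
    """Sketch of autoregressive talking-head sampling: frame t is denoised
    from pure noise, conditioned on the identity image, the previously
    generated frame, and the audio embedding for that time window.
    `denoiser(x, t, id_frame, prev, audio)` is a stand-in interface, and
    the noise-subtraction update is a placeholder for a real diffusion step."""
    prev = id_frame                                  # bootstrap motion from the identity frame
    frames = []
    for audio in audio_embs:                         # one embedding per output frame
        x = torch.randn_like(id_frame)
        for t in reversed(range(steps)):
            t_batch = torch.full((x.size(0),), t, dtype=torch.long)
            eps = denoiser(x, t_batch, id_frame, prev, audio)
            x = x - eps / steps                      # placeholder update rule
        prev = x
        frames.append(x)
    return torch.stack(frames, dim=1)                # (B, T, C, H, W)

# Tiny stand-in denoiser so the loop runs end to end.
class ToyDenoiser(nn.Module):
    def forward(self, x, t, id_frame, prev, audio):
        return 0.1 * (x - id_frame)                  # not a trained model

video = autoregressive_sample(ToyDenoiser(), torch.randn(1, 3, 32, 32),
                              [torch.randn(1, 128) for _ in range(4)])
print(video.shape)  # torch.Size([1, 4, 3, 32, 32])
```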