LiftAvatar: Kinematic-Space Completion for Expression-Controlled 3D Gaussian Avatar Animation
- URL: http://arxiv.org/abs/2603.02129v1
- Date: Mon, 02 Mar 2026 17:46:32 GMT
- Title: LiftAvatar: Kinematic-Space Completion for Expression-Controlled 3D Gaussian Avatar Animation
- Authors: Hualiang Wei, Shunran Jia, Jialun Liu, Wenhui Li
- Abstract summary: We present LiftAvatar, a new paradigm that completes sparse monocular observations in kinematic space. It uses the completed signals to drive high-fidelity avatar animation.
- Score: 9.736861648552408
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present LiftAvatar, a new paradigm that completes sparse monocular observations in kinematic space (e.g., facial expressions and head pose) and uses the completed signals to drive high-fidelity avatar animation. LiftAvatar is a fine-grained, expression-controllable large-scale video diffusion Transformer that synthesizes high-quality, temporally coherent expression sequences conditioned on single or multiple reference images. The key idea is to lift incomplete input data into a richer kinematic representation, thereby strengthening both reconstruction and animation in downstream 3D avatar pipelines. To this end, we introduce (i) a multi-granularity expression control scheme that combines shading maps with expression coefficients for precise and stable driving, and (ii) a multi-reference conditioning mechanism that aggregates complementary cues from multiple frames, enabling strong 3D consistency and controllability. As a plug-and-play enhancer, LiftAvatar directly addresses the limited expressiveness and reconstruction artifacts of 3D Gaussian Splatting-based avatars caused by sparse kinematic cues in everyday monocular videos. By expanding incomplete observations into diverse pose-expression variations, LiftAvatar also enables effective prior distillation from large-scale video generative models into 3D pipelines, leading to substantial gains. Extensive experiments show that LiftAvatar consistently boosts animation quality and quantitative metrics of state-of-the-art 3D avatar methods, especially under extreme, unseen expressions.
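The abstract's multi-granularity control scheme pairs dense, pixel-aligned shading maps with compact expression coefficients as joint conditioning for the video diffusion Transformer. A minimal sketch of how such two-granularity signals might be fused into one conditioning token sequence is shown below; all names, shapes, and the random stand-in projection weights are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch of multi-granularity expression conditioning:
# a dense shading map is patch-embedded into per-patch tokens, while the
# global expression-coefficient vector becomes a single token. Weights
# are random stand-ins; a real model would learn these projections.
import numpy as np

def build_expression_condition(shading_map, expr_coeffs, patch=16, dim=64, rng=None):
    """Fuse a shading map (H, W) and expression coefficients (K,) into a
    token sequence that could condition a video diffusion Transformer."""
    rng = rng or np.random.default_rng(0)
    H, W = shading_map.shape
    # Dense granularity: split the shading map into non-overlapping
    # patches and linearly embed each patch into one token.
    patches = shading_map.reshape(H // patch, patch, W // patch, patch)
    patches = patches.transpose(0, 2, 1, 3).reshape(-1, patch * patch)
    w_patch = rng.standard_normal((patch * patch, dim)) * 0.02  # stand-in weights
    dense_tokens = patches @ w_patch                  # (num_patches, dim)
    # Coarse granularity: embed the coefficient vector into one global token.
    w_coeff = rng.standard_normal((expr_coeffs.shape[0], dim)) * 0.02
    global_token = (expr_coeffs @ w_coeff)[None, :]   # (1, dim)
    # Multi-granularity condition = one global token + dense patch tokens.
    return np.concatenate([global_token, dense_tokens], axis=0)

cond = build_expression_condition(np.zeros((64, 64)), np.zeros(52))
print(cond.shape)  # (1 global + 16 patch tokens, 64) -> (17, 64)
```

The design choice the sketch illustrates is that coefficients alone under-constrain fine facial detail while shading maps alone are noisy; concatenating both granularities lets the conditioning carry stable global pose/expression plus precise local shading cues.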
Related papers
- FastGHA: Generalized Few-Shot 3D Gaussian Head Avatars with Real-Time Animation [26.161556787983496]
FastGHA is a feed-forward method to generate high-quality Gaussian head avatars from only a few input images. Our approach directly learns a per-pixel Gaussian representation from the input images. Experiments show that our approach significantly outperforms existing methods in both rendering quality and inference efficiency.
arXiv Detail & Related papers (2026-01-20T10:49:49Z)
- FlexAvatar: Flexible Large Reconstruction Model for Animatable Gaussian Head Avatars with Detailed Deformation [52.919328336985636]
We present FlexAvatar, a flexible large reconstruction model for high-fidelity 3D head avatars. It aggregates flexible, input-number-agnostic, camera-pose-free, and expression-free inputs into a robust canonical 3D representation. It achieves superior 3D consistency and detailed dynamic realism compared with previous methods.
arXiv Detail & Related papers (2025-12-19T15:51:44Z) - FlexAvatar: Learning Complete 3D Head Avatars with Partial Supervision [54.69512425050288]
We introduce FlexAvatar, a method for creating high-quality and complete 3D head avatars from a single image. Our training procedure yields a smooth latent avatar space that facilitates identity and flexible fitting to an arbitrary number of input observations.
arXiv Detail & Related papers (2025-12-17T17:09:52Z) - FastAvatar: Towards Unified Fast High-Fidelity 3D Avatar Reconstruction with Large Gaussian Reconstruction Transformers [19.37926572767567]
FastAvatar is a feedforward 3D avatar framework capable of flexibly leveraging diverse daily recordings. It reconstructs a high-quality 3D Gaussian Splatting (3DGS) model within seconds, using only a single unified model.
arXiv Detail & Related papers (2025-08-27T10:30:15Z) - AdaHuman: Animatable Detailed 3D Human Generation with Compositional Multiview Diffusion [56.12859795754579]
AdaHuman is a novel framework that generates high-fidelity animatable 3D avatars from a single in-the-wild image. AdaHuman incorporates two key innovations: a pose-conditioned 3D joint diffusion model and a compositional 3DGS refinement module.
arXiv Detail & Related papers (2025-05-30T17:59:54Z) - TeGA: Texture Space Gaussian Avatars for High-Resolution Dynamic Head Modeling [52.87836237427514]
Photoreal avatars are seen as a key component in emerging applications in telepresence, extended reality, and entertainment. We present a new high-detail 3D head avatar model that improves upon the state of the art.
arXiv Detail & Related papers (2025-05-08T22:10:27Z) - SEGA: Drivable 3D Gaussian Head Avatar from a Single Image [15.117619290414064]
We propose SEGA, a novel approach for 3D drivable Gaussian head Avatar creation. SEGA seamlessly combines priors derived from large-scale 2D datasets with 3D priors learned from multi-view, multi-expression, and multi-ID data. Experiments show our method outperforms state-of-the-art approaches in generalization ability, identity preservation, and expression realism.
We present DreamWaltz-G, a novel learning framework for animatable 3D avatar generation from text.
The core of this framework lies in Score Distillation and Hybrid 3D Gaussian Avatar representation.
Our framework further supports diverse applications, including human video reenactment and multi-subject scene composition.
arXiv Detail & Related papers (2024-09-25T17:59:45Z)
- NPGA: Neural Parametric Gaussian Avatars [46.52887358194364]
We propose a data-driven approach to create high-fidelity controllable avatars from multi-view video recordings.
We build our method around 3D Gaussian splatting for its highly efficient rendering and to inherit the topological flexibility of point clouds.
We evaluate our method on the public NeRSemble dataset, demonstrating that NPGA significantly outperforms the previous state-of-the-art avatars on the self-reenactment task by 2.6 PSNR.
arXiv Detail & Related papers (2024-05-29T17:58:09Z)
- GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians [51.46168990249278]
We present an efficient approach to creating realistic human avatars with dynamic 3D appearances from a single video.
GaussianAvatar is validated on both the public dataset and our collected dataset.
arXiv Detail & Related papers (2023-12-04T18:55:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.