FlexAvatar: Flexible Large Reconstruction Model for Animatable Gaussian Head Avatars with Detailed Deformation
- URL: http://arxiv.org/abs/2512.17717v1
- Date: Fri, 19 Dec 2025 15:51:44 GMT
- Title: FlexAvatar: Flexible Large Reconstruction Model for Animatable Gaussian Head Avatars with Detailed Deformation
- Authors: Cheng Peng, Zhuo Su, Liao Wang, Chen Guo, Zhaohu Li, Chengjiang Long, Zheng Lv, Jingxiang Sun, Chenyangguang Zhang, Yebin Liu
- Abstract summary: We present FlexAvatar, a flexible large reconstruction model for high-fidelity 3D head avatars. It aggregates flexible input-number-agnostic, camera-pose-free and expression-free inputs into a robust canonical 3D representation. It achieves superior 3D consistency and detailed dynamic realism compared with previous methods.
- Score: 52.919328336985636
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present FlexAvatar, a flexible large reconstruction model for high-fidelity 3D head avatars with detailed dynamic deformation from single or sparse images, without requiring camera poses or expression labels. It leverages a transformer-based reconstruction model with structured head query tokens as canonical anchors to aggregate flexible input-number-agnostic, camera-pose-free and expression-free inputs into a robust canonical 3D representation. For detailed dynamic deformation, we introduce a lightweight UNet decoder conditioned on UV-space position maps, which can produce detailed expression-dependent deformations in real time. To better capture rare but critical expression details such as wrinkles and bared teeth, we also adopt a data distribution adjustment strategy during training to balance the distribution of these expressions in the training set. Moreover, a lightweight 10-second refinement can further enhance identity-specific details for extreme identities without affecting deformation quality. Extensive experiments demonstrate that FlexAvatar achieves superior 3D consistency and detailed dynamic realism compared with previous methods, providing a practical solution for animatable 3D avatar creation.
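The abstract pairs a transformer reconstructor with a lightweight UNet decoder conditioned on UV-space position maps, but the listing gives no architectural details. The following is a minimal PyTorch sketch of what such a UV-conditioned deformation decoder could look like: a small UNet maps a UV position map of the expression-posed template to per-texel XYZ offsets for Gaussians anchored in UV space. All module names, channel sizes, and the zero-initialized output head are assumptions, not the paper's implementation.

```python
# Hypothetical sketch of an expression-conditioned deformation decoder in the
# spirit of FlexAvatar: a lightweight UNet takes a UV-space position map of the
# expression-posed template mesh and predicts per-texel offsets for canonical
# Gaussians anchored in UV space. Sizes and names are assumptions.
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.GroupNorm(8, cout), nn.SiLU(),
        nn.Conv2d(cout, cout, 3, padding=1), nn.GroupNorm(8, cout), nn.SiLU(),
    )

class UVDeformationUNet(nn.Module):
    """Maps a (B, 3, H, W) UV position map to (B, 3, H, W) XYZ offsets."""
    def __init__(self, base=32):
        super().__init__()
        self.enc1 = conv_block(3, base)
        self.enc2 = conv_block(base, base * 2)
        self.bott = conv_block(base * 2, base * 4)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.head = nn.Conv2d(base, 3, 1)
        # Zero-init so training starts from the undeformed canonical avatar.
        nn.init.zeros_(self.head.weight)
        nn.init.zeros_(self.head.bias)
        self.pool = nn.MaxPool2d(2)

    def forward(self, uv_pos_map):
        e1 = self.enc1(uv_pos_map)
        e2 = self.enc2(self.pool(e1))
        b = self.bott(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)  # per-texel XYZ deformation

# Usage: predict an offset map, then sample it at each Gaussian's UV coordinate
# and add the result to the canonical Gaussian position before rasterization.
unet = UVDeformationUNet()
offsets = unet(torch.randn(1, 3, 256, 256))  # (1, 3, 256, 256)
```

Because such a decoder is a small convolutional network over a fixed-resolution UV map, it is plausible that it runs in real time, consistent with the abstract's claim.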
Related papers
- LiftAvatar: Kinematic-Space Completion for Expression-Controlled 3D Gaussian Avatar Animation [9.736861648552408]
We present LiftAvatar, a new paradigm that completes sparse monocular observations in kinematic space.
It uses the completed signals to drive high-fidelity avatar animation.
arXiv Detail & Related papers (2026-03-02T17:46:32Z)
- FastGHA: Generalized Few-Shot 3D Gaussian Head Avatars with Real-Time Animation [26.161556787983496]
FastGHA is a feed-forward method to generate high-quality Gaussian head avatars from only a few input images.
Our approach directly learns a per-pixel Gaussian representation from the input images.
Experiments show that our approach significantly outperforms existing methods in both rendering quality and inference efficiency.
arXiv Detail & Related papers (2026-01-20T10:49:49Z)
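The FastGHA summary above states that the method "directly learns a per-pixel Gaussian representation from the input images". A hedged sketch of one common way to realize that idea follows: a predicted depth is unprojected along per-pixel camera rays, and the remaining Gaussian parameters are read from a 1x1 convolution. The parameterization, activations, and channel layout are assumptions, not details from the paper.

```python
# Minimal sketch of a per-pixel Gaussian prediction head: each pixel of a
# feature map yields one 3D Gaussian. Parameterization is an assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PerPixelGaussianHead(nn.Module):
    """Predicts one 3D Gaussian per input pixel from a feature map."""
    def __init__(self, feat_dim=64):
        super().__init__()
        # depth(1) + quaternion(4) + log-scale(3) + opacity(1) + RGB(3) = 12
        self.out = nn.Conv2d(feat_dim, 12, 1)

    def forward(self, feats, rays_o, rays_d):
        # feats: (B, C, H, W); rays_o/rays_d: (B, 3, H, W) per-pixel camera rays
        p = self.out(feats)
        depth = F.softplus(p[:, :1])                 # positive depth along ray
        quat = F.normalize(p[:, 1:5], dim=1)         # unit rotation quaternion
        scale = torch.exp(p[:, 5:8].clamp(max=4))    # bounded to avoid huge Gaussians
        opacity = torch.sigmoid(p[:, 8:9])
        rgb = torch.sigmoid(p[:, 9:12])
        mean = rays_o + depth * rays_d               # unproject pixel to 3D
        return {"mean": mean, "quat": quat, "scale": scale,
                "opacity": opacity, "rgb": rgb}
```

Tying Gaussian positions to camera rays keeps the prediction pixel-aligned, which is one plausible reason a feed-forward design like this can be fast at inference time.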
- FlexAvatar: Learning Complete 3D Head Avatars with Partial Supervision [54.69512425050288]
We introduce FlexAvatar, a method for creating high-quality and complete 3D head avatars from a single image.
Our training procedure yields a smooth latent avatar space that facilitates identity interpolation and flexible fitting to an arbitrary number of input observations.
arXiv Detail & Related papers (2025-12-17T17:09:52Z)
- TeGA: Texture Space Gaussian Avatars for High-Resolution Dynamic Head Modeling [52.87836237427514]
Photoreal avatars are seen as a key component in emerging applications in telepresence, extended reality, and entertainment.
We present a new high-detail 3D head avatar model that improves upon the state of the art.
arXiv Detail & Related papers (2025-05-08T22:10:27Z)
- FRESA: Feedforward Reconstruction of Personalized Skinned Avatars from Few Images [74.86864398919467]
We present a novel method for reconstructing personalized 3D human avatars with realistic animation from only a few images.
We learn a universal prior from over a thousand clothed humans to achieve instant feedforward generation and zero-shot generalization.
Our method generates more authentic reconstructions and animations than state-of-the-art methods, and can be directly generalized to inputs from casually taken phone photos.
arXiv Detail & Related papers (2025-03-24T23:20:47Z)
- HeadEvolver: Text to Head Avatars via Expressive and Attribute-Preserving Mesh Deformation [17.590555698266346]
This paper introduces a novel framework for generating stylized head avatars from text guidance.
Our method represents mesh deformation with per-face Jacobians and adaptively modulates local deformation using a learnable vector field.
Our framework can generate realistic shapes and textures that can be further edited via text, while supporting seamless editing using the preserved attributes from the template mesh.
arXiv Detail & Related papers (2024-03-14T12:15:23Z)
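The HeadEvolver entry above describes mesh deformation via per-face Jacobians modulated by a learnable vector field. A rough sketch of that parameterization follows, under the assumption that the field outputs a scalar blending weight per face; recovering final vertex positions from the transformed edge vectors requires a Poisson-style least-squares solve, which is omitted here.

```python
# Rough sketch of per-face Jacobian deformation with field-based modulation.
# Each face stores a 3x3 Jacobian that transforms its local edge vectors; a
# learnable field evaluated at face centroids blends each Jacobian toward the
# identity. All names and the scalar-weight assumption are hypothetical.
import torch

def deform_edges(verts, faces, jacobians):
    """verts: (V, 3), faces: (F, 3) long, jacobians: (F, 3, 3).
    Returns target edge vectors (F, 2, 3) after per-face transformation."""
    v0, v1, v2 = verts[faces[:, 0]], verts[faces[:, 1]], verts[faces[:, 2]]
    edges = torch.stack([v1 - v0, v2 - v0], dim=1)          # (F, 2, 3)
    return torch.einsum('fij,fej->fei', jacobians, edges)   # J applied per edge

def modulate(jacobians, centroids, field):
    """Blend each Jacobian toward identity by a learned per-face weight."""
    w = torch.sigmoid(field(centroids)).unsqueeze(-1)        # (F, 1, 1) in (0, 1)
    eye = torch.eye(3).expand_as(jacobians)
    return eye + w * (jacobians - eye)

# Usage with a tiny MLP standing in for the learnable field (hypothetical):
# field = torch.nn.Sequential(torch.nn.Linear(3, 32), torch.nn.SiLU(),
#                             torch.nn.Linear(32, 1))
```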
- InvertAvatar: Incremental GAN Inversion for Generalized Head Avatars [40.10906393484584]
We propose a novel framework that enhances avatar reconstruction performance using an incremental GAN inversion algorithm designed to increase fidelity from multiple frames.
Our architecture emphasizes pixel-aligned image-to-image translation, mitigating the need to learn correspondences between observation and canonical spaces.
The proposed paradigm demonstrates state-of-the-art performance on one-shot and few-shot avatar animation tasks.
arXiv Detail & Related papers (2023-12-03T18:59:15Z)
- Generalizable One-shot Neural Head Avatar [90.50492165284724]
We present a method that reconstructs and animates a 3D head avatar from a single-view portrait image.
We propose a framework that not only generalizes to unseen identities based on a single-view image, but also captures characteristic details within and beyond the face area.
arXiv Detail & Related papers (2023-06-14T22:33:09Z)
- Pixel2Mesh++: 3D Mesh Generation and Refinement from Multi-View Images [82.32776379815712]
We study the problem of shape generation in 3D mesh representation from a small number of color images with or without camera poses.
We further improve the shape quality by leveraging cross-view information with a graph convolutional network.
Our model is robust to the quality of the initial mesh and the error of camera pose, and can be combined with a differentiable renderer for test-time optimization.
arXiv Detail & Related papers (2022-04-21T03:42:31Z)
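The Pixel2Mesh++ entry describes leveraging cross-view information with a graph convolutional network. A minimal sketch of that idea: image features are sampled at each vertex's 2D projection in every view, mean-pooled across views, and passed through a simple graph convolution that would predict vertex offsets. The pooling choice, dense adjacency, and layer sizes are assumptions for illustration only.

```python
# Minimal sketch of cross-view GCN mesh refinement: pool multi-view features
# per vertex, then propagate them over the mesh graph. Shapes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphConv(nn.Module):
    """x' = relu(W1 x + W2 (A x)) with a row-normalized adjacency A."""
    def __init__(self, cin, cout):
        super().__init__()
        self.w_self = nn.Linear(cin, cout)
        self.w_nbr = nn.Linear(cin, cout)

    def forward(self, x, adj):
        # x: (V, C) vertex features; adj: (V, V) dense for brevity
        return F.relu(self.w_self(x) + self.w_nbr(adj @ x))

def sample_multiview(feat_maps, verts_2d):
    """feat_maps: list of (1, C, H, W); verts_2d: list of (V, 2) in [-1, 1].
    Returns (V, C) features mean-pooled over views."""
    per_view = [
        F.grid_sample(fm, uv.view(1, -1, 1, 2), align_corners=False)
         .squeeze(-1).squeeze(0).t()                 # (V, C) per view
        for fm, uv in zip(feat_maps, verts_2d)
    ]
    return torch.stack(per_view).mean(dim=0)
```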