FastAvatar: Towards Unified Fast High-Fidelity 3D Avatar Reconstruction with Large Gaussian Reconstruction Transformers
- URL: http://arxiv.org/abs/2508.19754v1
- Date: Wed, 27 Aug 2025 10:30:15 GMT
- Title: FastAvatar: Towards Unified Fast High-Fidelity 3D Avatar Reconstruction with Large Gaussian Reconstruction Transformers
- Authors: Yue Wu, Yufan Wu, Wen Li, Yuxi Lu, Kairui Feng, Xuanhong Chen,
- Abstract summary: FastAvatar is a feedforward 3D avatar framework capable of flexibly leveraging diverse daily recordings. It reconstructs a high-quality 3D Gaussian Splatting (3DGS) model within seconds, using only a single unified model.
- Score: 19.37926572767567
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Despite significant progress in 3D avatar reconstruction, it still faces challenges such as high time complexity, sensitivity to data quality, and low data utilization. We propose FastAvatar, a feedforward 3D avatar framework capable of flexibly leveraging diverse daily recordings (e.g., a single image, multi-view observations, or monocular video) to reconstruct a high-quality 3D Gaussian Splatting (3DGS) model within seconds, using only a single unified model. FastAvatar's core is a Large Gaussian Reconstruction Transformer featuring three key designs: first, a VGGT-style transformer variant that aggregates multi-frame cues while injecting an initial 3D prompt to predict an aggregatable canonical 3DGS representation; second, multi-granular guidance encoding (camera pose, FLAME expression, head pose) that mitigates animation-induced misalignment for variable-length inputs; third, incremental Gaussian aggregation via landmark tracking and sliced fusion losses. Integrating these features, FastAvatar enables incremental reconstruction, i.e., improving quality with more observations, unlike prior work that wastes input data. This yields a quality-speed-tunable paradigm for highly usable avatar modeling. Extensive experiments show that FastAvatar achieves higher quality at highly competitive speed compared to existing methods.
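The incremental reconstruction idea in the abstract (each new observation refines, rather than replaces, the canonical 3DGS model) can be illustrated with a toy confidence-weighted fusion loop. This is a minimal sketch under stated assumptions, not FastAvatar's actual aggregation: the function name `fuse_gaussians`, the per-point confidence weights, and the flat parameter arrays are all hypothetical stand-ins for the paper's landmark tracking and sliced fusion losses.

```python
import numpy as np

def fuse_gaussians(canonical, new, conf_canonical, conf_new):
    """Confidence-weighted fusion of two aligned Gaussian parameter sets.

    `canonical` and `new` are (N, D) arrays of per-Gaussian parameters
    (e.g. positions); the confidences are (N, 1) accumulation weights.
    """
    total = conf_canonical + conf_new
    fused = (conf_canonical * canonical + conf_new * new) / total
    return fused, total

# Incremental reconstruction: each extra "frame" refines the canonical model
# instead of being discarded, so quality can improve with more observations.
rng = np.random.default_rng(0)
canonical = rng.normal(size=(4, 3))   # toy canonical Gaussian centers
conf = np.ones((4, 1))                # accumulated per-Gaussian confidence
for _ in range(3):                    # three additional observations
    prediction = canonical + rng.normal(scale=0.1, size=(4, 3))
    canonical, conf = fuse_gaussians(canonical, prediction, conf, np.ones((4, 1)))
print(conf.ravel())  # confidence grows with every fused observation
```

The running average keeps the canonical model stable while still letting each frame contribute, which is one simple way to realize the quality-speed trade-off the abstract describes: stop after one frame for speed, keep fusing for quality.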
Related papers
- LiftAvatar: Kinematic-Space Completion for Expression-Controlled 3D Gaussian Avatar Animation [9.736861648552408]
We present LiftAvatar, a new paradigm that completes sparse monocular observations in kinematic space. It uses the completed signals to drive high-fidelity avatar animation.
arXiv Detail & Related papers (2026-03-02T17:46:32Z) - OMEGA-Avatar: One-shot Modeling of 360° Gaussian Avatars [54.688420347927725]
OMEGA-Avatar is the first framework that simultaneously generates a generalizable, 360-complete, and animatable 3D Gaussian head from a single image. We show that OMEGA-Avatar achieves state-of-the-art performance, significantly outperforming existing baselines in 360 full-head completeness.
arXiv Detail & Related papers (2026-02-12T08:16:38Z) - FastGHA: Generalized Few-Shot 3D Gaussian Head Avatars with Real-Time Animation [26.161556787983496]
FastGHA is a feed-forward method to generate high-quality Gaussian head avatars from only a few input images. Our approach directly learns a per-pixel Gaussian representation from the input images. Experiments show that our approach significantly outperforms existing methods in both rendering quality and inference efficiency.
arXiv Detail & Related papers (2026-01-20T10:49:49Z) - FastVGGT: Training-Free Acceleration of Visual Geometry Transformer [83.67766078575782]
VGGT is a state-of-the-art feed-forward visual geometry model. We propose FastVGGT, which leverages token merging in the 3D domain through a training-free mechanism for accelerating VGGT. With 1000 input images, FastVGGT achieves a 4x speedup over VGGT while mitigating error accumulation in long-sequence scenarios.
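Training-free token merging, as in the FastVGGT summary above, can be sketched with a greedy similarity pass: tokens whose cosine similarity to an already-kept representative exceeds a threshold are folded into it by running average. This is a toy stand-in for illustration only; FastVGGT's actual 3D-domain partitioning and merge schedule differ, and `merge_tokens` and its threshold are assumptions.

```python
import numpy as np

def merge_tokens(tokens, threshold=0.95):
    """Greedily merge near-duplicate token vectors (running mean),
    reducing the sequence length without any training."""
    kept = []    # representative token vectors
    counts = []  # how many tokens each representative absorbed
    normed = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    for t, n in zip(tokens, normed):
        for i, k in enumerate(kept):
            if n @ (k / np.linalg.norm(k)) > threshold:
                counts[i] += 1
                kept[i] = kept[i] + (t - kept[i]) / counts[i]  # running mean
                break
        else:  # no similar representative found: keep this token
            kept.append(t.copy())
            counts.append(1)
    return np.stack(kept)

tokens = np.vstack([np.tile([1.0, 0.0], (8, 1)),   # 8 near-duplicate tokens
                    np.tile([0.0, 1.0], (8, 1))])  # 8 more of another kind
merged = merge_tokens(tokens)
print(merged.shape)  # 16 tokens collapse to 2 representatives
```

Since attention cost scales quadratically with sequence length, collapsing redundant tokens before the attention layers is where the reported speedup on long image sequences would come from.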
arXiv Detail & Related papers (2025-09-02T17:54:21Z) - FastAvatar: Instant 3D Gaussian Splatting for Faces from Single Unconstrained Poses [23.466614265649373]
We present FastAvatar, a pose-invariant, feed-forward framework that can generate a 3D Gaussian Splatting (3DGS) model from a single face image in near-instant time (10ms). FastAvatar significantly outperforms existing feed-forward face 3DGS methods in reconstruction quality and runs 1000x faster than per-face optimization methods.
arXiv Detail & Related papers (2025-08-25T18:29:05Z) - HORT: Monocular Hand-held Objects Reconstruction with Transformers [61.36376511119355]
Reconstructing hand-held objects in 3D from monocular images is a significant challenge in computer vision. We propose a transformer-based model to efficiently reconstruct dense 3D point clouds of hand-held objects. Our method achieves state-of-the-art accuracy with much faster inference speed, while generalizing well to in-the-wild images.
arXiv Detail & Related papers (2025-03-27T09:45:09Z) - Arc2Avatar: Generating Expressive 3D Avatars from a Single Image via ID Guidance [69.9745497000557]
We introduce Arc2Avatar, the first SDS-based method utilizing a human face foundation model as guidance with just a single image as input. Our avatars maintain a dense correspondence with a human face mesh template, allowing blendshape-based expression generation.
arXiv Detail & Related papers (2025-01-09T17:04:33Z) - Generalizable and Animatable Gaussian Head Avatar [50.34788590904843]
We propose Generalizable and Animatable Gaussian head Avatar (GAGAvatar) for one-shot animatable head avatar reconstruction.
We generate the parameters of 3D Gaussians from a single image in a single forward pass.
Our method exhibits superior performance compared to previous methods in terms of reconstruction quality and expression accuracy.
arXiv Detail & Related papers (2024-10-10T14:29:00Z) - MVGamba: Unify 3D Content Generation as State Space Sequence Modeling [150.80564081817786]
We introduce MVGamba, a general and lightweight Gaussian reconstruction model featuring a multi-view Gaussian reconstructor. With off-the-shelf multi-view diffusion models integrated, MVGamba unifies 3D generation tasks from a single image, sparse images, or text prompts. Experiments demonstrate that MVGamba outperforms state-of-the-art baselines in all 3D content generation scenarios with approximately only $0.1\times$ of the model size.
arXiv Detail & Related papers (2024-06-10T15:26:48Z) - Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers [37.14235383028582]
We introduce a novel approach for single-view reconstruction that efficiently generates a 3D model from a single image via feed-forward inference.
Our method utilizes two transformer-based networks, namely a point decoder and a triplane decoder, to reconstruct 3D objects using a hybrid Triplane-Gaussian intermediate representation.
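The hybrid Triplane-Gaussian representation mentioned above rests on the standard triplane lookup: a 3D point is projected onto three axis-aligned feature planes and the three sampled features are combined. A minimal sketch, assuming nearest-neighbor sampling and sum-combination (real implementations typically use bilinear interpolation, and `sample_triplane` is a hypothetical helper, not this paper's code):

```python
import numpy as np

def sample_triplane(planes, points):
    """Project each 3D point onto the XY, XZ and YZ feature planes and
    sum the three sampled features. `planes` is (3, R, R, C); `points`
    lie in the unit cube [0, 1)^3. Nearest-neighbor keeps the sketch short."""
    R = planes.shape[1]
    idx = np.clip((points * R).astype(int), 0, R - 1)  # (N, 3) grid indices
    xy = planes[0, idx[:, 0], idx[:, 1]]
    xz = planes[1, idx[:, 0], idx[:, 2]]
    yz = planes[2, idx[:, 1], idx[:, 2]]
    return xy + xz + yz                                 # (N, C) per-point features

planes = np.ones((3, 16, 16, 8))  # toy planes with constant unit features
points = np.random.default_rng(1).random((5, 3))
feats = sample_triplane(planes, points)
print(feats.shape)  # (5, 8); constant planes give all-3.0 features
```

Decoding Gaussians from such sampled features (rather than ray-marching a NeRF) is what makes this kind of hybrid representation fast to render after a single feed-forward pass.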
arXiv Detail & Related papers (2023-12-14T17:18:34Z) - InvertAvatar: Incremental GAN Inversion for Generalized Head Avatars [40.10906393484584]
We propose a novel framework that enhances avatar reconstruction performance using an algorithm designed to increase the fidelity from multiple frames.
Our architecture emphasizes pixel-aligned image-to-image translation, mitigating the need to learn correspondences between observation and canonical spaces.
The proposed paradigm demonstrates state-of-the-art performance on one-shot and few-shot avatar animation tasks.
arXiv Detail & Related papers (2023-12-03T18:59:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.