OMG-Avatar: One-shot Multi-LOD Gaussian Head Avatar
- URL: http://arxiv.org/abs/2603.01506v1
- Date: Mon, 02 Mar 2026 06:30:53 GMT
- Title: OMG-Avatar: One-shot Multi-LOD Gaussian Head Avatar
- Authors: Jianqiang Ren, Lin Liu, Steven Hoi
- Abstract summary: OMG-Avatar is a novel One-shot method for animatable 3D head reconstruction from a single image in 0.2s. We employ a transformer-based architecture for global feature extraction and projection-based sampling for local feature acquisition. We introduce a coarse-to-fine learning paradigm to support Level-of-Detail functionality and enhance the perception of hierarchical details.
- Score: 8.411047140592077
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose OMG-Avatar, a novel One-shot method that leverages a Multi-LOD (Level-of-Detail) Gaussian representation for animatable 3D head reconstruction from a single image in 0.2s. Our method enables LOD head avatar modeling using a unified model that accommodates diverse hardware capabilities and inference speed requirements. To capture both global and local facial characteristics, we employ a transformer-based architecture for global feature extraction and projection-based sampling for local feature acquisition. These features are effectively fused under the guidance of a depth buffer, ensuring occlusion plausibility. We further introduce a coarse-to-fine learning paradigm to support Level-of-Detail functionality and enhance the perception of hierarchical details. To address the limitations of 3DMMs in modeling non-head regions such as the shoulders, we introduce a multi-region decomposition scheme in which the head and shoulders are predicted separately and then integrated through cross-region combination. Extensive experiments demonstrate that OMG-Avatar outperforms state-of-the-art methods in reconstruction quality, reenactment performance, and computational efficiency.
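The abstract describes fusing transformer-derived global features with projection-sampled local features under the guidance of a depth buffer, so that occluded 3D points do not pick up image features that belong to surfaces in front of them. A minimal sketch of that idea follows; this is an illustrative assumption, not the paper's implementation: the function names (`project_points`, `sample_local_features`, `depth_guided_fusion`), the nearest-neighbour sampling, and the hard visibility test are all stand-ins for the learned fusion the paper describes.

```python
import numpy as np

def project_points(points, K):
    """Project 3D points (N, 3) in camera space to 2D pixels with intrinsics K (3, 3)."""
    uvw = points @ K.T                    # (N, 3) homogeneous pixel coords
    uv = uvw[:, :2] / uvw[:, 2:3]         # perspective divide
    return uv, uvw[:, 2]                  # pixel coords and camera-space depth

def sample_local_features(feat_map, uv):
    """Nearest-neighbour sampling of an (H, W, C) image feature map at pixel coords."""
    H, W, _ = feat_map.shape
    x = np.clip(np.round(uv[:, 0]).astype(int), 0, W - 1)
    y = np.clip(np.round(uv[:, 1]).astype(int), 0, H - 1)
    return feat_map[y, x]                 # (N, C) per-point local features

def depth_guided_fusion(global_feat, local_feat, point_depth, depth_buffer, uv, eps=0.05):
    """Keep local features for visible points; occluded points fall back to global ones."""
    H, W = depth_buffer.shape
    x = np.clip(np.round(uv[:, 0]).astype(int), 0, W - 1)
    y = np.clip(np.round(uv[:, 1]).astype(int), 0, H - 1)
    # A point is visible if its depth matches the depth buffer (up to tolerance eps);
    # a point behind the buffered surface is occluded in the input view.
    visible = point_depth <= depth_buffer[y, x] + eps
    w = visible[:, None].astype(local_feat.dtype)
    return w * local_feat + (1.0 - w) * global_feat
```

In practice a learned network would blend the two feature streams softly rather than with a binary mask, but the occlusion logic above captures why a depth buffer is needed: without it, a point on the back of the head would sample image features from the face in front of it.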
Related papers
- OMEGA-Avatar: One-shot Modeling of 360° Gaussian Avatars [54.688420347927725]
OMEGA-Avatar is the first framework that simultaneously generates a generalizable, 360-complete, and animatable 3D Gaussian head from a single image. We show that OMEGA-Avatar achieves state-of-the-art performance, significantly outperforming existing baselines in 360 full-head completeness.
arXiv Detail & Related papers (2026-02-12T08:16:38Z)
- Complementary Information Guided Occupancy Prediction via Multi-Level Representation Fusion [73.11061598576798]
Camera-based occupancy prediction is a mainstream approach for 3D perception in autonomous driving. CIGOcc is a two-stage occupancy prediction framework based on multi-level representation fusion. CIGOcc extracts segmentation, graphics, and depth features from an input image and introduces a deformable multi-level fusion mechanism.
arXiv Detail & Related papers (2025-10-15T06:37:33Z)
- ImHead: A Large-scale Implicit Morphable Model for Localized Head Modeling [71.3859346921118]
imHead is a novel implicit 3DMM that not only models expressive 3D head avatars but also facilitates localized editing of the facial features. To train imHead, we curate a large-scale dataset of 4K distinct identities.
arXiv Detail & Related papers (2025-10-12T20:17:34Z)
- FMGS-Avatar: Mesh-Guided 2D Gaussian Splatting with Foundation Model Priors for 3D Monocular Avatar Reconstruction [18.570290675633732]
We introduce Mesh-Guided 2D Gaussian Splatting, where 2D primitives are attached directly to template mesh faces with constrained position, rotation, and movement. We leverage foundation models trained on large-scale datasets, such as Sapiens, to complement the limited visual cues from monocular videos. Experimental evaluation demonstrates superior reconstruction quality compared to existing methods, with notable gains in geometric accuracy and appearance fidelity.
arXiv Detail & Related papers (2025-09-18T08:41:41Z)
- MuGS: Multi-Baseline Generalizable Gaussian Splatting Reconstruction [32.14335364083271]
We present Multi-Baseline Gaussian Splatting (MuGS), a feed-forward approach for novel view synthesis. MuGS effectively handles diverse baseline settings, including sparse input views with both small and large baselines. We demonstrate promising zero-shot performance on the LLFF and Mip-NeRF 360 datasets.
arXiv Detail & Related papers (2025-08-06T10:34:24Z)
- M3D: Dual-Stream Selective State Spaces and Depth-Driven Framework for High-Fidelity Single-View 3D Reconstruction [3.2228041579285978]
M3D is a novel single-view 3D reconstruction framework for complex scenes.
It balances the extraction of global and local features, thereby improving scene comprehension and representation precision.
Results indicate that the fusion of multi-scale features with depth information via the dual-branch feature extraction significantly boosts geometric consistency and fidelity.
arXiv Detail & Related papers (2024-11-19T16:49:24Z)
- Anti-Aliased Neural Implicit Surfaces with Encoding Level of Detail [54.03399077258403]
We present LoD-NeuS, an efficient neural representation for high-frequency geometry detail recovery and anti-aliased novel view rendering.
Our representation aggregates space features from a multi-convolved featurization within a conical frustum along a ray.
arXiv Detail & Related papers (2023-09-19T05:44:00Z)
- Generalizable One-shot Neural Head Avatar [90.50492165284724]
We present a method that reconstructs and animates a 3D head avatar from a single-view portrait image.
We propose a framework that not only generalizes to unseen identities based on a single-view image, but also captures characteristic details within and beyond the face area.
arXiv Detail & Related papers (2023-06-14T22:33:09Z)
- Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos [47.94545609011594]
We propose a method to learn a high-quality implicit 3D head avatar from a monocular RGB video captured in the wild.
Our hybrid pipeline combines the geometry prior and dynamic tracking of a 3DMM with a neural radiance field to achieve fine-grained control and photorealism.
arXiv Detail & Related papers (2023-04-04T01:10:04Z)
- Direct Multi-view Multi-person 3D Pose Estimation [138.48139701871213]
We present Multi-view Pose transformer (MvP) for estimating multi-person 3D poses from multi-view images.
MvP directly regresses the multi-person 3D poses in a clean and efficient way, without relying on intermediate tasks.
We show experimentally that our MvP model outperforms the state-of-the-art methods on several benchmarks while being much more efficient.
arXiv Detail & Related papers (2021-11-07T13:09:20Z)
- DeepMultiCap: Performance Capture of Multiple Characters Using Sparse Multiview Cameras [63.186486240525554]
DeepMultiCap is a novel method for multi-person performance capture using sparse multi-view cameras.
Our method can capture time-varying surface details without the need for pre-scanned template models.
arXiv Detail & Related papers (2021-05-01T14:32:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.