OMG-Avatar: One-shot Multi-LOD Gaussian Head Avatar
- URL: http://arxiv.org/abs/2603.01506v1
- Date: Mon, 02 Mar 2026 06:30:53 GMT
- Title: OMG-Avatar: One-shot Multi-LOD Gaussian Head Avatar
- Authors: Jianqiang Ren, Lin Liu, Steven Hoi
- Abstract summary: OMG-Avatar is a novel One-shot method for animatable 3D head reconstruction from a single image in 0.2s. We employ a transformer-based architecture for global feature extraction and projection-based sampling for local feature acquisition. We introduce a coarse-to-fine learning paradigm to support Level-of-Detail functionality and enhance the perception of hierarchical details.
- Score: 8.411047140592077
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose OMG-Avatar, a novel One-shot method that leverages a Multi-LOD (Level-of-Detail) Gaussian representation for animatable 3D head reconstruction from a single image in 0.2s. Our method enables LOD head avatar modeling using a unified model that accommodates diverse hardware capabilities and inference speed requirements. To capture both global and local facial characteristics, we employ a transformer-based architecture for global feature extraction and projection-based sampling for local feature acquisition. These features are effectively fused under the guidance of a depth buffer, ensuring occlusion plausibility. We further introduce a coarse-to-fine learning paradigm to support Level-of-Detail functionality and enhance the perception of hierarchical details. To address the limitations of 3DMMs in modeling non-head regions such as the shoulders, we introduce a multi-region decomposition scheme in which the head and shoulders are predicted separately and then integrated through cross-region combination. Extensive experiments demonstrate that OMG-Avatar outperforms state-of-the-art methods in reconstruction quality, reenactment performance, and computational efficiency.
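The abstract describes fusing transformer-derived global features with projection-sampled local features under the guidance of a depth buffer, so that occluded 3D points do not pick up image features that belong to surfaces in front of them. A minimal sketch of that idea follows; this is an illustrative assumption, not the paper's implementation: the function names (`project_points`, `sample_local_features`, `depth_guided_fusion`), the nearest-neighbour sampling, and the hard visibility test are all stand-ins for the learned fusion the paper describes.

```python
import numpy as np

def project_points(points, K):
    """Project 3D points (N, 3) in camera space to 2D pixels with intrinsics K (3, 3)."""
    uvw = points @ K.T                    # (N, 3) homogeneous pixel coords
    uv = uvw[:, :2] / uvw[:, 2:3]         # perspective divide
    return uv, uvw[:, 2]                  # pixel coords and camera-space depth

def sample_local_features(feat_map, uv):
    """Nearest-neighbour sampling of an (H, W, C) image feature map at pixel coords."""
    H, W, _ = feat_map.shape
    x = np.clip(np.round(uv[:, 0]).astype(int), 0, W - 1)
    y = np.clip(np.round(uv[:, 1]).astype(int), 0, H - 1)
    return feat_map[y, x]                 # (N, C) per-point local features

def depth_guided_fusion(global_feat, local_feat, point_depth, depth_buffer, uv, eps=0.05):
    """Keep local features for visible points; occluded points fall back to global ones."""
    H, W = depth_buffer.shape
    x = np.clip(np.round(uv[:, 0]).astype(int), 0, W - 1)
    y = np.clip(np.round(uv[:, 1]).astype(int), 0, H - 1)
    # A point is visible if its depth matches the depth buffer (up to tolerance eps);
    # a point behind the buffered surface is occluded in the input view.
    visible = point_depth <= depth_buffer[y, x] + eps
    w = visible[:, None].astype(local_feat.dtype)
    return w * local_feat + (1.0 - w) * global_feat
```

In practice a learned network would blend the two feature streams softly rather than with a binary mask, but the occlusion logic above captures why a depth buffer is needed: without it, a point on the back of the head would sample image features from the face in front of it.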
Related papers
- OMEGA-Avatar: One-shot Modeling of 360° Gaussian Avatars [54.688420347927725]
OMEGA-Avatar is the first framework that simultaneously generates a generalizable, 360-complete, and animatable 3D Gaussian head from a single image. We show that OMEGA-Avatar achieves state-of-the-art performance, significantly outperforming existing baselines in 360 full-head completeness.
arXiv Detail & Related papers (2026-02-12T08:16:38Z)
- Complementary Information Guided Occupancy Prediction via Multi-Level Representation Fusion [73.11061598576798]
Camera-based occupancy prediction is a mainstream approach for 3D perception in autonomous driving. CIGOcc is a two-stage occupancy prediction framework based on multi-level representation fusion. CIGOcc extracts segmentation, graphics, and depth features from an input image and introduces a deformable multi-level fusion mechanism.
arXiv Detail & Related papers (2025-10-15T06:37:33Z)
- ImHead: A Large-scale Implicit Morphable Model for Localized Head Modeling [71.3859346921118]
imHead is a novel implicit 3DMM that not only models expressive 3D head avatars but also facilitates localized editing of the facial features. To train imHead, we curate a large-scale dataset of 4K distinct identities.
arXiv Detail & Related papers (2025-10-12T20:17:34Z)
- FMGS-Avatar: Mesh-Guided 2D Gaussian Splatting with Foundation Model Priors for 3D Monocular Avatar Reconstruction [18.570290675633732]
We introduce Mesh-Guided 2D Gaussian Splatting, where 2D primitives are attached directly to template mesh faces with constrained position, rotation, and movement. We leverage foundation models trained on large-scale datasets, such as Sapiens, to complement the limited visual cues from monocular videos. Experimental evaluation demonstrates superior reconstruction quality compared to existing methods, with notable gains in geometric accuracy and appearance fidelity.
arXiv Detail & Related papers (2025-09-18T08:41:41Z)
- MuGS: Multi-Baseline Generalizable Gaussian Splatting Reconstruction [32.14335364083271]
We present Multi-Baseline Gaussian Splatting (MuGS), a feed-forward approach for novel view synthesis. MuGS effectively handles diverse baseline settings, including sparse input views with both small and large baselines. We demonstrate promising zero-shot performance on the LLFF and Mip-NeRF 360 datasets.
arXiv Detail & Related papers (2025-08-06T10:34:24Z)
- M3D: Dual-Stream Selective State Spaces and Depth-Driven Framework for High-Fidelity Single-View 3D Reconstruction [3.2228041579285978]
M3D is a novel single-view 3D reconstruction framework for complex scenes.
It balances the extraction of global and local features, thereby improving scene comprehension and representation precision.
Results indicate that the fusion of multi-scale features with depth information via the dual-branch feature extraction significantly boosts geometric consistency and fidelity.
arXiv Detail & Related papers (2024-11-19T16:49:24Z)
- Anti-Aliased Neural Implicit Surfaces with Encoding Level of Detail [54.03399077258403]
We present LoD-NeuS, an efficient neural representation for high-frequency geometry detail recovery and anti-aliased novel view rendering.
Our representation aggregates space features from a multi-convolved featurization within a conical frustum along a ray.
arXiv Detail & Related papers (2023-09-19T05:44:00Z)
- Generalizable One-shot Neural Head Avatar [90.50492165284724]
We present a method that reconstructs and animates a 3D head avatar from a single-view portrait image.
We propose a framework that not only generalizes to unseen identities based on a single-view image, but also captures characteristic details within and beyond the face area.
arXiv Detail & Related papers (2023-06-14T22:33:09Z)
- Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos [47.94545609011594]
We propose a method to learn a high-quality implicit 3D head avatar from a monocular RGB video captured in the wild.
Our hybrid pipeline combines the geometry prior and dynamic tracking of a 3DMM with a neural radiance field to achieve fine-grained control and photorealism.
arXiv Detail & Related papers (2023-04-04T01:10:04Z)
- Direct Multi-view Multi-person 3D Pose Estimation [138.48139701871213]
We present Multi-view Pose transformer (MvP) for estimating multi-person 3D poses from multi-view images.
MvP directly regresses the multi-person 3D poses in a clean and efficient way, without relying on intermediate tasks.
We show experimentally that our MvP model outperforms the state-of-the-art methods on several benchmarks while being much more efficient.
arXiv Detail & Related papers (2021-11-07T13:09:20Z)
- DeepMultiCap: Performance Capture of Multiple Characters Using Sparse Multiview Cameras [63.186486240525554]
DeepMultiCap is a novel method for multi-person performance capture using sparse multi-view cameras.
Our method can capture time-varying surface details without the need for pre-scanned template models.
arXiv Detail & Related papers (2021-05-01T14:32:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.