STAvatar: Soft Binding and Temporal Density Control for Monocular 3D Head Avatars Reconstruction
- URL: http://arxiv.org/abs/2511.19854v2
- Date: Thu, 27 Nov 2025 01:28:23 GMT
- Title: STAvatar: Soft Binding and Temporal Density Control for Monocular 3D Head Avatars Reconstruction
- Authors: Jiankuo Zhao, Xiangyu Zhu, Zidu Wang, Zhen Lei,
- Abstract summary: Existing methods based on 3D Gaussian Splatting typically bind Gaussians to mesh triangles and model deformations solely via Linear Blend Skinning. We propose STAvatar, which consists of two key components: (1) a UV-Adaptive Soft Binding framework that leverages both image-based and geometric priors to learn per-Gaussian feature offsets within the UV space; (2) a Temporal ADC strategy. STAvatar achieves state-of-the-art reconstruction performance, especially in capturing fine-grained details and reconstructing frequently occluded regions.
- Score: 18.777422532112105
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Reconstructing high-fidelity and animatable 3D head avatars from monocular videos remains a challenging yet essential task. Existing methods based on 3D Gaussian Splatting typically bind Gaussians to mesh triangles and model deformations solely via Linear Blend Skinning, which results in rigid motion and limited expressiveness. Moreover, they lack specialized strategies to handle frequently occluded regions (e.g., mouth interiors, eyelids). To address these limitations, we propose STAvatar, which consists of two key components: (1) a UV-Adaptive Soft Binding framework that leverages both image-based and geometric priors to learn per-Gaussian feature offsets within the UV space. This UV representation supports dynamic resampling, ensuring full compatibility with Adaptive Density Control (ADC) and enhanced adaptability to shape and textural variations. (2) a Temporal ADC strategy, which first clusters structurally similar frames to facilitate more targeted computation of the densification criterion. It further introduces a novel fused perceptual error as clone criterion to jointly capture geometric and textural discrepancies, encouraging densification in regions requiring finer details. Extensive experiments on four benchmark datasets demonstrate that STAvatar achieves state-of-the-art reconstruction performance, especially in capturing fine-grained details and reconstructing frequently occluded regions. The code will be publicly available.
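The abstract describes a "fused perceptual error" that jointly captures geometric and textural discrepancies to decide where Gaussians should be cloned. The paper's exact formulation is not given here, so the following is only a minimal sketch of the general idea: combine a per-pixel photometric (textural) term with an image-gradient (geometric) discrepancy term into one densification score. The function name, the gradient-based geometric proxy, and the `geom_weight` blending parameter are all illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def fused_perceptual_error(rendered, target, geom_weight=0.5):
    """Hypothetical sketch of a fused clone criterion.

    rendered, target: float arrays of shape (H, W, 3) in [0, 1].
    Returns a per-pixel error map of shape (H, W); high values mark
    regions that would be candidates for densification.
    """
    # Textural term: per-pixel photometric discrepancy.
    textural = np.abs(rendered - target).mean(axis=-1)

    # Geometric term (assumed proxy): discrepancy of image gradients,
    # which responds to misplaced edges and structure.
    gy_r, gx_r = np.gradient(rendered.mean(axis=-1))
    gy_t, gx_t = np.gradient(target.mean(axis=-1))
    geometric = np.hypot(gx_r - gx_t, gy_r - gy_t)

    # Blend the two terms; the weight is an illustrative free parameter.
    return geom_weight * geometric + (1.0 - geom_weight) * textural
```

In an ADC-style loop, such a map would be averaged within each cluster of structurally similar frames and thresholded to select the Gaussians to clone; the thresholding and clustering steps are omitted here.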
Related papers
- Joint Semantic and Rendering Enhancements in 3D Gaussian Modeling with Anisotropic Local Encoding [86.55824709875598]
We propose a joint enhancement framework for 3D semantic Gaussian modeling that synergizes both semantic and rendering branches. Unlike conventional point cloud shape encoding, we introduce an anisotropic 3D Gaussian Chebyshev descriptor to capture fine-grained 3D shape details. We employ a cross-scene knowledge transfer module to continuously update learned shape patterns, enabling faster convergence and robust representations.
arXiv Detail & Related papers (2026-01-05T18:33:50Z) - GauSSmart: Enhanced 3D Reconstruction through 2D Foundation Models and Geometric Filtering [50.675710727721786]
We propose GauSSmart, a hybrid method that bridges 2D foundational models and 3D Gaussian Splatting reconstruction. Our approach integrates established 2D computer vision techniques, including convex filtering and semantic feature supervision. We validate our approach across three datasets, where GauSSmart consistently outperforms existing Gaussian Splatting methods.
arXiv Detail & Related papers (2025-10-16T03:38:26Z) - Gesplat: Robust Pose-Free 3D Reconstruction via Geometry-Guided Gaussian Splatting [21.952325954391508]
We introduce Gesplat, a 3DGS-based framework that enables robust novel view synthesis and geometrically consistent reconstruction from unposed sparse images. Our approach achieves more robust performance on both forward-facing and large-scale complex datasets compared to other pose-free methods.
arXiv Detail & Related papers (2025-10-11T08:13:46Z) - Visibility-Aware Densification for 3D Gaussian Splatting in Dynamic Urban Scenes [7.253732091582086]
VAD-GS is a 3DGS framework tailored for geometry recovery in challenging urban scenes. Our method identifies unreliable geometry structures via voxel-based visibility reasoning. It selects informative supporting views through diversity-aware view selection, and recovers missing structures via patch matching-based stereo reconstruction.
arXiv Detail & Related papers (2025-10-10T13:22:12Z) - D$^2$GS: Depth-and-Density Guided Gaussian Splatting for Stable and Accurate Sparse-View Reconstruction [73.61056394880733]
3D Gaussian Splatting (3DGS) enables real-time, high-fidelity novel view synthesis (NVS) with explicit 3D representations. We identify two key failure modes under sparse-view conditions: overfitting in regions with excessive Gaussian density near the camera, and underfitting in distant areas with insufficient Gaussian coverage. We propose a unified framework D$^2$GS, comprising two key components: a Depth-and-Density Guided Dropout strategy, and a Distance-Aware Fidelity Enhancement module.
arXiv Detail & Related papers (2025-10-09T17:59:49Z) - HBSplat: Robust Sparse-View Gaussian Reconstruction with Hybrid-Loss Guided Depth and Bidirectional Warping [11.035994094874141]
HBSplat is a framework that seamlessly integrates robust structural cues, virtual view constraints, and occluded region completion. HBSplat sets a new state-of-the-art, achieving up to 21.13 dB PSNR and 0.189 LPIPS, while maintaining real-time inference.
arXiv Detail & Related papers (2025-09-29T15:03:31Z) - FMGS-Avatar: Mesh-Guided 2D Gaussian Splatting with Foundation Model Priors for 3D Monocular Avatar Reconstruction [18.570290675633732]
We introduce Mesh-Guided 2D Gaussian Splatting, where 2D primitives are attached directly to template mesh faces with constrained position, rotation, and movement. We leverage foundation models trained on large-scale datasets, such as Sapiens, to complement the limited visual cues from monocular videos. Experimental evaluation demonstrates superior reconstruction quality compared to existing methods, with notable gains in geometric accuracy and appearance fidelity.
arXiv Detail & Related papers (2025-09-18T08:41:41Z) - Intern-GS: Vision Model Guided Sparse-View 3D Gaussian Splatting [95.61137026932062]
Intern-GS is a novel approach to enhance the process of sparse-view Gaussian splatting. We show that Intern-GS achieves state-of-the-art rendering quality across diverse datasets.
arXiv Detail & Related papers (2025-05-27T05:17:49Z) - MonoGSDF: Exploring Monocular Geometric Cues for Gaussian Splatting-Guided Implicit Surface Reconstruction [86.87464903285208]
We introduce MonoGSDF, a novel method that couples primitives with a neural Signed Distance Field (SDF) for high-quality reconstruction. To handle arbitrary-scale scenes, we propose a scaling strategy for robust generalization. Experiments on real-world datasets show that MonoGSDF outperforms prior methods while maintaining efficiency.
arXiv Detail & Related papers (2024-11-25T20:07:07Z) - Improving Gaussian Splatting with Localized Points Management [52.009874685460694]
Localized Point Management (LPM) is capable of identifying those error-contributing zones in greatest need of both point addition and geometry calibration. LPM applies point densification in the identified zones and then resets the opacity of the points in front of these regions, creating a new opportunity to correct poorly conditioned points. Notably, LPM improves both static 3DGS and dynamic SpaceTimeGS to achieve state-of-the-art rendering quality while retaining real-time speeds.
arXiv Detail & Related papers (2024-06-06T16:55:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.