AlignGS: Aligning Geometry and Semantics for Robust Indoor Reconstruction from Sparse Views
- URL: http://arxiv.org/abs/2510.07839v1
- Date: Thu, 09 Oct 2025 06:30:20 GMT
- Title: AlignGS: Aligning Geometry and Semantics for Robust Indoor Reconstruction from Sparse Views
- Authors: Yijie Gao, Houqiang Zhong, Tianchi Zhu, Zhengxue Cheng, Qiang Hu, Li Song,
- Abstract summary: The demand for semantically rich 3D models of indoor scenes is rapidly growing, driven by applications in augmented reality, virtual reality, and robotics. Existing methods often treat semantics as a passive feature painted on an already-formed, and potentially flawed, geometry. This paper introduces AlignGS, a novel framework that actualizes this vision by pioneering a synergistic, end-to-end optimization of geometry and semantics.
- Score: 18.361136390711415
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The demand for semantically rich 3D models of indoor scenes is rapidly growing, driven by applications in augmented reality, virtual reality, and robotics. However, creating them from sparse views remains a challenge due to geometric ambiguity. Existing methods often treat semantics as a passive feature painted on an already-formed, and potentially flawed, geometry. We posit that for robust sparse-view reconstruction, semantic understanding should instead be an active, guiding force. This paper introduces AlignGS, a novel framework that actualizes this vision by pioneering a synergistic, end-to-end optimization of geometry and semantics. Our method distills rich priors from 2D foundation models and uses them to directly regularize the 3D representation through a set of novel semantic-to-geometry guidance mechanisms, including depth consistency and multi-faceted normal regularization. Extensive evaluations on standard benchmarks demonstrate that our approach achieves state-of-the-art results in novel view synthesis and produces reconstructions with superior geometric accuracy. The results validate that leveraging semantic priors as a geometric regularizer leads to more coherent and complete 3D models from limited input views. Our code is available at https://github.com/MediaX-SJTU/AlignGS .
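The abstract mentions a depth-consistency mechanism that uses priors from 2D foundation models to regularize the 3D representation. The paper's actual implementation is in the linked repository; as a rough illustration of the general idea, the sketch below shows a hypothetical depth-consistency term that aligns a monocular depth prior to a rendered depth map with a least-squares scale-and-shift fit (monocular priors are only defined up to an affine ambiguity) before penalizing the residual. All names here are illustrative, not AlignGS's API.

```python
import numpy as np

def depth_consistency_loss(rendered_depth, prior_depth):
    """Hypothetical depth-consistency regularizer (not the AlignGS code).

    Aligns a monocular depth prior to the rendered depth via a
    least-squares scale/shift fit, then returns the mean absolute
    residual. A real pipeline would compute this per view inside the
    differentiable rendering loop."""
    x = prior_depth.reshape(-1)
    y = rendered_depth.reshape(-1)
    # Fit y ≈ s * x + t in the least-squares sense.
    A = np.stack([x, np.ones_like(x)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    aligned = s * x + t
    return float(np.mean(np.abs(y - aligned)))
```

Because of the affine alignment, the loss is zero whenever the rendered depth is any positive scale-and-shift of the prior, and grows only with structural disagreement between the two maps.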
Related papers
- RnG: A Unified Transformer for Complete 3D Modeling from Partial Observations [70.83499963694238]
RnG (Reconstruction and Generation) is a novel feed-forward Transformer that unifies reconstruction and generation. It reconstructs visible geometry and generates plausible, coherent unseen geometry and appearance. Our method achieves state-of-the-art performance in both generalizable 3D reconstruction and novel view generation.
arXiv Detail & Related papers (2026-03-01T17:25:32Z) - Learning Human Visual Attention on 3D Surfaces through Geometry-Queried Semantic Priors [0.0]
We introduce SemGeo-AttentionNet, a dual-stream architecture that formalizes the interplay between geometric processing and semantic recognition. We extend our framework to temporal scanpath generation through reinforcement learning. Evaluation on SAL3D, NUS3D and 3DVA datasets demonstrates substantial improvements.
arXiv Detail & Related papers (2026-02-06T06:15:38Z) - Joint Geometry-Appearance Human Reconstruction in a Unified Latent Space via Bridge Diffusion [57.09673862519791]
This paper introduces JGA-LBD, a novel framework that unifies the modeling of geometry and appearance into a joint latent representation. Experiments demonstrate that JGA-LBD outperforms current state-of-the-art approaches in terms of both geometry fidelity and appearance quality.
arXiv Detail & Related papers (2026-01-01T12:48:56Z) - IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction [82.53307702809606]
Humans naturally perceive the geometric structure and semantic content of a 3D world as intertwined dimensions. We propose Instance-Grounded Geometry Transformer (IGGT) to unify the knowledge for both spatial reconstruction and instance-level contextual understanding.
arXiv Detail & Related papers (2025-10-26T14:57:44Z) - OracleGS: Grounding Generative Priors for Sparse-View Gaussian Splatting [78.70702961852119]
OracleGS reconciles generative completeness with regressive fidelity for sparse-view Gaussian Splatting. Our approach conditions the powerful generative prior on multi-view geometric evidence, filtering hallucinatory artifacts while preserving plausible completions in under-constrained regions.
arXiv Detail & Related papers (2025-09-27T11:19:32Z) - Seeing 3D Through 2D Lenses: 3D Few-Shot Class-Incremental Learning via Cross-Modal Geometric Rectification [59.17489431187807]
We propose a framework that enhances 3D geometric fidelity by leveraging CLIP's hierarchical spatial semantics. Our method significantly improves 3D few-shot class-incremental learning, achieving superior geometric coherence and robustness to texture bias.
arXiv Detail & Related papers (2025-09-18T13:45:08Z) - SeqAffordSplat: Scene-level Sequential Affordance Reasoning on 3D Gaussian Splatting [85.87902260102652]
We introduce the novel task of Sequential 3D Gaussian Affordance Reasoning. We then propose SeqSplatNet, an end-to-end framework that directly maps an instruction to a sequence of 3D affordance masks. Our method sets a new state-of-the-art on our challenging benchmark, effectively advancing affordance reasoning from single-step interactions to complex, sequential tasks at the scene level.
arXiv Detail & Related papers (2025-07-31T17:56:55Z) - Sparse-View 3D Reconstruction: Recent Advances and Open Challenges [0.8583178253811411]
Sparse-view 3D reconstruction is essential for applications in which dense image acquisition is impractical. This survey reviews the latest advances in neural implicit models and explicit point-cloud-based approaches. We analyze how geometric regularization, explicit shape modeling, and generative inference are used to mitigate artifacts.
arXiv Detail & Related papers (2025-07-22T09:57:28Z) - End-to-End Fine-Tuning of 3D Texture Generation using Differentiable Rewards [8.953379216683732]
We propose an end-to-end differentiable, reinforcement-learning-free framework that embeds human feedback, expressed as differentiable reward functions, directly into the 3D texture pipeline. By back-propagating preference signals through both geometric and appearance modules, our method generates textures that respect the 3D geometry structure and align with desired criteria.
arXiv Detail & Related papers (2025-06-23T06:24:12Z) - E3D-Bench: A Benchmark for End-to-End 3D Geometric Foundation Models [78.1674905950243]
We present the first comprehensive benchmark for 3D geometric foundation models (GFMs). GFMs directly predict dense 3D representations in a single feed-forward pass, eliminating the need for slow or unavailable precomputed camera parameters. We evaluate 16 state-of-the-art GFMs, revealing their strengths and limitations across tasks and domains. All code, evaluation scripts, and processed data will be publicly released to accelerate research in 3D spatial intelligence.
arXiv Detail & Related papers (2025-06-02T17:53:09Z) - Shape from Semantics: 3D Shape Generation from Multi-View Semantics [30.969299308083723]
Existing 3D reconstruction methods utilize guidances such as 2D images, 3D point clouds, shape contours and single semantics to recover the 3D surface. We propose a novel 3D modeling task called "Shape from Semantics", which aims to create 3D models whose geometry and appearance are consistent with the given text semantics when viewed from different views.
arXiv Detail & Related papers (2025-02-01T07:51:59Z) - MOSE: Monocular Semantic Reconstruction Using NeRF-Lifted Noisy Priors [11.118490283303407]
We propose a neural field semantic reconstruction approach to lift inferred image-level noisy priors to 3D.
Our method produces accurate semantics and geometry in both 3D and 2D space.
arXiv Detail & Related papers (2024-09-21T05:12:13Z) - Monocular 3D Object Reconstruction with GAN Inversion [122.96094885939146]
MeshInversion is a novel framework to improve the reconstruction of textured 3D meshes.
It exploits the generative prior of a 3D GAN pre-trained for 3D textured mesh synthesis.
Our framework obtains faithful 3D reconstructions with consistent geometry and texture across both observed and unobserved parts.
arXiv Detail & Related papers (2022-07-20T17:47:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences of their use.