Towards Physically Executable 3D Gaussian for Embodied Navigation
- URL: http://arxiv.org/abs/2510.21307v1
- Date: Fri, 24 Oct 2025 10:05:00 GMT
- Title: Towards Physically Executable 3D Gaussian for Embodied Navigation
- Authors: Bingchen Miao, Rong Wei, Zhiqi Ge, Xiaoquan Sun, Shiqi Gao, Jingzhe Zhu, Renhan Wang, Siliang Tang, Jun Xiao, Rui Tang, Juncheng Li
- Abstract summary: SAGE-3D is a new paradigm that upgrades 3DGS into an executable, semantically and physically aligned environment. It comprises two components: (1) Object-Centric Semantic Grounding, which adds object-level fine-grained annotations to 3DGS; and (2) Physics-Aware Execution Jointing, which embeds collision objects into 3DGS. We release InteriorGS, containing 1K object-annotated 3DGS indoor scenes, and introduce SAGE-Bench, the first 3DGS-based VLN benchmark, with 2M VLN data samples.
- Score: 37.428618598143395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D Gaussian Splatting (3DGS), a 3D representation method with photorealistic real-time rendering capabilities, is regarded as an effective tool for narrowing the sim-to-real gap. However, it lacks the fine-grained semantics and physical executability required for Visual-Language Navigation (VLN). To address this, we propose SAGE-3D (Semantically and Physically Aligned Gaussian Environments for 3D Navigation), a new paradigm that upgrades 3DGS into an executable, semantically and physically aligned environment. It comprises two components: (1) Object-Centric Semantic Grounding, which adds object-level fine-grained annotations to 3DGS; and (2) Physics-Aware Execution Jointing, which embeds collision objects into 3DGS and constructs rich physical interfaces. We release InteriorGS, containing 1K object-annotated 3DGS indoor scenes, and introduce SAGE-Bench, the first 3DGS-based VLN benchmark, with 2M VLN data samples. Experiments show that 3DGS scene data is harder to converge on, yet exhibits strong generalizability, improving baseline performance by 31% on the VLN-CE Unseen task. The data and code will be available soon.
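The paper's code is not yet released, but the abstract's two components suggest a concrete shape for the scene data: object-level semantic annotations layered over the Gaussians, plus a collision/physics interface that navigation can query. The sketch below is a minimal illustration of that idea; all class names, fields, and the sphere-vs-AABB collision test are hypothetical stand-ins, not the authors' implementation.

```python
"""Minimal sketch of an executable, object-annotated 3DGS scene record.

Everything here (names, fields, the collision test) is an assumption
made for illustration; it is not taken from SAGE-3D's released code.
"""
from dataclasses import dataclass, field

import numpy as np


@dataclass
class AnnotatedGaussianObject:
    """One object-level annotation over a subset of the scene's Gaussians."""
    object_id: int
    category: str                 # fine-grained semantic label, e.g. "armchair"
    gaussian_indices: np.ndarray  # indices into the scene's Gaussian arrays
    aabb: np.ndarray              # (2, 3) axis-aligned bounding box used as a
                                  # stand-in collision object


@dataclass
class ExecutableGaussianScene:
    """A 3DGS scene augmented with semantics and a simple physics interface."""
    means: np.ndarray             # (N, 3) Gaussian centers
    objects: list[AnnotatedGaussianObject] = field(default_factory=list)

    def collides(self, position: np.ndarray, radius: float = 0.2) -> bool:
        """Physics-style query: does an agent sphere intersect any object box?

        A real physical interface would use proper collision meshes; a
        sphere-vs-AABB test stands in for that here.
        """
        for obj in self.objects:
            lo, hi = obj.aabb
            closest = np.clip(position, lo, hi)  # nearest point on the box
            if np.linalg.norm(position - closest) < radius:
                return True
        return False

    def objects_named(self, category: str) -> list[AnnotatedGaussianObject]:
        """Semantic query a VLN agent might issue, e.g. 'go to the armchair'."""
        return [o for o in self.objects if o.category == category]
```

Under this reading, a VLN agent would ground an instruction like "walk to the armchair" through `objects_named("armchair")` and accept a candidate step only when `collides(next_position)` returns False, which is what makes the scene physically executable rather than merely renderable.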
Related papers
- Sparse View Distractor-Free Gaussian Splatting [31.812029183156245]
3D Gaussian Splatting (3DGS) enables efficient training and fast novel view synthesis in static environments. We propose a framework to enhance distractor-free 3DGS under sparse-view conditions by incorporating rich prior information.
arXiv Detail & Related papers (2026-03-02T08:32:32Z) - LabelGS: Label-Aware 3D Gaussian Splatting for 3D Scene Segmentation [56.4321049923868]
3D Gaussian Splatting (3DGS) has emerged as a novel explicit representation for 3D scenes, offering both high-fidelity reconstruction and efficient rendering. We propose Label-aware 3D Gaussian Splatting (LabelGS), a method that augments the Gaussian representation with object labels. LabelGS achieves a remarkable 22x training speedup over Feature-3DGS at a resolution of 1440x1080.
arXiv Detail & Related papers (2025-08-27T09:07:38Z) - SAGOnline: Segment Any Gaussians Online [17.33447710659887]
3D Gaussian Splatting (3DGS) has emerged as a powerful paradigm for explicit 3D scene representation. Current methods suffer from prohibitive computational costs, limited 3D spatial reasoning, and an inability to track multiple objects simultaneously. We present Segment Any Gaussians Online (SAGOnline), a lightweight and zero-shot framework for real-time 3D segmentation in Gaussian scenes.
arXiv Detail & Related papers (2025-08-11T17:38:50Z) - SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding [10.81711535075112]
3D Visual Grounding aims to locate objects in 3D scenes based on textual descriptions, which is essential for applications like augmented reality and robotics. We introduce SeeGround, a zero-shot 3DVG framework leveraging 2D Vision-Language Models (VLMs) trained on large-scale 2D data. SeeGround represents 3D scenes as a hybrid of query-aligned rendered images and spatially enriched text descriptions, bridging the gap between 3D data and the input formats of 2D VLMs.
arXiv Detail & Related papers (2024-12-05T17:58:43Z) - Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding [59.51535163599723]
FreeGS is an unsupervised semantic-embedded 3DGS framework that achieves view-consistent 3D scene understanding without the need for 2D labels. FreeGS performs comparably to state-of-the-art methods while avoiding the complex data preprocessing workload.
arXiv Detail & Related papers (2024-11-29T08:52:32Z) - WildGaussians: 3D Gaussian Splatting in the Wild [80.5209105383932]
We introduce WildGaussians, a novel approach to handle occlusions and appearance changes with 3DGS.
We demonstrate that WildGaussians matches the real-time rendering speed of 3DGS while surpassing both 3DGS and NeRF baselines in handling in-the-wild data.
arXiv Detail & Related papers (2024-07-11T12:41:32Z) - HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting [53.6394928681237]
Holistic understanding of urban scenes based on RGB images is a challenging yet important problem.
Our main idea involves the joint optimization of geometry, appearance, semantics, and motion using a combination of static and dynamic 3D Gaussians.
Our approach offers the ability to render new viewpoints in real-time, yielding 2D and 3D semantic information with high accuracy.
arXiv Detail & Related papers (2024-03-19T13:39:05Z) - SAGD: Boundary-Enhanced Segment Anything in 3D Gaussian via Gaussian Decomposition [66.56357905500512]
3D Gaussian Splatting has emerged as an alternative 3D representation for novel view synthesis. We propose SAGD, a conceptually simple yet effective boundary-enhanced segmentation pipeline for 3D-GS. Our approach achieves high-quality 3D segmentation without rough boundary issues and can be easily applied to other scene editing tasks.
arXiv Detail & Related papers (2024-01-31T14:19:03Z) - Generating Visual Spatial Description via Holistic 3D Scene
Understanding [88.99773815159345]
Visual spatial description (VSD) aims to generate texts that describe the spatial relations of the given objects within images.
With an external 3D scene extractor, we obtain the 3D objects and scene features for input images.
We construct a target object-centered 3D spatial scene graph (Go3D-S2G), with which we model the spatial semantics of target objects within holistic 3D scenes.
arXiv Detail & Related papers (2023-05-19T15:53:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.