Training-Free Hierarchical Scene Understanding for Gaussian Splatting with Superpoint Graphs
- URL: http://arxiv.org/abs/2504.13153v1
- Date: Thu, 17 Apr 2025 17:56:07 GMT
- Title: Training-Free Hierarchical Scene Understanding for Gaussian Splatting with Superpoint Graphs
- Authors: Shaohui Dai, Yansong Qu, Zheyan Li, Xinyang Li, Shengchuan Zhang, Liujuan Cao
- Abstract summary: We introduce a training-free framework that constructs a superpoint graph directly from Gaussian primitives. The superpoint graph partitions the scene into spatially compact and semantically coherent regions, forming view-consistent 3D entities. Our method achieves state-of-the-art open-vocabulary segmentation performance, with semantic field reconstruction completed over $30\times$ faster.
- Score: 16.153129392697885
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bridging natural language and 3D geometry is a crucial step toward flexible, language-driven scene understanding. While recent advances in 3D Gaussian Splatting (3DGS) have enabled fast and high-quality scene reconstruction, research has also explored incorporating open-vocabulary understanding into 3DGS. However, most existing methods require iterative optimization over per-view 2D semantic feature maps, which not only results in inefficiencies but also leads to inconsistent 3D semantics across views. To address these limitations, we introduce a training-free framework that constructs a superpoint graph directly from Gaussian primitives. The superpoint graph partitions the scene into spatially compact and semantically coherent regions, forming view-consistent 3D entities and providing a structured foundation for open-vocabulary understanding. Based on the graph structure, we design an efficient reprojection strategy that lifts 2D semantic features onto the superpoints, avoiding costly multi-view iterative training. The resulting representation ensures strong 3D semantic coherence and naturally supports hierarchical understanding, enabling both coarse- and fine-grained open-vocabulary perception within a unified semantic field. Extensive experiments demonstrate that our method achieves state-of-the-art open-vocabulary segmentation performance, with semantic field reconstruction completed over $30\times$ faster. Our code will be available at https://github.com/Atrovast/THGS.
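The reprojection step described in the abstract — lifting per-view 2D semantic features onto view-consistent superpoints without iterative training — can be sketched roughly as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: all names are hypothetical, and real details such as occlusion handling, feature weighting, and the superpoint graph construction itself are omitted.

```python
import numpy as np

def lift_features_to_superpoints(feature_maps, projections, sp_labels, n_superpoints):
    """Training-free lifting of 2D semantic features onto superpoints.

    feature_maps : list of (H, W, D) per-view 2D semantic feature maps
    projections  : list of (N, 2) integer pixel coords (x, y) of the N
                   Gaussians in each view; (-1, -1) marks invisible Gaussians
    sp_labels    : (N,) superpoint index assigned to each Gaussian
    """
    dim = feature_maps[0].shape[-1]
    accum = np.zeros((n_superpoints, dim))
    counts = np.zeros(n_superpoints)
    for fmap, proj in zip(feature_maps, projections):
        visible = proj[:, 0] >= 0
        feats = fmap[proj[visible, 1], proj[visible, 0]]  # (M, D) pixel lookups
        np.add.at(accum, sp_labels[visible], feats)       # scatter-add per superpoint
        np.add.at(counts, sp_labels[visible], 1.0)
    sp_feats = accum / np.maximum(counts, 1.0)[:, None]   # mean over all views
    # L2-normalize so superpoint features can be matched to text embeddings
    # (e.g. CLIP) by cosine similarity for open-vocabulary queries
    return sp_feats / (np.linalg.norm(sp_feats, axis=1, keepdims=True) + 1e-8)
```

Because every Gaussian in a superpoint shares one aggregated feature, the resulting semantics are consistent across views by construction, which is the property the per-view optimization methods struggle to guarantee.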
Related papers
- Interpretable Single-View 3D Gaussian Splatting using Unsupervised Hierarchical Disentangled Representation Learning [46.85417907244265]
We propose an interpretable single-view 3DGS framework, termed 3DisGS, to discover both coarse- and fine-grained 3D semantics. Our model achieves 3D disentanglement while preserving high-quality and rapid reconstruction.
arXiv Detail & Related papers (2025-04-05T14:42:13Z)
- PanoGS: Gaussian-based Panoptic Segmentation for 3D Open Vocabulary Scene Understanding [8.72555461868951]
3D Gaussian Splatting (3DGS) has shown encouraging performance on open-vocabulary scene understanding tasks. Previous methods cannot distinguish 3D instance-level information; they usually predict a heatmap between the scene feature and the text query. We propose PanoGS, a novel and effective 3D panoptic open-vocabulary scene understanding approach.
arXiv Detail & Related papers (2025-03-23T15:27:29Z)
- OpenGS-SLAM: Open-Set Dense Semantic SLAM with 3D Gaussian Splatting for Object-Level Scene Understanding [20.578106363482018]
OpenGS-SLAM is an innovative framework that utilizes 3D Gaussian representation to perform dense semantic SLAM in open-set environments. Our system integrates explicit semantic labels derived from 2D models into the 3D Gaussian framework, facilitating robust 3D object-level understanding. Our method achieves 10 times faster semantic rendering and 2 times lower storage costs compared to existing methods.
arXiv Detail & Related papers (2025-03-03T15:23:21Z)
- OVGaussian: Generalizable 3D Gaussian Segmentation with Open Vocabularies [112.80292725951921]
OVGaussian is a generalizable Open-Vocabulary 3D semantic segmentation framework based on the 3D Gaussian representation. We first construct a large-scale 3D scene dataset based on 3DGS, dubbed SegGaussian, which provides detailed semantic and instance annotations for both Gaussian points and multi-view images. To promote semantic generalization across scenes, we introduce Generalizable Semantic Rasterization (GSR).
arXiv Detail & Related papers (2024-12-31T07:55:35Z)
- SuperGSeg: Open-Vocabulary 3D Segmentation with Structured Super-Gaussians [77.77265204740037]
3D Gaussian Splatting has recently gained traction for its efficient training and real-time rendering. We introduce SuperGSeg, a novel approach that fosters cohesive, context-aware scene representation. SuperGSeg outperforms prior works on both open-vocabulary object localization and semantic segmentation tasks.
arXiv Detail & Related papers (2024-12-13T16:01:19Z)
- Occam's LGS: An Efficient Approach for Language Gaussian Splatting [57.00354758206751]
We show that the complicated pipelines for language 3D Gaussian Splatting are simply unnecessary. We apply Occam's razor to the task at hand, leading to a highly efficient weighted multi-view feature aggregation technique.
arXiv Detail & Related papers (2024-12-02T18:50:37Z)
- Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding [59.51535163599723]
FreeGS is an unsupervised semantic-embedded 3DGS framework that achieves view-consistent 3D scene understanding without the need for 2D labels. We show that FreeGS performs comparably to state-of-the-art methods while avoiding the complex data preprocessing workload.
arXiv Detail & Related papers (2024-11-29T08:52:32Z)
- Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting [27.974762304763694]
We introduce Semantic Gaussians, a novel open-vocabulary scene understanding approach based on 3D Gaussian Splatting.
Unlike existing methods, we design a versatile projection approach that maps various 2D semantic features into a novel semantic component of 3D Gaussians.
We build a 3D semantic network that directly predicts the semantic component from raw 3D Gaussians for fast inference.
arXiv Detail & Related papers (2024-03-22T21:28:19Z)
- SAGD: Boundary-Enhanced Segment Anything in 3D Gaussian via Gaussian Decomposition [66.56357905500512]
3D Gaussian Splatting has emerged as an alternative 3D representation for novel view synthesis.
We propose SAGD, a conceptually simple yet effective boundary-enhanced segmentation pipeline for 3D-GS.
Our approach achieves high-quality 3D segmentation without rough boundary issues, which can be easily applied to other scene editing tasks.
arXiv Detail & Related papers (2024-01-31T14:19:03Z)
- ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning [125.90002884194838]
ConceptGraphs is an open-vocabulary graph-structured representation for 3D scenes.
It is built by leveraging 2D foundation models and fusing their outputs into 3D via multi-view association.
We demonstrate the utility of this representation through a number of downstream planning tasks.
arXiv Detail & Related papers (2023-09-28T17:53:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.