CAGS: Open-Vocabulary 3D Scene Understanding with Context-Aware Gaussian Splatting
- URL: http://arxiv.org/abs/2504.11893v1
- Date: Wed, 16 Apr 2025 09:20:03 GMT
- Title: CAGS: Open-Vocabulary 3D Scene Understanding with Context-Aware Gaussian Splatting
- Authors: Wei Sun, Yanzhao Zhou, Jianbin Jiao, Yuan Li
- Abstract summary: 3D Gaussian Splatting (3DGS) offers a powerful representation for scene reconstruction, but cross-view granularity inconsistency is a problem. We propose Context-Aware Gaussian Splatting (CAGS), a novel framework that addresses this challenge by incorporating spatial context into 3DGS. CAGS significantly improves 3D instance segmentation and reduces fragmentation errors on datasets like LERF-OVS and ScanNet.
- Score: 18.581169318975046
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open-vocabulary 3D scene understanding is crucial for applications requiring natural language-driven spatial interpretation, such as robotics and augmented reality. While 3D Gaussian Splatting (3DGS) offers a powerful representation for scene reconstruction, integrating it with open-vocabulary frameworks reveals a key challenge: cross-view granularity inconsistency. This issue, stemming from 2D segmentation methods like SAM, results in inconsistent object segmentations across views (e.g., a "coffee set" segmented as a single entity in one view but as "cup + coffee + spoon" in another). Existing 3DGS-based methods often rely on isolated per-Gaussian feature learning, neglecting the spatial context needed for cohesive object reasoning and thus producing fragmented representations. We propose Context-Aware Gaussian Splatting (CAGS), a novel framework that addresses this challenge by incorporating spatial context into 3DGS. CAGS constructs local graphs to propagate contextual features across Gaussians, reducing noise from inconsistent granularity; employs mask-centric contrastive learning to smooth SAM-derived features across views; and precomputes neighborhood relationships once, enabling efficient training in large-scale scenes. By integrating spatial context, CAGS significantly improves 3D instance segmentation and reduces fragmentation errors on datasets such as LERF-OVS and ScanNet, enabling robust language-guided 3D scene understanding.
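As a rough illustration of the two mechanisms named above, the sketch below precomputes a k-NN neighborhood graph over Gaussian centers and then blends each Gaussian's feature with the mean of its neighbors'. The graph rule, blend weight, and all function names are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def precompute_knn(centers: torch.Tensor, k: int = 16) -> torch.Tensor:
    """Precompute k-nearest-neighbor indices over Gaussian centers (N, 3).

    Run once before training; brute-force O(N^2) here, whereas a KD-tree
    or spatial grid would be used for large scenes.
    """
    dists = torch.cdist(centers, centers)                    # (N, N) pairwise distances
    return dists.topk(k + 1, largest=False).indices[:, 1:]   # drop self-match

def propagate_context(features: torch.Tensor, knn_idx: torch.Tensor,
                      alpha: float = 0.5) -> torch.Tensor:
    """One round of local-graph propagation: mix each per-Gaussian
    feature with the mean feature of its precomputed neighbors."""
    context = features[knn_idx].mean(dim=1)                  # (N, D) neighbor average
    return alpha * features + (1 - alpha) * context
```

Because the neighborhood indices depend only on geometry, `precompute_knn` can run once while `propagate_context` is applied every training iteration, which is the intuition behind the paper's precomputation strategy.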
Related papers
- Training-Free Hierarchical Scene Understanding for Gaussian Splatting with Superpoint Graphs [16.153129392697885]
We introduce a training-free framework that constructs a superpoint graph directly from Gaussian primitives.
The superpoint graph partitions the scene into spatially compact and semantically coherent regions, forming view-consistent 3D entities.
Our method achieves state-of-the-art open-vocabulary segmentation performance, with semantic field reconstruction completed over $30\times$ faster; a toy sketch of the superpoint grouping follows this entry.
arXiv Detail & Related papers (2025-04-17T17:56:07Z)
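The summary above does not give the construction details, so the following is only a minimal stand-in: connect spatially neighboring Gaussians whose semantic features agree, then treat connected components as superpoints. The k-NN connectivity, cosine gate, and threshold are illustrative assumptions.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial import cKDTree

def superpoints(centers: np.ndarray, feats: np.ndarray,
                k: int = 8, sim_thresh: float = 0.9) -> np.ndarray:
    """Group Gaussians into superpoint labels: link spatial neighbors with
    similar features, then take connected components of the graph."""
    idx = cKDTree(centers).query(centers, k=k + 1)[1][:, 1:]  # k-NN, drop self
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)  # unit-normalize
    rows = np.repeat(np.arange(len(centers)), k)
    cols = idx.ravel()
    keep = (f[rows] * f[cols]).sum(axis=1) > sim_thresh       # cosine gate
    adj = csr_matrix((np.ones(keep.sum()), (rows[keep], cols[keep])),
                     shape=(len(centers), len(centers)))
    return connected_components(adj, directed=False)[1]       # label per Gaussian
```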
- Segment then Splat: A Unified Approach for 3D Open-Vocabulary Segmentation based on Gaussian Splatting [11.186317340623807]
Open-vocabulary querying in 3D space is crucial for enabling more intelligent perception in applications such as robotics, autonomous systems, and augmented reality. Most existing methods rely on 2D pixel-level parsing, leading to multi-view inconsistencies and poor 3D object retrieval. We propose Segment then Splat, a 3D-aware open-vocabulary segmentation approach for both static and dynamic scenes.
arXiv Detail & Related papers (2025-03-28T07:36:51Z)
- COB-GS: Clear Object Boundaries in 3DGS Segmentation Based on Boundary-Adaptive Gaussian Splitting [67.03992455145325]
3D segmentation based on 3D Gaussian Splatting (3DGS) struggles to accurately delineate object boundaries. We introduce Clear Object Boundaries for 3DGS (COB-GS), which aims to improve segmentation accuracy. For semantic guidance, we introduce a boundary-adaptive Gaussian splitting technique; for visual optimization, we rectify the degraded texture of the 3DGS scene.
arXiv Detail & Related papers (2025-03-25T08:31:43Z)
- GaussianGraph: 3D Gaussian-based Scene Graph Generation for Open-world Scene Understanding [20.578106363482018]
We propose a novel framework that enhances 3DGS-based scene understanding by integrating semantic clustering and scene graph generation. We introduce a "Control-Follow" clustering strategy, which dynamically adapts to scene scale and feature distribution, avoiding feature compression. We enrich the scene representation by integrating object attributes and spatial relations extracted from 2D foundation models.
arXiv Detail & Related papers (2025-03-06T02:36:59Z)
- OVGaussian: Generalizable 3D Gaussian Segmentation with Open Vocabularies [112.80292725951921]
OVGaussian is a generalizable Open-Vocabulary 3D semantic segmentation framework based on the 3D Gaussian representation. We first construct a large-scale 3D scene dataset based on 3DGS, dubbed SegGaussian, which provides detailed semantic and instance annotations for both Gaussian points and multi-view images. To promote semantic generalization across scenes, we introduce Generalizable Semantic Rasterization (GSR); a generic feature-compositing sketch follows this entry.
arXiv Detail & Related papers (2024-12-31T07:55:35Z)
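The summary does not spell out GSR, so the sketch below shows only the generic operation a semantic rasterizer performs per pixel: front-to-back alpha compositing of per-Gaussian semantic features, exactly as 3DGS composites color. Shapes and names are assumptions.

```python
import torch

def composite_semantics(sem: torch.Tensor, alpha: torch.Tensor,
                        order: torch.Tensor) -> torch.Tensor:
    """Front-to-back alpha compositing of semantic features along one ray.

    sem: (M, D) features of the Gaussians hit by the ray
    alpha: (M,) their opacities; order: indices sorted near-to-far
    """
    out = torch.zeros(sem.shape[1])
    transmittance = 1.0
    for i in order:
        out = out + transmittance * alpha[i] * sem[i]
        transmittance = transmittance * (1.0 - alpha[i])
    return out                                   # (D,) composited feature
```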
- SuperGSeg: Open-Vocabulary 3D Segmentation with Structured Super-Gaussians [77.77265204740037]
3D Gaussian Splatting has recently gained traction for its efficient training and real-time rendering. We introduce SuperGSeg, a novel approach that fosters cohesive, context-aware scene representation. SuperGSeg outperforms prior works on both open-vocabulary object localization and semantic segmentation tasks.
arXiv Detail & Related papers (2024-12-13T16:01:19Z)
- Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding [59.51535163599723]
FreeGS is an unsupervised semantic-embedded 3DGS framework that achieves view-consistent 3D scene understanding without the need for 2D labels. We show that FreeGS performs comparably to state-of-the-art methods while avoiding the complex data preprocessing workload; a minimal clustering sketch follows this entry.
arXiv Detail & Related papers (2024-11-29T08:52:32Z)
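FreeGS's bootstrapped clustering is more involved than the summary reveals; as a minimal stand-in, the sketch below groups Gaussians by k-means over their learned embeddings, conveying only the generic idea of label-free grouping. The embedding source and cluster count are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_gaussians(embeddings: np.ndarray, n_clusters: int = 32) -> np.ndarray:
    """Label-free grouping: assign each Gaussian a cluster id based on
    its learned semantic embedding, (N, D) -> (N,)."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    return km.fit_predict(embeddings)
```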
- InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception [17.530797215534456]
3D scene understanding has become an essential area of research with applications in autonomous driving, robotics, and augmented reality. We propose InstanceGaussian, a method that jointly learns appearance and semantic features while adaptively aggregating instances. Our approach achieves state-of-the-art performance in category-agnostic, open-vocabulary 3D point-level segmentation.
arXiv Detail & Related papers (2024-11-28T16:08:36Z)
- GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane [53.388937705785025]
3D open-vocabulary scene understanding is crucial for advancing augmented reality and robotic applications.
We introduce GOI, a framework that integrates semantic features from 2D vision-language foundation models into 3D Gaussian Splatting (3DGS).
Our method treats feature selection as a hyperplane division of the feature space, retaining only features that are highly relevant to the query; a minimal sketch follows this entry.
arXiv Detail & Related papers (2024-05-27T18:57:18Z)
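A minimal sketch of the hyperplane idea: Gaussians whose semantic features land on the positive side of a query-derived hyperplane are kept. In GOI the hyperplane is optimized; here it is simply given, and all names are illustrative.

```python
import torch

def select_gaussians(sem_feats: torch.Tensor, w: torch.Tensor,
                     b: float) -> torch.Tensor:
    """Boolean mask over Gaussians: keep those with w . f + b > 0,
    i.e. on the positive side of the semantic-space hyperplane."""
    return sem_feats @ w + b > 0

# One plausible initialization before optimization (an assumption):
# w = query_embedding / query_embedding.norm(); b = -similarity_threshold
```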
- Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting [27.974762304763694]
We introduce Semantic Gaussians, a novel open-vocabulary scene understanding approach based on 3D Gaussian Splatting.
Unlike existing methods, we design a versatile projection approach that maps various 2D semantic features into a novel semantic component of 3D Gaussians.
We build a 3D semantic network that directly predicts the semantic component from raw 3D Gaussians for fast inference; a single-view projection sketch follows this entry.
arXiv Detail & Related papers (2024-03-22T21:28:19Z)
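One ingredient of such a 2D-to-3D projection, sketched under assumptions the summary does not state (pinhole camera, nearest-pixel lookup, single view): each Gaussian center is projected into the image and picks up the 2D semantic feature at that pixel. Multi-view fusion and the paper's versatile projection design are not reproduced.

```python
import torch

def project_features(centers: torch.Tensor, feat_map: torch.Tensor,
                     K: torch.Tensor, w2c: torch.Tensor):
    """Look up a 2D semantic feature for each 3D Gaussian center.

    centers: (N, 3) world-space centers; feat_map: (D, H, W) per-pixel
    features from a 2D model; K: (3, 3) intrinsics; w2c: (4, 4) extrinsics.
    """
    homo = torch.cat([centers, torch.ones_like(centers[:, :1])], dim=1)  # (N, 4)
    cam = (w2c @ homo.T).T[:, :3]                 # camera-space points
    uv = (K @ cam.T).T                            # homogeneous pixel coords
    uv = uv[:, :2] / uv[:, 2:].clamp(min=1e-6)    # perspective divide
    D, H, W = feat_map.shape
    u = uv[:, 0].round().long().clamp(0, W - 1)
    v = uv[:, 1].round().long().clamp(0, H - 1)
    valid = cam[:, 2] > 0                         # in front of the camera
    return feat_map[:, v, u].T, valid             # (N, D) features, (N,) mask
```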
- SAGD: Boundary-Enhanced Segment Anything in 3D Gaussian via Gaussian Decomposition [66.56357905500512]
3D Gaussian Splatting has emerged as an alternative 3D representation for novel view synthesis. We propose SAGD, a conceptually simple yet effective boundary-enhanced segmentation pipeline for 3D-GS. Our approach achieves high-quality 3D segmentation without rough boundary issues and can be easily applied to other scene editing tasks.
arXiv Detail & Related papers (2024-01-31T14:19:03Z)