Related papers: SuperGSeg: Open-Vocabulary 3D Segmentation with Structured Super-Gaussians

SuperGSeg: Open-Vocabulary 3D Segmentation with Structured Super-Gaussians

URL: http://arxiv.org/abs/2412.10231v1
Date: Fri, 13 Dec 2024 16:01:19 GMT
Title: SuperGSeg: Open-Vocabulary 3D Segmentation with Structured Super-Gaussians
Authors: Siyun Liang, Sen Wang, Kunyi Li, Michael Niemeyer, Stefano Gasperini, Nassir Navab, Federico Tombari,
Abstract summary: 3D Gaussian Splatting has recently gained traction for its efficient training and real-time rendering.<n>We introduce SuperGSeg, a novel approach that fosters cohesive, context-aware scene representation.<n>SuperGSeg outperforms prior works on both open-vocabulary object localization and semantic segmentation tasks.
Score: 77.77265204740037
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: 3D Gaussian Splatting has recently gained traction for its efficient training and real-time rendering. While the vanilla Gaussian Splatting representation is mainly designed for view synthesis, more recent works investigated how to extend it with scene understanding and language features. However, existing methods lack a detailed comprehension of scenes, limiting their ability to segment and interpret complex structures. To this end, We introduce SuperGSeg, a novel approach that fosters cohesive, context-aware scene representation by disentangling segmentation and language field distillation. SuperGSeg first employs neural Gaussians to learn instance and hierarchical segmentation features from multi-view images with the aid of off-the-shelf 2D masks. These features are then leveraged to create a sparse set of what we call Super-Gaussians. Super-Gaussians facilitate the distillation of 2D language features into 3D space. Through Super-Gaussians, our method enables high-dimensional language feature rendering without extreme increases in GPU memory. Extensive experiments demonstrate that SuperGSeg outperforms prior works on both open-vocabulary object localization and semantic segmentation tasks.

Related papers

C3G: Learning Compact 3D Representations with 2K Gaussians [55.04010158339562]
Recent approaches use per-pixel 3D Gaussian Splatting for reconstruction, followed by a 2D-to-3D feature lifting stage for scene understanding.<n>We propose C3G, a novel feed-forward framework that estimates compact 3D Gaussians only at essential spatial locations.
arXiv Detail & Related papers (2025-12-03T17:59:05Z)
OpenInsGaussian: Open-vocabulary Instance Gaussian Segmentation with Context-aware Cross-view Fusion [89.98812408058336]
We introduce textbfOpenInsGaussian, an textbfOpen-vocabulary textbfInstance textbfGaussian segmentation framework with Context-aware Cross-view Fusion.<n>OpenInsGaussian achieves state-of-the-art results in open-vocabulary 3D Gaussian segmentation, outperforming existing baselines by a large margin.
arXiv Detail & Related papers (2025-10-21T03:24:12Z)
GALA: Guided Attention with Language Alignment for Open Vocabulary Gaussian Splatting [74.56128224977279]
We present GALA, a novel framework for open-vocabulary 3D scene understanding with 3D Gaussian Splatting (3DGS)<n>GALA distills a scene-specific 3D instance feature field via self-supervised contrastive learning.<n>It supports seamless 2D and 3D open-vocabulary queries and reduces memory consumption by avoiding per-Gaussian high-dimensional feature learning.
arXiv Detail & Related papers (2025-08-19T21:26:49Z)
OpenSplat3D: Open-Vocabulary 3D Instance Segmentation using Gaussian Splatting [52.40697058096931]
3D Gaussian Splatting (3DGS) has emerged as a powerful representation for neural scene reconstruction.<n>We introduce an approach for open-vocabulary 3D instance segmentation without requiring manual labeling, termed OpenSplat3D.<n>We show results on LERF-mask and LERF-OVS as well as the full ScanNet++ validation set, demonstrating the effectiveness of our approach.
arXiv Detail & Related papers (2025-06-09T12:37:15Z)
Training-Free Hierarchical Scene Understanding for Gaussian Splatting with Superpoint Graphs [16.153129392697885]
We introduce a training-free framework that constructs a superpoint graph directly from Gaussian primitives. The superpoint graph partitions the scene into spatially compact and semantically coherent regions, forming view-consistent 3D entities. Our method achieves state-of-the-art open-vocabulary segmentation performance, with semantic field reconstruction completed over $30times$ faster.
arXiv Detail & Related papers (2025-04-17T17:56:07Z)
CAGS: Open-Vocabulary 3D Scene Understanding with Context-Aware Gaussian Splatting [18.581169318975046]
3D Gaussian Splatting (3DGS) offers a powerful representation for scene reconstruction, but cross-view granularity inconsistency is a problem. We propose Context-Aware Gaussian Splatting (CAGS), a novel framework that addresses this challenge by incorporating spatial context into 3DGS. CAGS significantly improves 3D instance segmentation and reduces fragmentation errors on datasets like LERF-OVS and ScanNet.
arXiv Detail & Related papers (2025-04-16T09:20:03Z)
OVGaussian: Generalizable 3D Gaussian Segmentation with Open Vocabularies [112.80292725951921]
textbfOVGaussian is a generalizable textbfOpen-textbfVocabulary 3D semantic segmentation framework based on the 3D textbfGaussian representation. We first construct a large-scale 3D scene dataset based on 3DGS, dubbed textbfSegGaussian, which provides detailed semantic and instance annotations for both Gaussian points and multi-view images. To promote semantic generalization across scenes, we introduce Generalizable Semantic Rasterization (GSR), which leverages a
arXiv Detail & Related papers (2024-12-31T07:55:35Z)
LangSurf: Language-Embedded Surface Gaussians for 3D Scene Understanding [42.750252190275546]
LangSurf is a language-Embedded Surface Field that aligns 3D language fields with the surface of objects. Our method is capable of segmenting objects in 3D space, thus boosting the effectiveness of our approach in instance recognition, removal, and editing.
arXiv Detail & Related papers (2024-12-23T15:12:20Z)
Occam's LGS: A Simple Approach for Language Gaussian Splatting [57.00354758206751]
We show that sophisticated techniques for language-grounded 3D Gaussian Splatting are simply unnecessary.<n>We apply Occam's razor to the task at hand and perform weighted multi-view feature aggregation.<n>Our results offer us state-of-the-art results with a speed-up of two orders of magnitude.
arXiv Detail & Related papers (2024-12-02T18:50:37Z)
GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane [53.388937705785025]
3D open-vocabulary scene understanding is crucial for advancing augmented reality and robotic applications. We introduce GOI, a framework that integrates semantic features from 2D vision-language foundation models into 3D Gaussian Splatting (3DGS) Our method treats the feature selection process as a hyperplane division within the feature space, retaining only features that are highly relevant to the query.
arXiv Detail & Related papers (2024-05-27T18:57:18Z)
Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting [27.974762304763694]
We introduce Semantic Gaussians, a novel open-vocabulary scene understanding approach based on 3D Gaussian Splatting. Unlike existing methods, we design a versatile projection approach that maps various 2D semantic features into a novel semantic component of 3D Gaussians. We build a 3D semantic network that directly predicts the semantic component from raw 3D Gaussians for fast inference.
arXiv Detail & Related papers (2024-03-22T21:28:19Z)
GS-CLIP: Gaussian Splatting for Contrastive Language-Image-3D Pretraining from Real-World Data [73.06536202251915]
3D Shape represented as point cloud has achieve advancements in multimodal pre-training to align image and language descriptions. We propose GS-CLIP for the first attempt to introduce 3DGS into multimodal pre-training to enhance 3D representation.
arXiv Detail & Related papers (2024-02-09T05:46:47Z)
SAGD: Boundary-Enhanced Segment Anything in 3D Gaussian via Gaussian Decomposition [66.80822249039235]
3D Gaussian Splatting has emerged as an alternative 3D representation for novel view synthesis. We propose SAGD, a conceptually simple yet effective boundary-enhanced segmentation pipeline for 3D-GS. Our approach achieves high-quality 3D segmentation without rough boundary issues, which can be easily applied to other scene editing tasks.
arXiv Detail & Related papers (2024-01-31T14:19:03Z)
Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding [2.517953665531978]
We introduce Language Embedded 3D Gaussians, a novel scene representation for open-vocabulary query tasks. Our representation achieves the best visual quality and language querying accuracy across current language-embedded representations.
arXiv Detail & Related papers (2023-11-30T11:50:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.