Related papers: Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting

URL: http://arxiv.org/abs/2403.15624v1
Date: Fri, 22 Mar 2024 21:28:19 GMT
Title: Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting
Authors: Jun Guo, Xiaojian Ma, Yue Fan, Huaping Liu, Qing Li
Abstract summary: Open-vocabulary 3D scene understanding presents a significant challenge in computer vision. We introduce Semantic Gaussians, a novel open-vocabulary scene understanding approach based on 3D Gaussian Splatting. Our approach attains a 4.2% mIoU and 4.0% mAcc improvement over prior open-vocabulary scene understanding counterparts.
Score: 27.974762304763694
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Open-vocabulary 3D scene understanding presents a significant challenge in computer vision, with wide-ranging applications in embodied agents and augmented reality systems. Previous approaches have adopted Neural Radiance Fields (NeRFs) to analyze 3D scenes. In this paper, we introduce Semantic Gaussians, a novel open-vocabulary scene understanding approach based on 3D Gaussian Splatting. Our key idea is distilling pre-trained 2D semantics into 3D Gaussians. We design a versatile projection approach that maps various 2D semantic features from pre-trained image encoders into a novel semantic component of 3D Gaussians, without the additional training required by NeRFs. We further build a 3D semantic network that directly predicts the semantic component from raw 3D Gaussians for fast inference. We explore several applications of Semantic Gaussians: semantic segmentation on ScanNet-20, where our approach attains a 4.2% mIoU and 4.0% mAcc improvement over prior open-vocabulary scene understanding counterparts; object part segmentation, scene editing, and spatial-temporal segmentation with better qualitative results over 2D and 3D baselines, highlighting its versatility and effectiveness on supporting diverse downstream tasks.
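
The projection idea in the abstract (mapping 2D semantic features onto 3D Gaussians without training) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple pinhole camera, uses only each Gaussian's center, ignores occlusion, and averages features over the views in which the center is visible. The names `project_point` and `lift_features`, and the layout of the `views` dictionaries, are hypothetical and chosen only for this example.

```python
def project_point(point, K, R, t):
    """Project a 3D world point to pixel coordinates (u, v) with a pinhole camera.

    K is a 3x3 intrinsics matrix, R a 3x3 rotation, t a 3-vector (all nested lists).
    Returns None when the point lies behind the camera.
    """
    cam = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    depth = cam[2]
    if depth <= 0:
        return None
    u = K[0][0] * cam[0] / depth + K[0][2]
    v = K[1][1] * cam[1] / depth + K[1][2]
    return u, v


def lift_features(centers, views):
    """Average per-pixel 2D features onto 3D Gaussian centers (assumed scheme).

    centers: list of (x, y, z) Gaussian centers.
    views:   list of dicts with keys "K", "R", "t" and "features", where
             "features" is an H x W grid of feature vectors (nested lists).
    Returns one averaged feature vector per center, or None if the center
    was not visible in any view. Occlusion is NOT handled in this sketch.
    """
    sums = [None] * len(centers)
    counts = [0] * len(centers)
    for view in views:
        feats = view["features"]
        height, width = len(feats), len(feats[0])
        for i, center in enumerate(centers):
            pixel = project_point(center, view["K"], view["R"], view["t"])
            if pixel is None:
                continue
            col, row = int(round(pixel[0])), int(round(pixel[1]))
            if not (0 <= row < height and 0 <= col < width):
                continue  # projects outside the image
            feature = feats[row][col]
            if sums[i] is None:
                sums[i] = list(feature)
            else:
                sums[i] = [a + b for a, b in zip(sums[i], feature)]
            counts[i] += 1
    return [
        None if s is None else [x / counts[i] for x in s]
        for i, s in enumerate(sums)
    ]
```

A per-Gaussian feature obtained this way could then be compared against text embeddings from the same encoder for open-vocabulary queries; a practical system would also need visibility handling and would account for each Gaussian's extent and opacity.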

Related papers

Gaussian2Scene: 3D Scene Representation Learning via Self-supervised Learning with 3D Gaussian Splatting [6.678115792482272]
Self-supervised learning (SSL) for point cloud pre-training has become a cornerstone for many 3D vision tasks. We propose a novel scene-level SSL framework that leverages the efficiency and explicit nature of 3D Gaussian Splatting (3DGS) for pre-training.
arXiv Detail & Related papers (2025-06-10T13:19:21Z)
Tackling View-Dependent Semantics in 3D Language Gaussian Splatting [80.88015191411714]
LaGa establishes cross-view semantic connections by decomposing the 3D scene into objects. It constructs view-aggregated semantic representations by clustering semantic descriptors and reweighting them based on multi-view semantics. Under the same settings, LaGa achieves a significant improvement of +18.7% mIoU over the previous SOTA on the LERF-OVS dataset.
arXiv Detail & Related papers (2025-05-30T16:06:32Z)
EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis [61.1662426227688]
Existing NeRF and 3DGS-based methods show promising results in achieving photorealistic renderings but require slow, per-scene optimization. We introduce EVolSplat, an efficient 3D Gaussian Splatting model for urban scenes that works in a feed-forward manner.
arXiv Detail & Related papers (2025-03-26T02:47:27Z)
Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration [41.046653227409564]
Dr. Splat is a novel approach for open-vocabulary 3D scene understanding leveraging 3D Gaussian Splatting. Our method associates language-aligned CLIP embeddings with 3D Gaussians for holistic 3D scene understanding. Experiments demonstrate that our approach significantly outperforms existing approaches in 3D perception benchmarks.
arXiv Detail & Related papers (2025-02-23T17:01:14Z)
GaussRender: Learning 3D Occupancy with Gaussian Rendering [86.89653628311565]
GaussRender is a module that improves 3D occupancy learning by enforcing projective consistency. Our method penalizes 3D configurations that produce inconsistent 2D projections, thereby enforcing a more coherent 3D structure.
arXiv Detail & Related papers (2025-02-07T16:07:51Z)
PanoSLAM: Panoptic 3D Scene Reconstruction via Gaussian SLAM [105.01907579424362]
PanoSLAM is the first SLAM system to integrate geometric reconstruction, 3D semantic segmentation, and 3D instance segmentation within a unified framework. For the first time, it achieves panoptic 3D reconstruction of open-world environments directly from the RGB-D video.
arXiv Detail & Related papers (2024-12-31T08:58:10Z)
OVGaussian: Generalizable 3D Gaussian Segmentation with Open Vocabularies [112.80292725951921]
OVGaussian is a generalizable Open-Vocabulary 3D semantic segmentation framework based on the 3D Gaussian representation. We first construct a large-scale 3D scene dataset based on 3DGS, dubbed SegGaussian, which provides detailed semantic and instance annotations for both Gaussian points and multi-view images. To promote semantic generalization across scenes, we introduce Generalizable Semantic Rasterization (GSR), which leverages a
arXiv Detail & Related papers (2024-12-31T07:55:35Z)
GSemSplat: Generalizable Semantic 3D Gaussian Splatting from Uncalibrated Image Pairs [33.74118487769923]
We introduce GSemSplat, a framework that learns semantic representations linked to 3D Gaussians without per-scene optimization, dense image collections or calibration. We employ a dual-feature approach that leverages both region-specific and context-aware semantic features as supervision in the 2D space.
arXiv Detail & Related papers (2024-12-22T09:06:58Z)
Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding [59.51535163599723]
FreeGS is an unsupervised semantic-embedded 3DGS framework that achieves view-consistent 3D scene understanding without the need for 2D labels. We show that FreeGS performs comparably to state-of-the-art methods while avoiding the complex data preprocessing workload.
arXiv Detail & Related papers (2024-11-29T08:52:32Z)
Learning Part-aware 3D Representations by Fusing 2D Gaussians and Superquadrics [16.446659867133977]
Low-level 3D representations, such as point clouds, meshes, NeRFs, and 3D Gaussians, are commonly used to represent 3D objects or scenes. We aim to solve part-aware 3D reconstruction, which parses objects or scenes into semantic parts.
arXiv Detail & Related papers (2024-08-20T12:30:37Z)
GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane [53.388937705785025]
3D open-vocabulary scene understanding is crucial for advancing augmented reality and robotic applications. We introduce GOI, a framework that integrates semantic features from 2D vision-language foundation models into 3D Gaussian Splatting (3DGS). Our method treats the feature selection process as a hyperplane division within the feature space, retaining only features that are highly relevant to the query.
arXiv Detail & Related papers (2024-05-27T18:57:18Z)
GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction [70.65250036489128]
3D semantic occupancy prediction aims to obtain 3D fine-grained geometry and semantics of the surrounding scene. We propose an object-centric representation to describe 3D scenes with sparse 3D semantic Gaussians. GaussianFormer achieves comparable performance with state-of-the-art methods with only 17.8% - 24.8% of their memory consumption.
arXiv Detail & Related papers (2024-05-27T17:59:51Z)
CLIP-GS: CLIP-Informed Gaussian Splatting for Real-time and View-consistent 3D Semantic Understanding [32.76277160013881]
We present CLIP-GS, which integrates semantics from Contrastive Language-Image Pre-Training (CLIP) into Gaussian Splatting. SAC exploits the inherent unified semantics within objects to learn compact yet effective semantic representations of 3D Gaussians. We also introduce a 3D Coherent Self-training (3DCS) strategy, resorting to the multi-view consistency originated from the 3D model.
arXiv Detail & Related papers (2024-04-22T15:01:32Z)
latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction [48.86083272054711]
latentSplat is a method to predict semantic Gaussians in a 3D latent space that can be splatted and decoded by a light-weight generative 2D architecture. We show that latentSplat outperforms previous works in reconstruction quality and generalization, while being fast and scalable to high-resolution data.
arXiv Detail & Related papers (2024-03-24T20:48:36Z)
HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting [53.6394928681237]
Holistic understanding of urban scenes based on RGB images is a challenging yet important problem. Our main idea involves the joint optimization of geometry, appearance, semantics, and motion using a combination of static and dynamic 3D Gaussians. Our approach offers the ability to render new viewpoints in real-time, yielding 2D and 3D semantic information with high accuracy.
arXiv Detail & Related papers (2024-03-19T13:39:05Z)
SemGauss-SLAM: Dense Semantic Gaussian Splatting SLAM [14.126704753481972]
We propose SemGauss-SLAM, a dense semantic SLAM system that enables accurate 3D semantic mapping, robust camera tracking, and high-quality rendering simultaneously. We incorporate semantic feature embedding into 3D Gaussian representation, which effectively encodes semantic information within the spatial layout of the environment. To reduce cumulative drift in tracking and improve semantic reconstruction accuracy, we introduce semantic-informed bundle adjustment.
arXiv Detail & Related papers (2024-03-12T10:33:26Z)
SAGD: Boundary-Enhanced Segment Anything in 3D Gaussian via Gaussian Decomposition [66.80822249039235]
3D Gaussian Splatting has emerged as an alternative 3D representation for novel view synthesis. We propose SAGD, a conceptually simple yet effective boundary-enhanced segmentation pipeline for 3D-GS. Our approach achieves high-quality 3D segmentation without rough boundary issues, which can be easily applied to other scene editing tasks.
arXiv Detail & Related papers (2024-01-31T14:19:03Z)
FMGS: Foundation Model Embedded 3D Gaussian Splatting for Holistic 3D Scene Understanding [11.118857208538039]
We present Foundation Model Embedded Gaussian Splatting (FMGS), which incorporates vision-language embeddings of foundation models into 3D Gaussian Splatting (GS). Results demonstrate remarkable multi-view semantic consistency, facilitating diverse downstream tasks, beating state-of-the-art methods by 10.2 percent on open-vocabulary language-based object detection. This research explores the intersection of vision, language, and 3D scene representation, paving the way for enhanced scene understanding in uncontrolled real-world environments.
arXiv Detail & Related papers (2024-01-03T20:39:02Z)
SSR-2D: Semantic 3D Scene Reconstruction from 2D Images [54.46126685716471]
In this work, we explore a central 3D scene modeling task, namely, semantic scene reconstruction without using any 3D annotations. The key idea of our approach is to design a trainable model that employs both incomplete 3D reconstructions and their corresponding source RGB-D images. Our method achieves the state-of-the-art performance of semantic scene completion on two large-scale benchmark datasets MatterPort3D and ScanNet.
arXiv Detail & Related papers (2023-02-07T17:47:52Z)

This list is automatically generated from the titles and abstracts of the papers on this site.