Related papers: Tackling View-Dependent Semantics in 3D Language Gaussian Splatting

URL: http://arxiv.org/abs/2505.24746v1
Date: Fri, 30 May 2025 16:06:32 GMT
Title: Tackling View-Dependent Semantics in 3D Language Gaussian Splatting
Authors: Jiazhong Cen, Xudong Zhou, Jiemin Fang, Changsong Wen, Lingxi Xie, Xiaopeng Zhang, Wei Shen, Qi Tian
Abstract summary: LaGa establishes cross-view semantic connections by decomposing the 3D scene into objects. It constructs view-aggregated semantic representations by clustering semantic descriptors and reweighting them based on multi-view semantics. Under the same settings, LaGa achieves a significant improvement of +18.7% mIoU over the previous SOTA on the LERF-OVS dataset.
Score: 80.88015191411714
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advancements in 3D Gaussian Splatting (3D-GS) enable high-quality 3D scene reconstruction from RGB images. Many studies extend this paradigm for language-driven open-vocabulary scene understanding. However, most of them simply project 2D semantic features onto 3D Gaussians and overlook a fundamental gap between 2D and 3D understanding: a 3D object may exhibit various semantics from different viewpoints--a phenomenon we term view-dependent semantics. To address this challenge, we propose LaGa (Language Gaussians), which establishes cross-view semantic connections by decomposing the 3D scene into objects. Then, it constructs view-aggregated semantic representations by clustering semantic descriptors and reweighting them based on multi-view semantics. Extensive experiments demonstrate that LaGa effectively captures key information from view-dependent semantics, enabling a more comprehensive understanding of 3D scenes. Notably, under the same settings, LaGa achieves a significant improvement of +18.7% mIoU over the previous SOTA on the LERF-OVS dataset. Our code is available at: https://github.com/SJTU-DeepVisionLab/LaGa.
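The +18.7% figure above is reported in mIoU (mean intersection over union). As a reading aid, the following is a minimal sketch of how that metric is commonly computed. It assumes one binary mask per text query, flattened to a list of 0/1 values; the function names and toy masks are hypothetical, and the exact LERF-OVS evaluation protocol is not described on this page.

```python
def iou(pred, target):
    """Intersection over union of two equal-length binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def mean_iou(pairs):
    """Average IoU over (prediction, ground truth) mask pairs."""
    scores = [iou(pred, target) for pred, target in pairs]
    return sum(scores) / len(scores)


# Toy example with two hypothetical queries.
pairs = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),  # intersection 1, union 2 -> 0.5
    ([0, 1, 1, 0], [0, 1, 1, 0]),  # identical masks -> 1.0
]
print(mean_iou(pairs))  # 0.75
```

Averaging per-query scores gives every query equal weight regardless of object size, which is one common convention; benchmarks may average differently.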

Related papers

Hi-LSplat: Hierarchical 3D Language Gaussian Splatting [11.810729064982372]
Hi-LSplat is a view-consistent Hierarchical Language Gaussian Splatting work for 3D open-vocabulary querying. We construct two hierarchical semantic datasets to better assess the model's ability to distinguish different semantic levels.
arXiv Detail & Related papers (2025-06-07T14:56:19Z)
Training-Free Hierarchical Scene Understanding for Gaussian Splatting with Superpoint Graphs [16.153129392697885]
We introduce a training-free framework that constructs a superpoint graph directly from Gaussian primitives. The superpoint graph partitions the scene into spatially compact and semantically coherent regions, forming view-consistent 3D entities. Our method achieves state-of-the-art open-vocabulary segmentation performance, with semantic field reconstruction completed over $30\times$ faster.
arXiv Detail & Related papers (2025-04-17T17:56:07Z)
PanoGS: Gaussian-based Panoptic Segmentation for 3D Open Vocabulary Scene Understanding [8.72555461868951]
3D Gaussian Splatting (3DGS) has shown encouraging performance for open vocabulary scene understanding tasks. Previous methods, which usually predict a heatmap between the scene feature and the text query, cannot distinguish 3D instance-level information. We propose PanoGS, a novel and effective 3D panoptic open vocabulary scene understanding approach.
arXiv Detail & Related papers (2025-03-23T15:27:29Z)
UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting [68.37013525040891]
We propose UniGS, integrating 3D Gaussian Splatting (3DGS) into multi-modal pre-training to enhance the 3D representation. We demonstrate the effectiveness of UniGS in learning a more general and stronger aligned multi-modal representation.
arXiv Detail & Related papers (2025-02-25T05:10:22Z)
GaussRender: Learning 3D Occupancy with Gaussian Rendering [86.89653628311565]
GaussRender is a module that improves 3D occupancy learning by enforcing projective consistency. Our method penalizes 3D configurations that produce inconsistent 2D projections, thereby enforcing a more coherent 3D structure.
arXiv Detail & Related papers (2025-02-07T16:07:51Z)
AugRefer: Advancing 3D Visual Grounding via Cross-Modal Augmentation and Spatial Relation-based Referring [49.78120051062641]
3D visual grounding aims to correlate a natural language description with the target object within a 3D scene. Existing approaches commonly encounter a shortage of text-3D pairs available for training. We propose AugRefer, a novel approach for advancing 3D visual grounding.
arXiv Detail & Related papers (2025-01-16T09:57:40Z)
OVGaussian: Generalizable 3D Gaussian Segmentation with Open Vocabularies [112.80292725951921]
OVGaussian is a generalizable Open-Vocabulary 3D semantic segmentation framework based on the 3D Gaussian representation. We first construct a large-scale 3D scene dataset based on 3DGS, dubbed SegGaussian, which provides detailed semantic and instance annotations for both Gaussian points and multi-view images. To promote semantic generalization across scenes, we introduce Generalizable Semantic Rasterization (GSR), which leverages a
arXiv Detail & Related papers (2024-12-31T07:55:35Z)
Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding [59.51535163599723]
FreeGS is an unsupervised semantic-embedded 3DGS framework that achieves view-consistent 3D scene understanding without the need for 2D labels. FreeGS performs comparably to state-of-the-art methods while avoiding the complex data preprocessing workload.
arXiv Detail & Related papers (2024-11-29T08:52:32Z)
Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting [27.974762304763694]
We introduce Semantic Gaussians, a novel open-vocabulary scene understanding approach based on 3D Gaussian Splatting. Unlike existing methods, we design a versatile projection approach that maps various 2D semantic features into a novel semantic component of 3D Gaussians. We build a 3D semantic network that directly predicts the semantic component from raw 3D Gaussians for fast inference.
arXiv Detail & Related papers (2024-03-22T21:28:19Z)

This list is automatically generated from the titles and abstracts of the papers on this site.