SLAG: Scalable Language-Augmented Gaussian Splatting
- URL: http://arxiv.org/abs/2505.08124v1
- Date: Mon, 12 May 2025 23:32:24 GMT
- Title: SLAG: Scalable Language-Augmented Gaussian Splatting
- Authors: Laszlo Szilagyi, Francis Engelmann, Jeannette Bohg
- Abstract summary: Language-augmented scene representations hold great promise for large-scale robotics applications such as search-and-rescue, smart cities, and mining. Many of these scenarios are time-sensitive, requiring rapid scene encoding while also being data-intensive, necessitating scalable solutions. We introduce SLAG, a multi-GPU framework for language-augmented Gaussian splatting that enhances the speed and scalability of embedding large scenes.
- Score: 19.643023058839603
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language-augmented scene representations hold great promise for large-scale robotics applications such as search-and-rescue, smart cities, and mining. Many of these scenarios are time-sensitive, requiring rapid scene encoding while also being data-intensive, necessitating scalable solutions. Deploying these representations on robots with limited computational resources further adds to the challenge. To address this, we introduce SLAG, a multi-GPU framework for language-augmented Gaussian splatting that enhances the speed and scalability of embedding large scenes. Our method integrates 2D visual-language model features into 3D scenes using SAM and CLIP. Unlike prior approaches, SLAG eliminates the need for a loss function to compute per-Gaussian language embeddings. Instead, it derives embeddings from 3D Gaussian scene parameters via a normalized weighted average, enabling highly parallelized scene encoding. Additionally, we introduce a vector database for efficient embedding storage and retrieval. Our experiments show that SLAG achieves an 18 times speedup in embedding computation on a 16-GPU setup compared to OpenGaussian, while preserving embedding quality on the ScanNet and LERF datasets. For more details, visit our project website: https://slag-project.github.io/.
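The loss-free encoding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the per-view weights (for example, how strongly a Gaussian contributes to a masked region in each view), the function names `aggregate_embeddings` and `query`, and the brute-force cosine search standing in for a vector database are all assumptions made for illustration. The point is only that each Gaussian's embedding is a weighted average of 2D features followed by normalization, so every Gaussian can be processed independently and the work can be split across GPUs.

```python
import math


def aggregate_embeddings(weights, features, eps=1e-8):
    """Normalized weighted average of per-view features for each Gaussian.

    weights[v][g]  : hypothetical non-negative contribution of Gaussian g in view v
    features[v][g] : 2D feature vector (list of floats) associated with Gaussian g in view v
    Returns one unit-length embedding per Gaussian.
    """
    num_views = len(weights)
    num_gaussians = len(weights[0])
    dim = len(features[0][0])
    embeddings = []
    for g in range(num_gaussians):  # each Gaussian is independent of the others
        total = sum(weights[v][g] for v in range(num_views))
        acc = [0.0] * dim
        for v in range(num_views):
            w = weights[v][g]
            for d in range(dim):
                acc[d] += w * features[v][g][d]
        if total > eps:
            acc = [a / total for a in acc]  # weighted average
        norm = math.sqrt(sum(a * a for a in acc))
        # L2-normalize so that a dot product equals cosine similarity
        embeddings.append([a / norm for a in acc] if norm > eps else acc)
    return embeddings


def query(embeddings, text_embedding, top_k=1):
    """Brute-force stand-in for a vector database lookup (assumption)."""
    scores = [sum(e * t for e, t in zip(emb, text_embedding)) for emb in embeddings]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:top_k]
```

Because no gradient-based optimization is involved, the outer loop over Gaussians could be partitioned across devices without any communication between partitions, which is consistent with the parallelism the abstract describes.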
Related papers
- GaussianVLM: Scene-centric 3D Vision-Language Models using Language-aligned Gaussian Splats for Embodied Reasoning and Beyond [56.677984098204696]
Multimodal language models are driving the development of 3D Vision-Language Models (VLMs). We propose a scene-centric 3D VLM for 3D Gaussian splat scenes that employs language- and task-aware scene representations. We present the first Gaussian splatting-based VLM, leveraging photorealistic 3D representations derived from standard RGB images.
arXiv Detail & Related papers (2025-07-01T15:52:59Z)
- FlexGS: Train Once, Deploy Everywhere with Many-in-One Flexible 3D Gaussian Splatting [57.97160965244424]
3D Gaussian splatting (3DGS) has enabled various applications in 3D scene representation and novel view synthesis. Previous approaches have focused on pruning less important Gaussians, effectively compressing 3DGS. We present an elastic inference method for 3DGS, achieving substantial rendering performance without additional fine-tuning.
arXiv Detail & Related papers (2025-06-04T17:17:57Z)
- LODGE: Level-of-Detail Large-Scale Gaussian Splatting with Efficient Rendering [68.93333348474988]
We present a novel level-of-detail (LOD) method for 3D Gaussian Splatting on memory-constrained devices. Our approach iteratively selects optimal subsets of Gaussians based on camera distance. Our method achieves state-of-the-art performance on both outdoor (Hierarchical 3DGS) and indoor (Zip-NeRF) datasets.
arXiv Detail & Related papers (2025-05-29T06:50:57Z)
- SparseGrasp: Robotic Grasping via 3D Semantic Gaussian Splatting from Sparse Multi-View RGB Images [125.66499135980344]
We propose SparseGrasp, a novel open-vocabulary robotic grasping system. SparseGrasp operates efficiently with sparse-view RGB images and handles scene updates quickly. We show that SparseGrasp significantly outperforms state-of-the-art methods in terms of both speed and adaptability.
arXiv Detail & Related papers (2024-12-03T03:56:01Z)
- Occam's LGS: An Efficient Approach for Language Gaussian Splatting [57.00354758206751]
We show that the complicated pipelines for language 3D Gaussian Splatting are simply unnecessary. We apply Occam's razor to the task at hand, leading to a highly efficient weighted multi-view feature aggregation technique.
arXiv Detail & Related papers (2024-12-02T18:50:37Z)
- SplatLoc: 3D Gaussian Splatting-based Visual Localization for Augmented Reality [50.179377002092416]
We propose an efficient visual localization method capable of high-quality rendering with fewer parameters.
Our method achieves superior or comparable rendering and localization performance to state-of-the-art implicit-based visual localization approaches.
arXiv Detail & Related papers (2024-09-21T08:46:16Z)
- CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians [64.6687065215713]
CityGaussian employs a novel divide-and-conquer training approach and Level-of-Detail (LoD) strategy for efficient large-scale 3DGS training and rendering.
Our approach attains state-of-the-art rendering quality, enabling consistent real-time rendering of large-scale scenes across vastly different scales.
arXiv Detail & Related papers (2024-04-01T14:24:40Z)
- Compact 3D Scene Representation via Self-Organizing Gaussian Grids [10.816451552362823]
3D Gaussian Splatting has recently emerged as a highly promising technique for modeling of static 3D scenes.
We introduce a compact scene representation organizing the parameters of 3DGS into a 2D grid with local homogeneity.
Our method achieves a reduction factor of 17x to 42x in size for complex scenes with no increase in training time.
arXiv Detail & Related papers (2023-12-19T20:18:29Z)
- EAGLES: Efficient Accelerated 3D Gaussians with Lightweight EncodingS [40.94643885302646]
3D Gaussian splatting (3D-GS) has gained popularity in novel-view scene synthesis.
It addresses the challenges of lengthy training times and slow rendering speeds associated with Neural Radiance Fields (NeRFs).
We present a technique utilizing quantized embeddings to significantly reduce per-point memory storage requirements.
arXiv Detail & Related papers (2023-12-07T18:59:55Z)
- Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding [2.517953665531978]
We introduce Language Embedded 3D Gaussians, a novel scene representation for open-vocabulary query tasks.
Our representation achieves the best visual quality and language querying accuracy across current language-embedded representations.
arXiv Detail & Related papers (2023-11-30T11:50:07Z)