Related papers: LEGO-SLAM: Language-Embedded Gaussian Optimization SLAM

URL: http://arxiv.org/abs/2511.16144v1
Date: Thu, 20 Nov 2025 08:31:34 GMT
Title: LEGO-SLAM: Language-Embedded Gaussian Optimization SLAM
Authors: Sibaek Lee, Seongbo Ha, Kyeongsu Kang, Joonyeol Choi, Seungjun Tak, Hyeonwoo Yu
Abstract summary: We propose LEGO-SLAM, a framework to achieve real-time, open-vocabulary mapping within a 3DGS-based SLAM system. At the core of our method is a scene-adaptive encoder-decoder that distills high-dimensional language embeddings into a compact 16-dimensional feature space. Experiments demonstrate that LEGO-SLAM achieves competitive mapping quality and tracking accuracy, all while providing open-vocabulary capabilities at 15 FPS.
Score: 2.0524609401792397
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in 3D Gaussian Splatting (3DGS) have enabled Simultaneous Localization and Mapping (SLAM) systems to build photorealistic maps. However, these maps lack the open-vocabulary semantic understanding required for advanced robotic interaction. Integrating language features into SLAM remains a significant challenge, as storing high-dimensional features demands excessive memory and rendering overhead, while existing methods with static models lack adaptability for novel environments. To address these limitations, we propose LEGO-SLAM (Language-Embedded Gaussian Optimization SLAM), the first framework to achieve real-time, open-vocabulary mapping within a 3DGS-based SLAM system. At the core of our method is a scene-adaptive encoder-decoder that distills high-dimensional language embeddings into a compact 16-dimensional feature space. This design reduces the memory per Gaussian and accelerates rendering, enabling real-time performance. Unlike static approaches, our encoder adapts online to unseen scenes. These compact features also enable a language-guided pruning strategy that identifies semantic redundancy, reducing the map's Gaussian count by over 60% while maintaining rendering quality. Furthermore, we introduce a language-based loop detection approach that reuses these mapping features, eliminating the need for a separate detection model. Extensive experiments demonstrate that LEGO-SLAM achieves competitive mapping quality and tracking accuracy, all while providing open-vocabulary capabilities at 15 FPS.
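Illustrative sketch (not from the paper): the abstract describes distilling high-dimensional language embeddings into a 16-dimensional feature space to reduce memory per Gaussian. The snippet below only illustrates that idea under stated assumptions. It uses a PCA-based linear encoder/decoder as a stand-in for the paper's learned, scene-adaptive network, a 512-dimensional input (the abstract does not state the source dimension), float32 storage, and synthetic features. The function names are hypothetical.

```python
import numpy as np

def fit_linear_codec(features, latent_dim=16):
    """Fit a PCA-based linear codec (a stand-in, NOT the paper's learned encoder-decoder)."""
    mean = features.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    basis = vt[:latent_dim]  # shape: (latent_dim, D)
    return mean, basis

def encode(features, mean, basis):
    """Project D-dimensional features onto the compact latent space."""
    return (features - mean) @ basis.T

def decode(codes, mean, basis):
    """Map compact codes back to the original D-dimensional space."""
    return codes @ basis + mean

rng = np.random.default_rng(0)
# Synthetic data (assumption): 1000 features lying near a 16-d subspace of a 512-d space.
latent = rng.normal(size=(1000, 16))
mixing = rng.normal(size=(16, 512))
noise = 0.01 * rng.normal(size=(1000, 512))
features = (latent @ mixing + noise).astype(np.float32)

mean, basis = fit_linear_codec(features, latent_dim=16)
codes = encode(features, mean, basis)
recon = decode(codes, mean, basis)

# Per-Gaussian storage for the language feature alone, assuming float32 (4 bytes per value).
bytes_full = features.shape[1] * 4
bytes_compact = codes.shape[1] * 4
rel_error = np.linalg.norm(recon - features) / np.linalg.norm(features)

print("bytes per feature, full vs compact:", bytes_full, bytes_compact)
print("relative reconstruction error:", float(rel_error))
```

Under these assumptions the compact code needs 32 times less storage per Gaussian than the full embedding. A linear projection is the simplest possible codec; the scene-adaptive, online-updated encoder described in the abstract would replace `fit_linear_codec` here.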

Related papers

LangGS-SLAM: Real-Time Language-Feature Gaussian Splatting SLAM [2.738569311610586]
The RGB-D SLAM system reconstructs a language-aligned dense feature field while sustaining low-latency tracking and mapping. The system achieves superior geometric fidelity compared to geometric-only baselines and comparable semantic fidelity to offline approaches while operating at 15 FPS.
arXiv Detail & Related papers (2026-01-28T05:35:34Z)
Joint Semantic and Rendering Enhancements in 3D Gaussian Modeling with Anisotropic Local Encoding [86.55824709875598]
We propose a joint enhancement framework for 3D semantic Gaussian modeling that synergizes both semantic and rendering branches. Unlike conventional point cloud shape encoding, we introduce an anisotropic 3D Gaussian Chebyshev descriptor to capture fine-grained 3D shape details. We employ a cross-scene knowledge transfer module to continuously update learned shape patterns, enabling faster convergence and robust representations.
arXiv Detail & Related papers (2026-01-05T18:33:50Z)
Quantile Rendering: Efficiently Embedding High-dimensional Feature on 3D Gaussian Splatting [52.18697134979677]
Recent advancements in computer vision have successfully extended open-vocabulary segmentation (OVS) to the 3D domain by leveraging 3D Gaussian Splatting (3D-GS). Existing methods employ codebooks or feature compression, causing information loss and thereby degrading segmentation quality. We introduce Quantile Rendering (Q-Render), a novel rendering strategy for 3D Gaussians that efficiently handles high-dimensional features while maintaining high fidelity. Our framework outperforms state-of-the-art methods while enabling real-time rendering with an approximate 43.7x speedup on 512-D feature maps.
arXiv Detail & Related papers (2025-12-24T04:16:18Z)
Gen-LangSplat: Generalized Language Gaussian Splatting with Pre-Trained Feature Compression [0.0]
We introduce Gen-LangSplat, which replaces the scene-wise autoencoder with a generalized autoencoder pre-trained extensively on the large-scale ScanNet dataset. This architectural shift enables the use of a fixed, compact latent space for language features across any new scene without any scene-specific training. Our results demonstrate that generalized embeddings can efficiently and accurately support open-vocabulary querying in novel 3D scenes.
arXiv Detail & Related papers (2025-10-27T02:13:38Z)
GaussianVLM: Scene-centric 3D Vision-Language Models using Language-aligned Gaussian Splats for Embodied Reasoning and Beyond [56.677984098204696]
Multimodal language models are driving the development of 3D Vision-Language Models (VLMs). We propose a scene-centric 3D VLM for 3D Gaussian splat scenes that employs language- and task-aware scene representations. We present the first Gaussian splatting-based VLM, leveraging photorealistic 3D representations derived from standard RGB images.
arXiv Detail & Related papers (2025-07-01T15:52:59Z)
LODGE: Level-of-Detail Large-Scale Gaussian Splatting with Efficient Rendering [75.67501939005119]
We present a novel level-of-detail (LOD) method for 3D Gaussian Splatting on memory-constrained devices. Our approach iteratively selects optimal subsets of Gaussians based on camera distance. Our method achieves state-of-the-art performance on both outdoor (Hierarchical 3DGS) and indoor (Zip-NeRF) datasets.
arXiv Detail & Related papers (2025-05-29T06:50:57Z)
GSFF-SLAM: 3D Semantic Gaussian Splatting SLAM via Feature Field [17.57215792490409]
GSFF-SLAM is a novel dense semantic SLAM system based on 3D Gaussian Splatting. Our method supports semantic reconstruction using various forms of 2D priors, particularly sparse and noisy signals. When utilizing 2D ground truth priors, GSFF-SLAM achieves state-of-the-art semantic segmentation performance with 95.03% mIoU.
arXiv Detail & Related papers (2025-04-28T01:21:35Z)
Online Language Splatting [28.066910888210973]
We introduce Online Language Splatting, the first framework to achieve online, near real-time, open-vocabulary language mapping within a 3DGS-SLAM system. We show that our online method surpasses state-of-the-art offline methods in accuracy and achieves a more than 40x efficiency boost.
arXiv Detail & Related papers (2025-03-12T14:49:24Z)
SplatLoc: 3D Gaussian Splatting-based Visual Localization for Augmented Reality [50.179377002092416]
We propose an efficient visual localization method capable of high-quality rendering with fewer parameters. Our method achieves superior or comparable rendering and localization performance to state-of-the-art implicit-based visual localization approaches.
arXiv Detail & Related papers (2024-09-21T08:46:16Z)
Hier-SLAM: Scaling-up Semantics in SLAM with a Hierarchically Categorical Gaussian Splatting [28.821276113559346]
We propose Hier-SLAM, a semantic 3D Gaussian Splatting SLAM method featuring a novel hierarchical categorical representation. Our method outperforms existing dense SLAM methods in both mapping and tracking accuracy, while achieving a 2x operation speed-up. It can handle complex real-world scenes with more than 500 semantic classes, highlighting its ability to scale up.
arXiv Detail & Related papers (2024-09-19T07:18:41Z)
GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane [53.388937705785025]
3D open-vocabulary scene understanding is crucial for advancing augmented reality and robotic applications. We introduce GOI, a framework that integrates semantic features from 2D vision-language foundation models into 3D Gaussian Splatting (3DGS). Our method treats the feature selection process as a hyperplane division within the feature space, retaining only features that are highly relevant to the query.
arXiv Detail & Related papers (2024-05-27T18:57:18Z)

This list is automatically generated from the titles and abstracts of the papers on this site.