Hier-SLAM: Scaling-up Semantics in SLAM with a Hierarchically Categorical Gaussian Splatting
- URL: http://arxiv.org/abs/2409.12518v3
- Date: Thu, 20 Feb 2025 04:02:40 GMT
- Title: Hier-SLAM: Scaling-up Semantics in SLAM with a Hierarchically Categorical Gaussian Splatting
- Authors: Boying Li, Zhixi Cai, Yuan-Fang Li, Ian Reid, Hamid Rezatofighi,
- Abstract summary: We propose a semantic 3D Gaussian Splatting SLAM method featuring a novel hierarchical categorical representation.
Our Hier-SLAM outperforms existing dense SLAM methods in both mapping and tracking accuracy, while achieving a 2x operation speed-up.
It showcases the capability of handling the complex real-world scene with more than 500 semantic classes, highlighting its valuable scaling-up capability.
- Score: 28.821276113559346
- License:
- Abstract: We propose Hier-SLAM, a semantic 3D Gaussian Splatting SLAM method featuring a novel hierarchical categorical representation, which enables accurate global 3D semantic mapping, scaling-up capability, and explicit semantic label prediction in the 3D world. The parameter usage in semantic SLAM systems increases significantly with the growing complexity of the environment, making it particularly challenging and costly for scene understanding. To address this problem, we introduce a novel hierarchical representation that encodes semantic information in a compact form into 3D Gaussian Splatting, leveraging the capabilities of large language models (LLMs). We further introduce a novel semantic loss designed to optimize hierarchical semantic information through both inter-level and cross-level optimization. Furthermore, we enhance the whole SLAM system, resulting in improved tracking and mapping performance. Our Hier-SLAM outperforms existing dense SLAM methods in both mapping and tracking accuracy, while achieving a 2x operation speed-up. Additionally, it exhibits competitive performance in rendering semantic segmentation in small synthetic scenes, with significantly reduced storage and training time requirements. Rendering FPS impressively reaches 2,000 with semantic information and 3,000 without it. Most notably, it showcases the capability of handling the complex real-world scene with more than 500 semantic classes, highlighting its valuable scaling-up capability.
Related papers
- PanoSLAM: Panoptic 3D Scene Reconstruction via Gaussian SLAM [105.01907579424362]
PanoSLAM is the first SLAM system to integrate geometric reconstruction, 3D semantic segmentation, and 3D instance segmentation within a unified framework.
For the first time, it achieves panoptic 3D reconstruction of open-world environments directly from the RGB-D video.
arXiv Detail & Related papers (2024-12-31T08:58:10Z) - IG-SLAM: Instant Gaussian SLAM [6.228980850646457]
3D Gaussian Splatting has recently shown promising results as an alternative scene representation in SLAM systems.
We present IG-SLAM, a dense RGB-only SLAM system that employs robust Dense-SLAM methods for tracking and combines them with Gaussian Splatting.
We demonstrate competitive performance with state-of-the-art RGB-only SLAM systems while achieving faster operation speeds.
arXiv Detail & Related papers (2024-08-02T09:07:31Z) - NIS-SLAM: Neural Implicit Semantic RGB-D SLAM for 3D Consistent Scene Understanding [31.56016043635702]
We introduce NIS-SLAM, an efficient neural implicit semantic RGB-D SLAM system.
For high-fidelity surface reconstruction and spatial consistent scene understanding, we combine high-frequency multi-resolution tetrahedron-based features.
We also show that our approach can be used in augmented reality applications.
arXiv Detail & Related papers (2024-07-30T14:27:59Z) - CLIP-GS: CLIP-Informed Gaussian Splatting for Real-time and View-consistent 3D Semantic Understanding [32.76277160013881]
We present CLIP-GS, which integrates semantics from Contrastive Language-Image Pre-Training (CLIP) into Gaussian Splatting.
SAC exploits the inherent unified semantics within objects to learn compact yet effective semantic representations of 3D Gaussians.
We also introduce a 3D Coherent Self-training (3DCS) strategy, resorting to the multi-view consistency originated from the 3D model.
arXiv Detail & Related papers (2024-04-22T15:01:32Z) - NEDS-SLAM: A Neural Explicit Dense Semantic SLAM Framework using 3D Gaussian Splatting [5.655341825527482]
NEDS-SLAM is a dense semantic SLAM system based on 3D Gaussian representation.
We propose a Spatially Consistent Feature Fusion model to reduce the effect of erroneous estimates from pre-trained segmentation head.
We employ a lightweight encoder-decoder to compress the high-dimensional semantic features into a compact 3D Gaussian representation.
arXiv Detail & Related papers (2024-03-18T11:31:03Z) - SemGauss-SLAM: Dense Semantic Gaussian Splatting SLAM [14.126704753481972]
We propose SemGauss-SLAM, a dense semantic SLAM system that enables accurate 3D semantic mapping, robust camera tracking, and high-quality rendering simultaneously.
We incorporate semantic feature embedding into 3D Gaussian representation, which effectively encodes semantic information within the spatial layout of the environment.
To reduce cumulative drift in tracking and improve semantic reconstruction accuracy, we introduce semantic-informed bundle adjustment.
arXiv Detail & Related papers (2024-03-12T10:33:26Z) - DNS SLAM: Dense Neural Semantic-Informed SLAM [92.39687553022605]
DNS SLAM is a novel neural RGB-D semantic SLAM approach featuring a hybrid representation.
Our method integrates multi-view geometry constraints with image-based feature extraction to improve appearance details.
Our experimental results achieve state-of-the-art performance on both synthetic data and real-world data tracking.
arXiv Detail & Related papers (2023-11-30T21:34:44Z) - GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting [51.96353586773191]
We introduce textbfGS-SLAM that first utilizes 3D Gaussian representation in the Simultaneous Localization and Mapping system.
Our method utilizes a real-time differentiable splatting rendering pipeline that offers significant speedup to map optimization and RGB-D rendering.
Our method achieves competitive performance compared with existing state-of-the-art real-time methods on the Replica, TUM-RGBD datasets.
arXiv Detail & Related papers (2023-11-20T12:08:23Z) - Volumetric Semantically Consistent 3D Panoptic Mapping [77.13446499924977]
We introduce an online 2D-to-3D semantic instance mapping algorithm aimed at generating semantic 3D maps suitable for autonomous agents in unstructured environments.
It introduces novel ways of integrating semantic prediction confidence during mapping, producing semantic and instance-consistent 3D regions.
The proposed method achieves accuracy superior to the state of the art on public large-scale datasets, improving on a number of widely used metrics.
arXiv Detail & Related papers (2023-09-26T08:03:10Z) - NICER-SLAM: Neural Implicit Scene Encoding for RGB SLAM [111.83168930989503]
NICER-SLAM is a dense RGB SLAM system that simultaneously optimize for camera poses and a hierarchical neural implicit map representation.
We show strong performance in dense mapping, tracking, and novel view synthesis, even competitive with recent RGB-D SLAM systems.
arXiv Detail & Related papers (2023-02-07T17:06:34Z) - NICE-SLAM: Neural Implicit Scalable Encoding for SLAM [112.6093688226293]
NICE-SLAM is a dense SLAM system that incorporates multi-level local information by introducing a hierarchical scene representation.
Compared to recent neural implicit SLAM systems, our approach is more scalable, efficient, and robust.
arXiv Detail & Related papers (2021-12-22T18:45:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.