GigaSLAM: Large-Scale Monocular SLAM with Hierarchical Gaussian Splats
- URL: http://arxiv.org/abs/2503.08071v2
- Date: Tue, 10 Jun 2025 15:30:19 GMT
- Title: GigaSLAM: Large-Scale Monocular SLAM with Hierarchical Gaussian Splats
- Authors: Kai Deng, Yigong Zhang, Jian Yang, Jin Xie
- Abstract summary: We introduce GigaSLAM, the first RGB NeRF / 3DGS-based SLAM framework for large-scale, unbounded outdoor environments. Our approach employs a hierarchical sparse voxel map representation, where Gaussians are decoded by neural networks at multiple levels of detail. GigaSLAM delivers high-precision tracking and visually faithful rendering on urban outdoor benchmarks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tracking and mapping in large-scale, unbounded outdoor environments using only monocular RGB input presents substantial challenges for existing SLAM systems. Traditional Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) SLAM methods are typically limited to small, bounded indoor settings. To overcome these challenges, we introduce GigaSLAM, the first RGB NeRF / 3DGS-based SLAM framework for kilometer-scale outdoor environments, as demonstrated on the KITTI, KITTI 360, 4 Seasons and A2D2 datasets. Our approach employs a hierarchical sparse voxel map representation, where Gaussians are decoded by neural networks at multiple levels of detail. This design enables efficient, scalable mapping and high-fidelity viewpoint rendering across expansive, unbounded scenes. For front-end tracking, GigaSLAM utilizes a metric depth model combined with epipolar geometry and PnP algorithms to accurately estimate poses, while incorporating a Bag-of-Words-based loop closure mechanism to maintain robust alignment over long trajectories. Consequently, GigaSLAM delivers high-precision tracking and visually faithful rendering on urban outdoor benchmarks, establishing a robust SLAM solution for large-scale, long-term scenarios, and significantly extending the applicability of Gaussian Splatting SLAM systems to unbounded outdoor environments. GitHub: https://github.com/DengKaiCQ/GigaSLAM.
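The front-end described above lifts pixels to metric 3D points with a depth model and recovers pose geometrically. The paper uses epipolar geometry and PnP; as a simplified stand-in (metric depth gives 3D points in both frames, so 3D-3D matches suffice), the sketch below recovers the relative pose with a closed-form Kabsch alignment. The intrinsics, point count, and simulated motion are illustrative, not taken from the paper.

```python
import numpy as np

def rodrigues(axis, angle):
    """Rotation matrix from an axis-angle pair (Rodrigues' formula)."""
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def estimate_pose_3d3d(P, Q):
    """Closed-form rigid alignment (Kabsch): find R, t with Q_i = R @ P_i + t."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t

# Back-project pixels into metric 3D using a depth map and intrinsics K.
K = np.array([[718.0, 0.0, 607.0],
              [0.0, 718.0, 185.0],
              [0.0, 0.0, 1.0]])               # KITTI-like intrinsics (illustrative)
rng = np.random.default_rng(0)
uv = rng.uniform([0.0, 0.0], [1240.0, 370.0], size=(200, 2))  # pixel coordinates
depth = rng.uniform(5.0, 50.0, size=200)                      # metric depth (m)
P = (np.linalg.inv(K) @ np.c_[uv, np.ones(200)].T).T * depth[:, None]

# Simulate a small forward motion with yaw, then recover it from 3D-3D matches.
R_true = rodrigues(np.array([0.0, 1.0, 0.0]), 0.05)
t_true = np.array([0.1, 0.0, 1.2])
Q = P @ R_true.T + t_true
R_est, t_est = estimate_pose_3d3d(P, Q)
```

With noise-free correspondences the alignment is exact; in a real front end, matches would come from feature tracking and be filtered with RANSAC before solving.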
Related papers
- MCN-SLAM: Multi-Agent Collaborative Neural SLAM with Hybrid Implicit Neural Scene Representation [51.07118703442774]
Existing NeRF-based multi-agent SLAM frameworks cannot meet the constraints of communication bandwidth. We propose the first distributed multi-agent collaborative neural SLAM framework with hybrid scene representation. A novel triplane-grid joint scene representation method is proposed to improve scene reconstruction. A novel intra-to-inter loop closure method is designed to achieve local (single-agent) and global (multi-agent) consistency.
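The triplane part of a triplane-grid representation can be sketched in a few lines: a 3D point is projected onto three axis-aligned feature planes, each plane is sampled bilinearly, and the three feature vectors are combined. The plane resolution, channel count, and sum-combination below are assumptions for illustration; in the paper the features are joined with grid features and decoded by a network.

```python
import numpy as np

def bilinear(plane, x, y):
    """Bilinearly interpolate an (H, W, C) feature plane at continuous (x, y)."""
    H, W, _ = plane.shape
    x = min(max(x, 0.0), W - 1 - 1e-6)
    y = min(max(y, 0.0), H - 1 - 1e-6)
    x0, y0 = int(x), int(y)
    dx, dy = x - x0, y - y0
    return ((1 - dx) * (1 - dy) * plane[y0, x0] + dx * (1 - dy) * plane[y0, x0 + 1]
            + (1 - dx) * dy * plane[y0 + 1, x0] + dx * dy * plane[y0 + 1, x0 + 1])

def triplane_feature(planes, p, bound=1.0):
    """Query a 3D point: project it onto the xy/xz/yz planes, sample each
    bilinearly, and sum the three feature vectors."""
    xy, xz, yz = planes
    H, W, _ = xy.shape                        # all three planes assumed H x W
    u = (p + bound) / (2 * bound) * (W - 1)   # world [-bound, bound] -> pixels
    return (bilinear(xy, u[0], u[1]) + bilinear(xz, u[0], u[2])
            + bilinear(yz, u[1], u[2]))

# Constant planes make the expected output easy to check: 0.5 + 1.0 + 2.0 = 3.5.
planes = (np.full((8, 8, 4), 0.5), np.full((8, 8, 4), 1.0), np.full((8, 8, 4), 2.0))
feat = triplane_feature(planes, np.array([0.2, -0.3, 0.7]))
```

The appeal for bandwidth-constrained multi-agent settings is that three O(N^2) planes are far cheaper to store and transmit than a dense O(N^3) voxel grid.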
arXiv Detail & Related papers (2025-06-23T14:22:29Z)
- VPGS-SLAM: Voxel-based Progressive 3D Gaussian SLAM in Large-Scale Scenes [26.06908154350295]
VPGS-SLAM is the first 3DGS-based large-scale RGBD SLAM framework for both indoor and outdoor scenarios. We design a novel voxel-based progressive 3D Gaussian mapping method with multiple submaps for compact and accurate scene representation. In addition, we propose a 2D-3D fusion camera tracking method to achieve robust and accurate camera tracking in both indoor and outdoor large-scale scenes.
arXiv Detail & Related papers (2025-05-25T06:27:29Z)
- Large-Scale Gaussian Splatting SLAM [21.253966057320383]
This paper introduces a large-scale 3DGS-based visual SLAM with stereo cameras, termed LSG-SLAM. With extensive evaluations on the EuRoC and KITTI datasets, LSG-SLAM achieves superior performance over existing Neural, 3DGS-based, and even traditional approaches.
arXiv Detail & Related papers (2025-05-15T03:00:32Z) - EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis [61.1662426227688]
Existing NeRF and 3DGS-based methods show promising results in achieving photorealistic renderings but require slow, per-scene optimization.
We introduce EVolSplat, an efficient 3D Gaussian Splatting model for urban scenes that works in a feed-forward manner.
arXiv Detail & Related papers (2025-03-26T02:47:27Z) - VIGS SLAM: IMU-based Large-Scale 3D Gaussian Splatting SLAM [15.841609263723576]
We propose VIGS SLAM, a novel 3D Gaussian Splatting SLAM method for large-scale indoor environments. Our method is the first to show that Gaussian Splatting-based SLAM can be performed effectively in large-scale environments by integrating IMU sensor measurements. This not only extends Gaussian Splatting SLAM beyond room-scale scenarios but also achieves SLAM performance comparable to state-of-the-art methods in large-scale indoor environments.
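The core idea of integrating IMU measurements into tracking — propagating orientation, velocity, and position from gyroscope and accelerometer readings between frames — can be sketched as below. This is a bare Euler integration, not the paper's actual formulation (which would involve preintegration factors, sensor biases, and noise models); the gravity sign convention and step size are assumptions.

```python
import numpy as np

GRAVITY = np.array([0.0, 0.0, -9.81])   # world-frame gravity (assumed convention)

def skew(w):
    """Skew-symmetric matrix such that skew(w) @ v == np.cross(w, v)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_so3(w):
    """Rotation matrix from a rotation vector (Rodrigues' formula)."""
    th = np.linalg.norm(w)
    if th < 1e-12:
        return np.eye(3)
    K = skew(w / th)
    return np.eye(3) + np.sin(th) * K + (1.0 - np.cos(th)) * (K @ K)

def integrate_imu(R, p, v, gyro, accel, dt):
    """One Euler step: propagate orientation R, position p, and velocity v
    from body-frame gyro (rad/s) and accelerometer (m/s^2) readings."""
    a_world = R @ accel + GRAVITY        # remove gravity from the specific force
    p = p + v * dt + 0.5 * a_world * dt ** 2
    v = v + a_world * dt
    R = R @ exp_so3(gyro * dt)
    return R, p, v

# A stationary IMU measures only the reaction to gravity; integrating one
# second of such readings should leave the state unchanged.
R, p, v = np.eye(3), np.zeros(3), np.zeros(3)
for _ in range(100):
    R, p, v = integrate_imu(R, p, v, np.zeros(3), np.array([0.0, 0.0, 9.81]), 0.01)
```

In a SLAM back end, the integrated motion between keyframes would enter the optimization as a relative-pose constraint alongside the visual residuals, which is what makes large-scale tracking robust when visual features degrade.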
arXiv Detail & Related papers (2025-01-23T06:01:03Z) - VINGS-Mono: Visual-Inertial Gaussian Splatting Monocular SLAM in Large Scenes [10.287279799581544]
VINGS-Mono is a monocular (inertial) Gaussian Splatting (GS) SLAM framework designed for large scenes. The framework comprises four main components: VIO Front End, 2D Gaussian Map, NVS Loop Closure, and Dynamic Eraser.
arXiv Detail & Related papers (2025-01-14T18:01:15Z) - HI-SLAM2: Geometry-Aware Gaussian SLAM for Fast Monocular Scene Reconstruction [38.47566815670662]
HI-SLAM2 is a geometry-aware Gaussian SLAM system that achieves fast and accurate monocular scene reconstruction using only RGB input. We demonstrate significant improvements over existing Neural SLAM methods and even surpass RGB-D-based methods in both reconstruction and rendering quality.
arXiv Detail & Related papers (2024-11-27T01:39:21Z) - Splat-SLAM: Globally Optimized RGB-only SLAM with 3D Gaussians [87.48403838439391]
3D Gaussian Splatting has emerged as a powerful representation of geometry and appearance for RGB-only dense SLAM.
We propose the first RGB-only SLAM system with a dense 3D Gaussian map representation.
Our experiments on the Replica, TUM-RGBD, and ScanNet datasets indicate the effectiveness of globally optimized 3D Gaussians.
arXiv Detail & Related papers (2024-05-26T12:26:54Z) - MotionGS : Compact Gaussian Splatting SLAM by Motion Filter [10.979138131565238]
There has been a surge in NeRF-based SLAM, while work on 3DGS-based SLAM remains sparse.
This paper presents a novel 3DGS-based SLAM approach that fuses deep visual features, dual selection, and 3DGS.
arXiv Detail & Related papers (2024-05-18T00:47:29Z) - MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements [59.70107451308687]
We show for the first time that using 3D Gaussians for map representation with unposed camera images and inertial measurements can enable accurate SLAM.
Our method, MM3DGS, addresses the limitations of prior rendering by enabling faster scale awareness, and improved trajectory tracking.
We also release a multi-modal dataset, UT-MM, collected from a mobile robot equipped with a camera and an inertial measurement unit.
arXiv Detail & Related papers (2024-04-01T04:57:41Z) - GlORIE-SLAM: Globally Optimized RGB-only Implicit Encoding Point Cloud SLAM [53.6402869027093]
We propose an efficient RGB-only dense SLAM system using a flexible neural point cloud scene representation.
We also introduce a novel DSPO layer for bundle adjustment, which jointly optimizes keyframe poses and depths together with the scale of the monocular depth estimates.
arXiv Detail & Related papers (2024-03-28T16:32:06Z) - CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field [46.8198987091734]
This paper presents an efficient dense RGB-D SLAM system, i.e., CG-SLAM, based on a novel uncertainty-aware 3D Gaussian field.
Experiments on various datasets demonstrate that CG-SLAM achieves superior tracking and mapping performance with a notable tracking speed of up to 15 Hz.
arXiv Detail & Related papers (2024-03-24T11:19:59Z) - MoD-SLAM: Monocular Dense Mapping for Unbounded 3D Scene Reconstruction [2.3630527334737104]
MoD-SLAM is the first monocular NeRF-based dense mapping method that enables real-time 3D reconstruction in unbounded scenes.
By introducing a robust depth loss term into the tracking process, our SLAM system achieves more precise pose estimation in large-scale scenes.
Our experiments on two standard datasets show that MoD-SLAM achieves competitive performance, improving the accuracy of the 3D reconstruction and localization by up to 30% and 15% respectively.
arXiv Detail & Related papers (2024-02-06T07:07:33Z) - SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM [48.190398577764284]
SplaTAM is an approach to enable high-fidelity reconstruction from a single unposed RGB-D camera.
It employs a simple online tracking and mapping system tailored to the underlying Gaussian representation.
Experiments show that SplaTAM achieves up to 2x superior performance in camera pose estimation, map construction, and novel-view synthesis over existing methods.
arXiv Detail & Related papers (2023-12-04T18:53:24Z) - NICER-SLAM: Neural Implicit Scene Encoding for RGB SLAM [111.83168930989503]
NICER-SLAM is a dense RGB SLAM system that simultaneously optimizes camera poses and a hierarchical neural implicit map representation.
We show strong performance in dense mapping, tracking, and novel view synthesis, even competitive with recent RGB-D SLAM systems.
arXiv Detail & Related papers (2023-02-07T17:06:34Z) - NICE-SLAM: Neural Implicit Scalable Encoding for SLAM [112.6093688226293]
NICE-SLAM is a dense SLAM system that incorporates multi-level local information by introducing a hierarchical scene representation.
Compared to recent neural implicit SLAM systems, our approach is more scalable, efficient, and robust.
arXiv Detail & Related papers (2021-12-22T18:45:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.