MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements
- URL: http://arxiv.org/abs/2404.00923v1
- Date: Mon, 1 Apr 2024 04:57:41 GMT
- Title: MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements
- Authors: Lisong C. Sun, Neel P. Bhatt, Jonathan C. Liu, Zhiwen Fan, Zhangyang Wang, Todd E. Humphreys, Ufuk Topcu
- Abstract summary: We show for the first time that using 3D Gaussians for map representation with unposed camera images and inertial measurements can enable accurate SLAM.
Our method, MM3DGS, addresses the limitations of prior neural radiance field-based representations by enabling faster rendering, scale awareness, and improved trajectory tracking.
We also release a multi-modal dataset, UT-MM, collected from a mobile robot equipped with a camera and an inertial measurement unit.
- Score: 59.70107451308687
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Simultaneous localization and mapping is essential for position tracking and scene understanding. 3D Gaussian-based map representations enable photorealistic reconstruction and real-time rendering of scenes using multiple posed cameras. We show for the first time that using 3D Gaussians for map representation with unposed camera images and inertial measurements can enable accurate SLAM. Our method, MM3DGS, addresses the limitations of prior neural radiance field-based representations by enabling faster rendering, scale awareness, and improved trajectory tracking. Our framework enables keyframe-based mapping and tracking utilizing loss functions that incorporate relative pose transformations from pre-integrated inertial measurements, depth estimates, and measures of photometric rendering quality. We also release a multi-modal dataset, UT-MM, collected from a mobile robot equipped with a camera and an inertial measurement unit. Experimental evaluation on several scenes from the dataset shows that MM3DGS achieves 3x improvement in tracking and 5% improvement in photometric rendering quality compared to the current 3DGS SLAM state-of-the-art, while allowing real-time rendering of a high-resolution dense 3D map. Project Webpage: https://vita-group.github.io/MM3DGS-SLAM
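The abstract describes a tracking objective that fuses photometric rendering quality, depth estimates, and relative poses from pre-integrated inertial measurements. Below is a minimal sketch of such a combined loss, assuming SE(3) poses as 4x4 torch tensors and an IMU-preintegrated relative pose; the function names and weights are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of a keyframe tracking loss in the spirit of MM3DGS:
# photometric + depth + IMU pre-integration terms. Names and weights
# are illustrative assumptions, not the paper's actual code.
import torch

def rotation_angle(R_err):
    # Geodesic angle of a 3x3 rotation matrix via its trace.
    cos = ((torch.diagonal(R_err).sum() - 1.0) / 2.0).clamp(-1.0, 1.0)
    return torch.arccos(cos)

def tracking_loss(rendered_rgb, observed_rgb,
                  rendered_depth, observed_depth,
                  T_prev, T_curr, T_rel_imu,
                  lambda_depth=0.5, lambda_imu=1.0):
    # Photometric term: compare the splatted render to the camera image.
    l_photo = (rendered_rgb - observed_rgb).abs().mean()
    # Depth term: L1 on pixels with a valid depth estimate.
    valid = observed_depth > 0
    l_depth = (rendered_depth[valid] - observed_depth[valid]).abs().mean()
    # Inertial term: the estimated relative pose between consecutive
    # keyframes should agree with the IMU-preintegrated relative pose.
    T_rel_est = torch.linalg.inv(T_prev) @ T_curr
    R_err = T_rel_imu[:3, :3].T @ T_rel_est[:3, :3]
    t_err = T_rel_est[:3, 3] - T_rel_imu[:3, 3]
    l_imu = rotation_angle(R_err) + t_err.norm()
    return l_photo + lambda_depth * l_depth + lambda_imu * l_imu
```

The inertial term is what supplies the metric scale awareness the abstract highlights, since a purely monocular photometric loss cannot observe absolute scale.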
Related papers
- EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis [61.1662426227688]
Existing NeRF and 3DGS-based methods show promising results in achieving photorealistic renderings but require slow, per-scene optimization.
We introduce EVolSplat, an efficient 3D Gaussian Splatting model for urban scenes that works in a feed-forward manner.
arXiv Detail & Related papers (2025-03-26T02:47:27Z)
- MVS-GS: High-Quality 3D Gaussian Splatting Mapping via Online Multi-View Stereo [9.740087094317735]
We propose a novel framework for high-quality 3DGS modeling using an online multi-view stereo approach.
Our method estimates MVS depth using sequential frames from a local time window and applies comprehensive depth refinement techniques.
Experimental results demonstrate that our method outperforms state-of-the-art dense SLAM methods.
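As a rough illustration of the online MVS step this summary describes, here is a minimal winner-take-all plane sweep against a single neighboring frame from the local time window. The intrinsics, relative pose, and depth hypotheses are assumed given; a real system would aggregate several neighbors and apply the refinement the paper mentions.

```python
# Hedged sketch of a winner-take-all plane sweep between a reference frame
# and one neighbor from the local time window; not MVS-GS's actual pipeline.
import torch
import torch.nn.functional as F

def plane_sweep_depth(ref, src, K, T_ref_to_src, depths):
    # ref, src: (3, H, W) images; K: (3, 3); T_ref_to_src: (4, 4); depths: (D,)
    _, H, W = ref.shape
    v, u = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                          torch.arange(W, dtype=torch.float32), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], 0).reshape(3, -1)  # (3, HW)
    rays = torch.linalg.inv(K) @ pix
    costs = []
    for d in depths:
        pts = torch.cat([rays * d, torch.ones(1, rays.shape[1])], 0)
        q = K @ (T_ref_to_src @ pts)[:3]          # reproject into the neighbor
        q = q[:2] / q[2:].clamp(min=1e-6)
        grid = torch.stack([q[0] / (W - 1) * 2 - 1,  # normalize for grid_sample
                            q[1] / (H - 1) * 2 - 1], -1).reshape(1, H, W, 2)
        warped = F.grid_sample(src[None], grid, align_corners=True)[0]
        costs.append((warped - ref).abs().mean(0))   # photometric cost map
    best = torch.stack(costs).argmin(0)              # (H, W) hypothesis index
    return depths[best]                              # per-pixel depth estimate
```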
arXiv Detail & Related papers (2024-12-26T09:20:04Z)
- PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting [54.7468067660037]
PF3plat sets a new state-of-the-art across all benchmarks, supported by comprehensive ablation studies validating our design choices.
Our framework capitalizes on the fast speed, scalability, and high-quality 3D reconstruction and view-synthesis capabilities of 3DGS.
arXiv Detail & Related papers (2024-10-29T15:28:15Z)
- GSplatLoc: Grounding Keypoint Descriptors into 3D Gaussian Splatting for Improved Visual Localization [1.4466437171584356]
We propose a two-stage procedure that integrates dense and robust keypoint descriptors from the lightweight XFeat feature extractor into 3DGS.
In the second stage, the initial pose estimate is refined by minimizing the rendering-based photometric warp loss.
Benchmarking on widely used indoor and outdoor datasets demonstrates improvements over recent neural rendering-based localization methods.
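A sketch of how such a two-stage pipeline is commonly wired up: robust PnP on the 2D-3D keypoint matches for the coarse estimate, followed by photometric refinement through the differentiable rasterizer. The matching itself is assumed done; this is not GSplatLoc's released code.

```python
# Hedged sketch of coarse pose estimation from 2D-3D keypoint matches,
# e.g. query-image keypoints matched to descriptors grounded in the
# Gaussian map. Uses OpenCV's RANSAC PnP; not GSplatLoc's own code.
import numpy as np
import cv2

def coarse_pose(pts2d, pts3d, K):
    # pts2d: (N, 2) image keypoints; pts3d: (N, 3) matched map points.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d.astype(np.float64), pts2d.astype(np.float64),
        K.astype(np.float64), None)
    if not ok:
        raise RuntimeError("PnP failed: too few or degenerate matches")
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 rotation matrix
    # Stage 2 (not shown) would refine (R, tvec) by minimizing the
    # rendering-based photometric warp loss via gradient descent.
    return R, tvec
```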
arXiv Detail & Related papers (2024-09-24T23:18:32Z)
- LLMI3D: Empowering LLM with 3D Perception from a Single 2D Image [72.14973729674995]
Current 3D perception methods, particularly small models, struggle with logical reasoning, question answering, and open scenario categories.
We propose solutions: Spatial-Enhanced Local Feature Mining for better spatial feature extraction, 3D Query Token-Derived Info Decoding for precise geometric regression, and Geometry Projection-Based 3D Reasoning for handling camera focal length variations.
arXiv Detail & Related papers (2024-08-14T10:00:16Z)
- Visual SLAM with 3D Gaussian Primitives and Depth Priors Enabling Novel View Synthesis [11.236094544193605]
Conventional geometry-based SLAM systems lack dense 3D reconstruction capabilities.
We propose a real-time RGB-D SLAM system that incorporates a novel view synthesis technique, 3D Gaussian Splatting.
arXiv Detail & Related papers (2024-08-10T21:23:08Z)
- PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting [59.277480452459315]
We propose a principled spatial sensitivity pruning score that outperforms current approaches.
We also propose a multi-round prune-refine pipeline that can be applied to any pretrained 3D-GS model.
Our pipeline increases the average rendering speed of 3D-GS by 2.65× while retaining more salient foreground information.
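A minimal prune-refine loop of that shape is sketched below. The importance score (opacity times ellipsoid volume) is a simple stand-in assumption, not PUP's principled spatial-sensitivity score, and `refine_step` is a hypothetical fine-tuning callback.

```python
# Hedged sketch of a multi-round prune-refine pipeline for a pretrained
# 3D-GS model. The score is a heuristic stand-in; `refine_step` is a
# hypothetical callback that runs one optimization step on the scene.
import torch

def prune_refine(gaussians, refine_step, rounds=3, keep_frac=0.8,
                 refine_iters=500):
    # All per-Gaussian tensors in `gaussians` share the leading dim N.
    for _ in range(rounds):
        # Heuristic importance: opacity times scale volume (assumption).
        score = gaussians["opacity"] * gaussians["scales"].prod(dim=-1)
        k = int(keep_frac * score.numel())
        keep = score.topk(k).indices
        gaussians = {name: p[keep] for name, p in gaussians.items()}
        for _ in range(refine_iters):  # brief fine-tune after each prune
            refine_step(gaussians)
    return gaussians
```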
arXiv Detail & Related papers (2024-06-14T17:53:55Z)
- NEDS-SLAM: A Neural Explicit Dense Semantic SLAM Framework using 3D Gaussian Splatting [5.655341825527482]
NEDS-SLAM is a dense semantic SLAM system based on 3D Gaussian representation.
We propose a Spatially Consistent Feature Fusion model to reduce the effect of erroneous estimates from the pre-trained segmentation head.
We employ a lightweight encoder-decoder to compress the high-dimensional semantic features into a compact 3D Gaussian representation.
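A minimal sketch of such a compressor, assuming 512-D features from the pretrained segmentation head and a 16-D code stored on each Gaussian; the dimensions and layer sizes are illustrative assumptions, not NEDS-SLAM's actual architecture.

```python
# Hedged sketch: a lightweight MLP autoencoder compressing high-dimensional
# semantic features into a compact per-Gaussian code. Dimensions are
# illustrative assumptions.
import torch.nn as nn

class SemanticCompressor(nn.Module):
    def __init__(self, in_dim=512, code_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                     nn.Linear(64, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 64), nn.ReLU(),
                                     nn.Linear(64, in_dim))

    def forward(self, feats):
        code = self.encoder(feats)   # compact code stored per 3D Gaussian
        recon = self.decoder(code)   # recovered features for semantic queries
        return code, recon
```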
arXiv Detail & Related papers (2024-03-18T11:31:03Z)
- SemGauss-SLAM: Dense Semantic Gaussian Splatting SLAM [14.126704753481972]
We propose SemGauss-SLAM, a dense semantic SLAM system that enables accurate 3D semantic mapping, robust camera tracking, and high-quality rendering simultaneously.
We incorporate semantic feature embedding into 3D Gaussian representation, which effectively encodes semantic information within the spatial layout of the environment.
To reduce cumulative drift in tracking and improve semantic reconstruction accuracy, we introduce semantic-informed bundle adjustment.
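Embedding semantic features into the 3D Gaussian representation typically means rendering feature maps with the same front-to-back alpha compositing used for color; a minimal per-ray sketch, with shapes as assumptions:

```python
# Hedged sketch: alpha-blend per-Gaussian semantic embeddings along one
# ray, mirroring how 3DGS composites color. Shapes are assumptions.
import torch

def blend_semantics(alphas, feats):
    # alphas: (N,) opacities of depth-sorted Gaussians hit by the ray
    # feats:  (N, C) per-Gaussian semantic embeddings
    trans = torch.cumprod(torch.cat([torch.ones(1), 1 - alphas[:-1]]), dim=0)
    weights = alphas * trans                      # compositing weights
    return (weights[:, None] * feats).sum(dim=0)  # (C,) rendered feature
```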
arXiv Detail & Related papers (2024-03-12T10:33:26Z)
- Gaussian Splatting SLAM [16.3858380078553]
We present the first application of 3D Gaussian Splatting in monocular SLAM.
Our method runs live at 3fps, unifying the required representation for accurate tracking, mapping, and high-quality rendering.
Several innovations are required to continuously reconstruct 3D scenes with high fidelity from a live camera.
arXiv Detail & Related papers (2023-12-11T18:19:04Z)
- Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering [71.44349029439944]
The recent 3D Gaussian Splatting method has achieved state-of-the-art rendering quality and speed.
We introduce Scaffold-GS, which uses anchor points to distribute local 3D Gaussians.
We show that our method effectively reduces redundant Gaussians while delivering high-quality rendering.
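The anchor construction can be sketched in a few lines: each anchor carries k learnable offsets that spawn its local Gaussian centers, so growth and pruning operate on a sparse structured set. The counts below are illustrative assumptions.

```python
# Hedged sketch of anchor-distributed Gaussians: each anchor spawns k
# local Gaussian centers from learnable offsets. Counts are illustrative.
import torch

num_anchors, k = 1000, 10
anchors = torch.randn(num_anchors, 3)              # sparse anchor positions
offsets = torch.zeros(num_anchors, k, 3, requires_grad=True)
centers = anchors[:, None, :] + offsets            # (1000, 10, 3) Gaussian means
# Other per-Gaussian attributes (scale, opacity, color) would likewise be
# predicted from anchor features, keeping the representation compact.
```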
arXiv Detail & Related papers (2023-11-30T17:58:57Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)