Related papers: Dropping the D: RGB-D SLAM Without the Depth Sensor

URL: http://arxiv.org/abs/2510.06216v2
Date: Sun, 02 Nov 2025 21:12:24 GMT
Title: Dropping the D: RGB-D SLAM Without the Depth Sensor
Authors: Mert Kiray, Alican Karaomer, Benjamin Busam
Abstract summary: We present DropD-SLAM, a real-time monocular SLAM system that achieves RGB-D-level accuracy without relying on depth sensors. The system replaces active depth input with three pretrained vision modules. On the TUM RGB-D benchmark, DropD-SLAM attains 7.4 cm mean ATE on static sequences and 1.8 cm on dynamic sequences.
Score: 16.83416267639945
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: We present DropD-SLAM, a real-time monocular SLAM system that achieves RGB-D-level accuracy without relying on depth sensors. The system replaces active depth input with three pretrained vision modules: a monocular metric depth estimator, a learned keypoint detector, and an instance segmentation network. Dynamic objects are suppressed using dilated instance masks, while static keypoints are assigned predicted depth values and backprojected into 3D to form metrically scaled features. These are processed by an unmodified RGB-D SLAM back end for tracking and mapping. On the TUM RGB-D benchmark, DropD-SLAM attains 7.4 cm mean ATE on static sequences and 1.8 cm on dynamic sequences, matching or surpassing state-of-the-art RGB-D methods while operating at 22 FPS on a single GPU. These results suggest that modern pretrained vision models can replace active depth sensors as reliable, real-time sources of metric scale, marking a step toward simpler and more cost-effective SLAM systems.
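The front-end step described in the abstract (suppress dynamic objects with dilated instance masks, then assign predicted depth to the remaining static keypoints and backproject them to 3D) can be illustrated with a minimal NumPy sketch. It assumes a pinhole camera with intrinsics fx, fy, cx, cy, a metric depth map already produced by some monocular depth estimator, and a boolean mask marking pixels of dynamic instances. The function names, the square dilation window, and the nearest-pixel depth lookup are hypothetical choices for illustration, not taken from the DropD-SLAM implementation.

```python
import numpy as np


def dilate_mask(mask: np.ndarray, radius: int) -> np.ndarray:
    """Binary dilation of a 2-D boolean mask with a (2r+1) x (2r+1) square window."""
    mask = mask.astype(bool)
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    grown = np.zeros((h, w), dtype=bool)
    # OR together every shifted copy of the mask inside the square window.
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            grown |= padded[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
    return grown


def backproject_static_keypoints(keypoints, depth, dynamic_mask,
                                 fx, fy, cx, cy, dilation_radius=5):
    """Drop keypoints on (dilated) dynamic regions and lift the rest to 3-D.

    keypoints: (N, 2) array of (u, v) pixel coordinates.
    depth: (H, W) metric depth map, e.g. from a monocular depth network.
    dynamic_mask: (H, W) boolean union of dynamic-instance masks.
    Returns an (M, 3) array of camera-frame points (X, Y, Z).
    """
    uv = np.asarray(keypoints, dtype=float).reshape(-1, 2)
    h, w = depth.shape
    grown = dilate_mask(dynamic_mask, dilation_radius)

    # Nearest-pixel lookup indices; discard keypoints outside the image.
    ui = np.round(uv[:, 0]).astype(int)
    vi = np.round(uv[:, 1]).astype(int)
    inside = (ui >= 0) & (ui < w) & (vi >= 0) & (vi < h)
    uv, ui, vi = uv[inside], ui[inside], vi[inside]

    # Keep only static keypoints that have a positive depth prediction.
    z = depth[vi, ui]
    keep = ~grown[vi, ui] & (z > 0)
    uv, z = uv[keep], z[keep]

    # Pinhole backprojection: X = (u - cx) Z / fx, Y = (v - cy) Z / fy.
    x = (uv[:, 0] - cx) * z / fx
    y = (uv[:, 1] - cy) * z / fy
    return np.stack([x, y, z], axis=1)
```

Dilating the masks before filtering discards keypoints that sit on object boundaries, where both the instance mask and the predicted depth tend to be least reliable. In a real pipeline the intrinsics would come from camera calibration, and the returned points (with their descriptors) would be handed to the RGB-D back end in place of sensor depth.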

Related papers

Masked Depth Modeling for Spatial Perception [44.0326843862591]
LingBot-Depth is a depth completion model that refines depth maps through masked depth modeling. It outperforms top-tier RGB-D cameras in terms of both depth precision and pixel coverage. We release the code, checkpoint, and 3M RGB-depth pairs to the community of spatial perception.
arXiv Detail & Related papers (2026-01-25T16:13:49Z)
MCGS-SLAM: A Multi-Camera SLAM Framework Using Gaussian Splatting for High-Fidelity Mapping [52.99503784067417]
We present MCGS-SLAM, the first purely RGB-based multi-camera SLAM system built on 3D Gaussian Splatting (3DGS). A multi-camera bundle adjustment (MCBA) jointly refines poses and depths via dense photometric and geometric residuals, while a scale consistency module enforces metric alignment across views. Experiments on synthetic and real-world datasets show that MCGS-SLAM consistently yields accurate trajectories and photorealistic reconstructions.
arXiv Detail & Related papers (2025-09-17T17:27:53Z)
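The summary does not say how MCGS-SLAM's scale consistency module works. As a generic illustration of what aligning one depth estimate to another in scale can mean (explicitly not MCGS-SLAM's formulation), the scale s that minimizes the squared error between s times a predicted depth map and a reference depth map over valid pixels has the closed form s = Σ d_pred·d_ref / Σ d_pred². The function name below is hypothetical.

```python
import numpy as np


def least_squares_scale(d_pred, d_ref, valid=None) -> float:
    """Closed-form s minimizing sum((s * d_pred - d_ref)^2) over valid pixels."""
    d_pred = np.asarray(d_pred, dtype=float).ravel()
    d_ref = np.asarray(d_ref, dtype=float).ravel()
    if valid is None:
        # Treat non-positive depths in either map as missing.
        valid = (d_pred > 0) & (d_ref > 0)
    else:
        valid = np.asarray(valid, dtype=bool).ravel()
    p, r = d_pred[valid], d_ref[valid]
    denom = float(np.dot(p, p))
    if denom == 0.0:
        raise ValueError("no valid pixels to estimate a scale from")
    # Setting d/ds of the squared error to zero gives s = <p, r> / <p, p>.
    return float(np.dot(p, r) / denom)
```

Typical use would be `s = least_squares_scale(pred_depth, ref_depth)` followed by `pred_metric = s * pred_depth`. A median of per-pixel ratios is a common, more outlier-robust alternative to this least-squares estimate.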
Pseudo Depth Meets Gaussian: A Feed-forward RGB SLAM Baseline [64.42938561167402]
We propose an online 3D reconstruction method using 3D Gaussian-based SLAM, combined with a feed-forward recurrent prediction module. This approach replaces slow test-time optimization with fast network inference, significantly improving tracking speed. Our method achieves performance on par with the state-of-the-art SplaTAM, while reducing tracking time by more than 90%.
arXiv Detail & Related papers (2025-08-06T16:16:58Z)
Dy3DGS-SLAM: Monocular 3D Gaussian Splatting SLAM for Dynamic Environments [5.050525952210101]
We propose Dy3DGS-SLAM, the first 3D Gaussian Splatting (3DGS) SLAM method for dynamic scenes using monocular RGB input. Results demonstrate that Dy3DGS-SLAM achieves state-of-the-art tracking and rendering in dynamic environments.
arXiv Detail & Related papers (2025-06-06T10:43:41Z)
HI-SLAM2: Geometry-Aware Gaussian SLAM for Fast Monocular Scene Reconstruction [38.47566815670662]
HI-SLAM2 is a geometry-aware Gaussian SLAM system that achieves fast and accurate monocular scene reconstruction using only RGB input. We demonstrate significant improvements over existing Neural SLAM methods and even surpass RGB-D-based methods in both reconstruction and rendering quality.
arXiv Detail & Related papers (2024-11-27T01:39:21Z)
MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements [59.70107451308687]
We show for the first time that using 3D Gaussians for map representation with unposed camera images and inertial measurements can enable accurate SLAM. Our method, MM3DGS, addresses the limitations of prior rendering by enabling faster scale awareness, and improved trajectory tracking. We also release a multi-modal dataset, UT-MM, collected from a mobile robot equipped with a camera and an inertial measurement unit.
arXiv Detail & Related papers (2024-04-01T04:57:41Z)
UncLe-SLAM: Uncertainty Learning for Dense Neural SLAM [60.575435353047304]
We present an uncertainty learning framework for dense neural simultaneous localization and mapping (SLAM). We propose an online framework for sensor uncertainty estimation that can be trained in a self-supervised manner from only 2D input data.
arXiv Detail & Related papers (2023-06-19T16:26:25Z)
Using Detection, Tracking and Prediction in Visual SLAM to Achieve Real-time Semantic Mapping of Dynamic Scenarios [70.70421502784598]
RDS-SLAM can build object-level semantic maps of dynamic scenarios in real time using only a single, commonly available Intel Core i7 CPU. We evaluate RDS-SLAM on the TUM RGB-D dataset, and experimental results show that RDS-SLAM runs at 30.3 ms per frame in dynamic scenarios.
arXiv Detail & Related papers (2022-10-10T11:03:32Z)
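To put the 30.3 ms per-frame figure on the same footing as frame rates such as the 22 FPS quoted for DropD-SLAM, it converts to throughput as follows, assuming frames are processed one after another with no pipelining:

```latex
\text{FPS} = \frac{1000\ \text{ms/s}}{30.3\ \text{ms/frame}} \approx 33.0\ \text{frames/s}
```

The two numbers come from different hardware (a CPU for RDS-SLAM, a single GPU for DropD-SLAM) and different systems, so they are not directly comparable.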
DVIO: Depth aided visual inertial odometry for RGBD sensors [7.745106319694523]
This paper presents a new visual inertial odometry (VIO) system, which uses measurements from an RGB-D sensor and an inertial measurement unit (IMU) to estimate the motion state of the mobile device. The resulting system is called the depth-aided VIO (DVIO) system.
arXiv Detail & Related papers (2021-10-20T22:12:01Z)
RGB2Hands: Real-Time Tracking of 3D Hand Interactions from Monocular RGB Video [76.86512780916827]
We present the first real-time method for motion capture of skeletal pose and 3D surface geometry of hands from a single RGB camera. In order to address the inherent depth ambiguities in RGB data, we propose a novel multi-task CNN. We experimentally verify the individual components of our RGB two-hand tracking and 3D reconstruction pipeline.
arXiv Detail & Related papers (2021-06-22T12:53:56Z)
RGB-D Odometry and SLAM [20.02647320786556]
RGB-D sensors are low-cost, low-power and compact alternatives to traditional range sensors such as LiDAR. Unlike RGB cameras, RGB-D sensors provide additional depth information that removes the need for frame-by-frame triangulation in 3D scene reconstruction. This chapter consists of three main parts: In the first part, we introduce the basic concept of odometry and SLAM and motivate the use of RGB-D sensors. In the second part, we detail the three main components of SLAM systems: camera pose tracking, scene mapping and loop closing.
arXiv Detail & Related papers (2020-01-19T17:56:11Z)
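Because every RGB-D pixel already carries depth, features matched between two frames are 3D-3D correspondences, and the relative camera motion then has a closed-form least-squares solution instead of requiring triangulation. Below is a minimal sketch of one standard solver for this, the SVD-based Kabsch method (Umeyama alignment without scale). It assumes clean, outlier-free correspondences, and the function name is hypothetical; practical RGB-D odometry usually wraps such a solver in a robust estimator (e.g. RANSAC) and refines the result, which this sketch omits.

```python
import numpy as np


def rigid_transform_3d(src, dst):
    """Return (R, t) minimizing sum ||R @ src_i + t - dst_i||^2 over all i.

    src, dst: (N, 3) arrays of corresponding 3-D points, N >= 3, not collinear.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)

    # 3x3 cross-covariance of the centered point sets.
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t
```

The same kind of closed-form absolute-orientation alignment is also what trajectory-evaluation tools commonly use to register an estimated trajectory to ground truth before computing ATE.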

This list is automatically generated from the titles and abstracts of the papers on this site.