Related papers: Improved Real-Time Monocular SLAM Using Semantic Segmentation on Selective Frames

Improved Real-Time Monocular SLAM Using Semantic Segmentation on Selective Frames

URL: http://arxiv.org/abs/2105.00114v1
Date: Fri, 30 Apr 2021 22:34:45 GMT
Title: Improved Real-Time Monocular SLAM Using Semantic Segmentation on Selective Frames
Authors: Jinkyu Lee, Muhyun Back, Sung Soo Hwang and Il Yong Chun
Abstract summary: monocular simultaneous localization and mapping (SLAM) is emerging in advanced driver assistance systems and autonomous driving. This paper proposes an improved real-time monocular SLAM using deep learning-based semantic segmentation. Experiments with six video sequences demonstrate that the proposed monocular SLAM system achieves significantly more accurate trajectory tracking accuracy.
Score: 15.455647477995312
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Monocular simultaneous localization and mapping (SLAM) is emerging in advanced driver assistance systems and autonomous driving, because a single camera is cheap and easy to install. Conventional monocular SLAM has two major challenges leading inaccurate localization and mapping. First, it is challenging to estimate scales in localization and mapping. Second, conventional monocular SLAM uses inappropriate mapping factors such as dynamic objects and low-parallax ares in mapping. This paper proposes an improved real-time monocular SLAM that resolves the aforementioned challenges by efficiently using deep learning-based semantic segmentation. To achieve the real-time execution of the proposed method, we apply semantic segmentation only to downsampled keyframes in parallel with mapping processes. In addition, the proposed method corrects scales of camera poses and three-dimensional (3D) points, using estimated ground plane from road-labeled 3D points and the real camera height. The proposed method also removes inappropriate corner features labeled as moving objects and low parallax areas. Experiments with six video sequences demonstrate that the proposed monocular SLAM system achieves significantly more accurate trajectory tracking accuracy compared to state-of-the-art monocular SLAM and comparable trajectory tracking accuracy compared to state-of-the-art stereo SLAM.

Related papers

S3MOT: Monocular 3D Object Tracking with Selective State Space Model [3.5047603107971397]
Multi-object tracking in 3D space is essential for advancing robotics and computer applications. It remains a significant challenge in monocular setups due to the difficulty of mining 3D associations from 2D video streams. We present three innovative techniques to enhance the fusion of heterogeneous cues for monocular 3D MOT.
arXiv Detail & Related papers (2025-04-25T04:45:35Z)
POMATO: Marrying Pointmap Matching with Temporal Motion for Dynamic 3D Reconstruction [53.19968902152528]
We present POMATO, a unified framework for dynamic 3D reconstruction by marrying pointmap matching with temporal motion. Specifically, our method learns an explicit matching relationship by mapping RGB pixels from both dynamic and static regions across different views to 3D pointmaps. We show the effectiveness of the proposed pointmap matching and temporal fusion paradigm by demonstrating the remarkable performance across multiple downstream tasks.
arXiv Detail & Related papers (2025-04-08T05:33:13Z)
MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors [15.764342766592808]
We present a real-time monocular dense SLAM system designed bottom-up from MASt3R. Our system is robust on in-the-wild video sequences despite making no assumption on a fixed or parametric camera model.
arXiv Detail & Related papers (2024-12-16T23:00:05Z)
MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements [59.70107451308687]
We show for the first time that using 3D Gaussians for map representation with unposed camera images and inertial measurements can enable accurate SLAM. Our method, MM3DGS, addresses the limitations of prior rendering by enabling faster scale awareness, and improved trajectory tracking. We also release a multi-modal dataset, UT-MM, collected from a mobile robot equipped with a camera and an inertial measurement unit.
arXiv Detail & Related papers (2024-04-01T04:57:41Z)
MoD-SLAM: Monocular Dense Mapping for Unbounded 3D Scene Reconstruction [2.3630527334737104]
MoD-SLAM is the first monocular NeRF-based dense mapping method that allows 3D reconstruction in real-time in unbounded scenes. By introducing a robust depth loss term into the tracking process, our SLAM system achieves more precise pose estimation in large-scale scenes. Our experiments on two standard datasets show that MoD-SLAM achieves competitive performance, improving the accuracy of the 3D reconstruction and localization by up to 30% and 15% respectively.
arXiv Detail & Related papers (2024-02-06T07:07:33Z)
Cross-Modal Semi-Dense 6-DoF Tracking of an Event Camera in Challenging Conditions [29.608665442108727]
Event-based cameras are bio-inspired visual sensors that perform well in HDR conditions and have high temporal resolution. The present work demonstrates the feasibility of purely event-based tracking if an alternative sensor is permitted for mapping. The method relies on geometric 3D-2D registration of semi-dense maps and events, and achieves highly reliable and accurate cross-modal tracking results.
arXiv Detail & Related papers (2024-01-16T01:48:45Z)
Gaussian Splatting SLAM [16.3858380078553]
We present the first application of 3D Gaussian Splatting in monocular SLAM. Our method runs live at 3fps, unifying the required representation for accurate tracking, mapping, and high-quality rendering. Several innovations are required to continuously reconstruct 3D scenes with high fidelity from a live camera.
arXiv Detail & Related papers (2023-12-11T18:19:04Z)
Comparative Study of Visual SLAM-Based Mobile Robot Localization Using Fiducial Markers [4.918853205874711]
This paper presents a comparative study of three modes for mobile robot localization based on visual SLAM using fiducial markers. The reason for comparing the SLAM-based approaches is because previous work has shown their superior performance over feature-only methods. Hardware experiments show consistent trajectory error levels across the three modes, with the localization mode exhibiting the shortest runtime among them.
arXiv Detail & Related papers (2023-09-08T17:05:24Z)
View Consistent Purification for Accurate Cross-View Localization [59.48131378244399]
This paper proposes a fine-grained self-localization method for outdoor robotics. The proposed method addresses limitations in existing cross-view localization methods. It is the first sparse visual-only method that enhances perception in dynamic environments.
arXiv Detail & Related papers (2023-08-16T02:51:52Z)
ParticleSfM: Exploiting Dense Point Trajectories for Localizing Moving Cameras in the Wild [57.37891682117178]
We present a robust dense indirect structure-from-motion method for videos that is based on dense correspondence from pairwise optical flow. A novel neural network architecture is proposed for processing irregular point trajectory data. Experiments on MPI Sintel dataset show that our system produces significantly more accurate camera trajectories.
arXiv Detail & Related papers (2022-07-19T09:19:45Z)
TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view Stereo [55.30992853477754]
We present TANDEM, a real-time monocular tracking and dense framework. For pose estimation, TANDEM performs photometric bundle adjustment based on a sliding window of alignments. TANDEM shows state-of-the-art real-time 3D reconstruction performance.
arXiv Detail & Related papers (2021-11-14T19:01:02Z)
Self-Supervised Multi-Frame Monocular Scene Flow [61.588808225321735]
We introduce a multi-frame monocular scene flow network based on self-supervised learning. We observe state-of-the-art accuracy among monocular scene flow methods based on self-supervised learning.
arXiv Detail & Related papers (2021-05-05T17:49:55Z)
Robust On-Manifold Optimization for Uncooperative Space Relative Navigation with a Single Camera [4.129225533930966]
An innovative model-based approach is demonstrated to estimate the six-dimensional pose of a target object relative to the chaser spacecraft using solely a monocular setup. It is validated on realistic synthetic and laboratory datasets of a rendezvous trajectory with the complex spacecraft Envisat.
arXiv Detail & Related papers (2020-05-14T16:23:04Z)
Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras. We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points. Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2d detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.