MOTSLAM: MOT-assisted monocular dynamic SLAM using single-view depth
estimation
- URL: http://arxiv.org/abs/2210.02038v1
- Date: Wed, 5 Oct 2022 06:07:10 GMT
- Title: MOTSLAM: MOT-assisted monocular dynamic SLAM using single-view depth
estimation
- Authors: Hanwei Zhang, Hideaki Uchiyama, Shintaro Ono and Hiroshi Kawasaki
- Abstract summary: MOTSLAM is a dynamic visual SLAM system with the monocular configuration that tracks both poses and bounding boxes of dynamic objects.
Our experiments on the KITTI dataset demonstrate that our system achieves the best performance on both camera ego-motion and object tracking among monocular dynamic SLAM systems.
- Score: 5.33931801679129
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual SLAM systems targeting static scenes have been developed with
satisfactory accuracy and robustness. Dynamic 3D object tracking has then
become a significant capability in visual SLAM with the requirement of
understanding dynamic surroundings in various scenarios including autonomous
driving, augmented and virtual reality. However, performing dynamic SLAM solely
with monocular images remains a challenging problem due to the difficulty of
associating dynamic features and estimating their positions. In this paper, we
present MOTSLAM, a dynamic visual SLAM system with the monocular configuration
that tracks both poses and bounding boxes of dynamic objects. MOTSLAM first
performs multiple object tracking (MOT) with associated 2D and 3D bounding box
detections to create initial 3D objects. Then, neural-network-based monocular
depth estimation is applied to fetch the depth of dynamic features. Finally,
camera poses, object poses, and both static and dynamic map points are jointly
optimized using a novel bundle adjustment. Our experiments on the KITTI dataset
demonstrate that our system achieves the best performance on both camera
ego-motion and object tracking among monocular dynamic SLAM systems.
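The joint optimization described in the abstract can be illustrated with a small self-contained example. The following is a minimal sketch, not the paper's implementation: it assumes a pinhole camera, a single rigid object, and hand-picked toy values, and it optimizes one camera pose and one object pose against reprojection residuals. Dynamic points are parameterized in the object frame; in MOTSLAM their depths would be initialized from the single-view depth network, which the toy coordinates here merely stand in for.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

# Illustrative joint bundle adjustment over one camera pose, one object pose,
# three static points, and three dynamic points. All values are toy data.

K = np.array([[718.856, 0.0, 607.19],   # KITTI-like pinhole intrinsics
              [0.0, 718.856, 185.22],
              [0.0, 0.0, 1.0]])

X_static = np.array([[2.0, 1.0, 10.0],    # static map points (world frame)
                     [-1.5, 0.5, 12.0],
                     [0.5, -1.0, 9.0]])
X_dyn_local = np.array([[0.5, 0.0, 1.0],  # dynamic points (object frame),
                        [-0.5, 0.0, 1.0], # in practice initialized from
                        [0.0, 0.4, 1.2]]) # single-view depth in a 3D box

obs_static = np.array([[751.0, 257.1], [517.3, 215.2], [647.1, 105.4]])
obs_dyn = np.array([[727.0, 201.2], [647.1, 201.2], [685.3, 232.1]])

def project(rvec, t, X):
    """Project Nx3 points given a camera pose as axis-angle + translation."""
    Xc = Rotation.from_rotvec(rvec).apply(X) + t
    uvw = (K @ Xc.T).T
    return uvw[:, :2] / uvw[:, 2:3]

def residuals(params):
    """params = [camera rvec, camera t, object rvec, object t], 12 values."""
    cam_r, cam_t = params[0:3], params[3:6]
    obj_r, obj_t = params[6:9], params[9:12]
    # Static points constrain the camera pose alone.
    r_static = project(cam_r, cam_t, X_static) - obs_static
    # Dynamic points live in the object frame, so they couple both poses.
    X_dyn_world = Rotation.from_rotvec(obj_r).apply(X_dyn_local) + obj_t
    r_dyn = project(cam_r, cam_t, X_dyn_world) - obs_dyn
    return np.concatenate([r_static.ravel(), r_dyn.ravel()])

x0 = np.zeros(12)
x0[9:12] = [0.8, 0.0, 7.0]  # rough object pose, e.g. from a 3D detection
sol = least_squares(residuals, x0)
print("camera pose:", sol.x[:6].round(3))
print("object pose:", sol.x[6:].round(3))
```

Note the structural point this sketch makes: static points constrain only the camera pose, while dynamic points couple the camera and object poses, which is what lets tracked objects contribute to, rather than corrupt, ego-motion estimation.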
Related papers
- V3D-SLAM: Robust RGB-D SLAM in Dynamic Environments with 3D Semantic Geometry Voting [1.3493547928462395]
Simultaneous localization and mapping (SLAM) in highly dynamic environments is challenging due to the correlation between moving objects and the camera pose.
We propose a robust method, V3D-SLAM, to remove moving objects via two lightweight re-evaluation stages.
Our experiment on the TUM RGB-D benchmark on dynamic sequences with ground-truth camera trajectories showed that our methods outperform the most recent state-of-the-art SLAM methods.
arXiv Detail & Related papers (2024-10-15T21:08:08Z)
- Shape of Motion: 4D Reconstruction from a Single Video [51.04575075620677]
We introduce a method capable of reconstructing generic dynamic scenes, featuring explicit, full-sequence-long 3D motion.
We exploit the low-dimensional structure of 3D motion by representing scene motion with a compact set of SE3 motion bases.
Our method achieves state-of-the-art performance for both long-range 3D/2D motion estimation and novel view synthesis on dynamic scenes.
arXiv Detail & Related papers (2024-07-18T17:59:08Z)
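The motion-basis idea in Shape of Motion can be reduced to a few lines of numpy for intuition. This is a hedged sketch, not the paper's code: the basis count, the softmax weights, and the linear blending of per-basis rigid motions are all illustrative choices.

```python
import numpy as np
from scipy.spatial.transform import Rotation

rng = np.random.default_rng(0)

# A small set of SE(3) motion bases shared by the whole scene (the count and
# how the bases are fit to a video are per-paper details, assumed here).
num_bases, num_points = 8, 1000
basis_R = Rotation.from_rotvec(0.05 * rng.standard_normal((num_bases, 3)))
basis_t = 0.1 * rng.standard_normal((num_bases, 3))

# Each 3D point carries a small weight vector over the bases.
logits = rng.standard_normal((num_points, num_bases))
weights = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

points = rng.standard_normal((num_points, 3))

# Linear-blend-style combination of the per-basis rigid motions: one common
# way to blend SE(3) transforms; the paper's exact blending may differ.
moved = np.zeros_like(points)
for k in range(num_bases):
    moved += weights[:, [k]] * (basis_R[k].apply(points) + basis_t[k])

print(moved.shape)  # (1000, 3): per-point positions at the next time step
```

The appeal of the representation is the parameter count: dense per-point trajectories collapse to a handful of shared transforms plus low-dimensional per-point weights.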
- EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting [95.44545809256473]
EgoGaussian is a method capable of simultaneously reconstructing 3D scenes and dynamically tracking 3D object motion from RGB egocentric input alone.
We show significant improvements in terms of both dynamic object and background reconstruction quality compared to the state-of-the-art.
arXiv Detail & Related papers (2024-06-28T10:39:36Z)
- DO3D: Self-supervised Learning of Decomposed Object-aware 3D Motion and Depth from Monocular Videos [76.01906393673897]
We propose a self-supervised method to jointly learn 3D motion and depth from monocular videos.
Our system contains a depth estimation module to predict depth, and a new decomposed object-wise 3D motion (DO3D) estimation module to predict ego-motion and 3D object motion.
Our model delivers superior performance in all evaluated settings.
arXiv Detail & Related papers (2024-03-09T12:22:46Z)
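As a rough illustration of the ego/object decomposition in DO3D (not its actual architecture), the sketch below warps pixels with an ego-motion that applies everywhere plus an extra object motion that applies only inside a detected mask. The depth map, the mask, and both translations are toy stand-ins for what the paper's networks would predict.

```python
import numpy as np

# Decomposed motion warping: camera ego-motion moves every pixel; an extra
# per-object 3D motion is added inside the object mask. Toy values throughout.

H, W = 4, 6
K = np.array([[100.0, 0.0, W / 2],
              [0.0, 100.0, H / 2],
              [0.0, 0.0, 1.0]])
K_inv = np.linalg.inv(K)

depth = np.full((H, W), 5.0)                 # stand-in for the depth module
mask = np.zeros((H, W), dtype=bool)
mask[1:3, 2:5] = True                        # one detected dynamic object

# Backproject all pixels to 3D camera coordinates.
v, u = np.mgrid[0:H, 0:W]
pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x HW
X = (K_inv @ pix) * depth.reshape(-1)                              # 3 x HW

# Ego-motion: a small forward translation (identity rotation for brevity).
t_ego = np.array([[0.0], [0.0], [0.2]])
X_warp = X + t_ego

# Object motion: an extra lateral translation, applied only inside the mask.
t_obj = np.array([[0.3], [0.0], [0.0]])
X_warp = np.where(mask.reshape(-1), X_warp + t_obj, X_warp)

# Reproject to get the induced 2D flow for the next frame.
uvw = K @ X_warp
flow = (uvw[:2] / uvw[2]).T.reshape(H, W, 2) - np.stack([u, v], axis=-1)
print(flow[2, 3], flow[0, 0])  # object pixel vs. background pixel
```

Self-supervision then comes for free: the composed flow defines a photometric warping loss against the next frame, with no motion labels needed.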
- 3DS-SLAM: A 3D Object Detection based Semantic SLAM towards Dynamic Indoor Environments [1.4901625182926226]
We introduce 3DS-SLAM, 3D Semantic SLAM, tailored for dynamic scenes with visual 3D object detection.
3DS-SLAM is a tightly-coupled algorithm resolving both semantic and geometric constraints sequentially.
It exhibits an average improvement of 98.01% across the dynamic sequences of the TUM RGB-D dataset.
arXiv Detail & Related papers (2023-10-10T07:48:40Z)
- Decoupling Dynamic Monocular Videos for Dynamic View Synthesis [50.93409250217699]
We tackle the challenge of dynamic view synthesis from dynamic monocular videos in an unsupervised fashion.
Specifically, we decouple the motion of the dynamic objects into object motion and camera motion, respectively regularized by the proposed unsupervised surface consistency and patch-based multi-view constraints.
arXiv Detail & Related papers (2023-04-04T11:25:44Z)
- Attentive and Contrastive Learning for Joint Depth and Motion Field Estimation [76.58256020932312]
Estimating the motion of the camera together with the 3D structure of the scene from a monocular vision system is a complex task.
We present a self-supervised learning framework for 3D object motion field estimation from monocular videos.
arXiv Detail & Related papers (2021-10-13T16:45:01Z)
- AirDOS: Dynamic SLAM benefits from Articulated Objects [9.045690662672659]
Dynamic object-aware SLAM (DOS) exploits object-level information to enable robust motion estimation in dynamic environments.
AirDOS is the first dynamic object-aware SLAM system demonstrating that camera pose estimation can be improved by incorporating dynamic articulated objects.
arXiv Detail & Related papers (2021-09-21T01:23:48Z)
- DynaSLAM II: Tightly-Coupled Multi-Object Tracking and SLAM [2.9822184411723645]
DynaSLAM II is a visual SLAM system for stereo and RGB-D configurations that tightly integrates multi-object tracking.
We demonstrate that tracking dynamic objects not only provides rich clues for scene understanding but is also beneficial for camera tracking.
arXiv Detail & Related papers (2020-10-15T15:25:30Z)
- DOT: Dynamic Object Tracking for Visual SLAM [83.69544718120167]
DOT combines instance segmentation and multi-view geometry to generate masks for dynamic objects.
To determine which objects are actually moving, DOT first segments instances of potentially dynamic objects and then, using the estimated camera motion, tracks such objects by minimizing the photometric reprojection error.
Our results show that our approach significantly improves the accuracy and robustness of ORB-SLAM 2, especially in highly dynamic scenes.
arXiv Detail & Related papers (2020-09-30T18:36:28Z)
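DOT's moving-or-not test lends itself to a compact illustration. The sketch below is a toy built on stated assumptions, not DOT's code: it uses an identity warp in place of a real camera-motion-induced warp, and the `classify_object` helper and its intensity threshold are hypothetical. It does show the core idea, though: a high photometric residual under the static-scene hypothesis flags an object as dynamic.

```python
import numpy as np

def photometric_error(img_a, img_b, coords_a, coords_b):
    """Mean absolute intensity difference between corresponding pixels."""
    vals_a = img_a[coords_a[:, 1], coords_a[:, 0]]
    vals_b = img_b[coords_b[:, 1], coords_b[:, 0]]
    return np.abs(vals_a.astype(float) - vals_b.astype(float)).mean()

def classify_object(img_t, img_t1, obj_coords, warped_coords, thresh=10.0):
    """Label a segmented instance static or dynamic from its residual error.
    Hypothetical helper; the threshold is an illustrative assumption."""
    err = photometric_error(img_t, img_t1, obj_coords, warped_coords)
    return "dynamic" if err > thresh else "static"

# Toy frames: a bright patch shifts right while the "camera" stays still, so
# a camera-motion-only warp (identity here) cannot explain the change.
img_t = np.zeros((8, 8), dtype=np.uint8)
img_t1 = np.zeros((8, 8), dtype=np.uint8)
img_t[2:4, 2:4] = 200
img_t1[2:4, 4:6] = 200

obj_coords = np.array([[x, y] for y in range(2, 4) for x in range(2, 4)])
warped = obj_coords.copy()  # identity warp: what a static scene would predict

print(classify_object(img_t, img_t1, obj_coords, warped))  # -> "dynamic"
```

In the real system the warp would come from projecting the object's pixels through the estimated camera motion and depth, and only objects that fail this static-scene check have their map points excluded or tracked separately.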