MOTSLAM: MOT-assisted monocular dynamic SLAM using single-view depth
estimation
- URL: http://arxiv.org/abs/2210.02038v1
- Date: Wed, 5 Oct 2022 06:07:10 GMT
- Title: MOTSLAM: MOT-assisted monocular dynamic SLAM using single-view depth
estimation
- Authors: Hanwei Zhang, Hideaki Uchiyama, Shintaro Ono and Hiroshi Kawasaki
- Abstract summary: MOTSLAM is a dynamic visual SLAM system with the monocular configuration that tracks both poses and bounding boxes of dynamic objects.
Our experiments on the KITTI dataset demonstrate that our system has reached best performance on both camera ego-motion and object tracking on monocular dynamic SLAM.
- Score: 5.33931801679129
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual SLAM systems targeting static scenes have been developed with
satisfactory accuracy and robustness. Dynamic 3D object tracking has then
become a significant capability in visual SLAM with the requirement of
understanding dynamic surroundings in various scenarios including autonomous
driving, augmented and virtual reality. However, performing dynamic SLAM solely
with monocular images remains a challenging problem due to the difficulty of
associating dynamic features and estimating their positions. In this paper, we
present MOTSLAM, a dynamic visual SLAM system with the monocular configuration
that tracks both poses and bounding boxes of dynamic objects. MOTSLAM first
performs multiple object tracking (MOT) with associated both 2D and 3D bounding
box detection to create initial 3D objects. Then, neural-network-based
monocular depth estimation is applied to fetch the depth of dynamic features.
Finally, camera poses, object poses, and both static, as well as dynamic map
points, are jointly optimized using a novel bundle adjustment. Our experiments
on the KITTI dataset demonstrate that our system has reached best performance
on both camera ego-motion and object tracking on monocular dynamic SLAM.
Related papers
- Shape of Motion: 4D Reconstruction from a Single Video [51.04575075620677]
We introduce a method capable of reconstructing generic dynamic scenes, featuring explicit, full-sequence-long 3D motion.
We exploit the low-dimensional structure of 3D motion by representing scene motion with a compact set of SE3 motion bases.
Our method achieves state-of-the-art performance for both long-range 3D/2D motion estimation and novel view synthesis on dynamic scenes.
arXiv Detail & Related papers (2024-07-18T17:59:08Z) - DO3D: Self-supervised Learning of Decomposed Object-aware 3D Motion and
Depth from Monocular Videos [76.01906393673897]
We propose a self-supervised method to jointly learn 3D motion and depth from monocular videos.
Our system contains a depth estimation module to predict depth, and a new decomposed object-wise 3D motion (DO3D) estimation module to predict ego-motion and 3D object motion.
Our model delivers superior performance in all evaluated settings.
arXiv Detail & Related papers (2024-03-09T12:22:46Z) - 3DS-SLAM: A 3D Object Detection based Semantic SLAM towards Dynamic
Indoor Environments [1.4901625182926226]
We introduce 3DS-SLAM, 3D Semantic SLAM, tailored for dynamic scenes with visual 3D object detection.
The 3DS-SLAM is a tightly-coupled algorithm resolving both semantic and geometric constraints sequentially.
It exhibits an average improvement of 98.01% across the dynamic sequences of the TUM RGB-D dataset.
arXiv Detail & Related papers (2023-10-10T07:48:40Z) - Delving into Motion-Aware Matching for Monocular 3D Object Tracking [81.68608983602581]
We find that the motion cue of objects along different time frames is critical in 3D multi-object tracking.
We propose MoMA-M3T, a framework that mainly consists of three motion-aware components.
We conduct extensive experiments on the nuScenes and KITTI datasets to demonstrate our MoMA-M3T achieves competitive performance against state-of-the-art methods.
arXiv Detail & Related papers (2023-08-22T17:53:58Z) - Decoupling Dynamic Monocular Videos for Dynamic View Synthesis [50.93409250217699]
We tackle the challenge of dynamic view synthesis from dynamic monocular videos in an unsupervised fashion.
Specifically, we decouple the motion of the dynamic objects into object motion and camera motion, respectively regularized by proposed unsupervised surface consistency and patch-based multi-view constraints.
arXiv Detail & Related papers (2023-04-04T11:25:44Z) - Visual-Inertial Multi-Instance Dynamic SLAM with Object-level
Relocalisation [14.302118093865849]
We present a tightly-coupled visual-inertial object-level multi-instance dynamic SLAM system.
It can robustly optimise for the camera pose, velocity, IMU biases and build a dense 3D reconstruction object-level map of the environment.
arXiv Detail & Related papers (2022-08-08T17:13:24Z) - Attentive and Contrastive Learning for Joint Depth and Motion Field
Estimation [76.58256020932312]
Estimating the motion of the camera together with the 3D structure of the scene from a monocular vision system is a complex task.
We present a self-supervised learning framework for 3D object motion field estimation from monocular videos.
arXiv Detail & Related papers (2021-10-13T16:45:01Z) - AirDOS: Dynamic SLAM benefits from Articulated Objects [9.045690662672659]
Object-aware SLAM (DOS) exploits object-level information to enable robust motion estimation in dynamic environments.
AirDOS is the first dynamic object-aware SLAM system demonstrating that camera pose estimation can be improved by incorporating dynamic articulated objects.
arXiv Detail & Related papers (2021-09-21T01:23:48Z) - DynaSLAM II: Tightly-Coupled Multi-Object Tracking and SLAM [2.9822184411723645]
DynaSLAM II is a visual SLAM system for stereo and RGB-D configurations that tightly integrates the multi-object tracking capability.
We demonstrate that tracking dynamic objects does not only provide rich clues for scene understanding but is also beneficial for camera tracking.
arXiv Detail & Related papers (2020-10-15T15:25:30Z) - DOT: Dynamic Object Tracking for Visual SLAM [83.69544718120167]
DOT combines instance segmentation and multi-view geometry to generate masks for dynamic objects.
To determine which objects are actually moving, DOT segments first instances of potentially dynamic objects and then, with the estimated camera motion, tracks such objects by minimizing the photometric reprojection error.
Our results show that our approach improves significantly the accuracy and robustness of ORB-SLAM 2, especially in highly dynamic scenes.
arXiv Detail & Related papers (2020-09-30T18:36:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.