DF-VO: What Should Be Learnt for Visual Odometry?
- URL: http://arxiv.org/abs/2103.00933v1
- Date: Mon, 1 Mar 2021 11:50:39 GMT
- Title: DF-VO: What Should Be Learnt for Visual Odometry?
- Authors: Huangying Zhan, Chamara Saroj Weerasekera, Jia-Wang Bian, Ravi Garg,
Ian Reid
- Abstract summary: We design a simple yet robust Visual Odometry system by integrating multi-view geometry and deep learning on Depth and optical Flow.
Comprehensive ablation studies show the effectiveness of the proposed method, and extensive evaluation results show the state-of-the-art performance of our system.
- Score: 33.379888882093965
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Multi-view geometry-based methods have dominated monocular Visual
Odometry for decades owing to their superior performance, but they remain
vulnerable to dynamic and low-texture scenes. More importantly, monocular
methods suffer from the scale-drift issue, i.e., scale errors accumulate over
time. Recent studies show that deep neural networks can learn scene depths and
relative camera poses in a self-supervised manner without acquiring ground-truth
labels. More surprisingly, they show that well-trained networks produce
scale-consistent predictions over long videos, although their accuracy is still
inferior to traditional methods because they ignore geometric
information. Building on top
of recent progress in computer vision, we design a simple yet robust VO system
by integrating multi-view geometry and deep learning on Depth and optical Flow,
namely DF-VO. In this work, a) we propose a method to carefully sample
high-quality correspondences from deep flows and recover accurate camera poses
with a geometric module; b) we address the scale-drift issue by aligning
geometrically triangulated depths to the scale-consistent deep depths, where
the dynamic scenes are taken into account. Comprehensive ablation studies show
the effectiveness of the proposed method, and extensive evaluation results show
the state-of-the-art performance of our system, e.g., ours (1.652%) vs.
ORB-SLAM (3.247%) in terms of translation error on the KITTI Odometry benchmark.
Source code is publicly available at:
https://github.com/Huangying-Zhan/DF-VO
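The two components (a) and (b) of the abstract can be sketched in a few lines: correspondences are filtered with a forward-backward flow consistency check, and the unknown scale of the estimated translation is recovered by robustly aligning triangulated depths to the scale-consistent CNN depths. This is a hypothetical illustration of the idea, not the authors' implementation; the function names, the nearest-neighbour sampling, and the median-ratio alignment are assumptions.

```python
import numpy as np

def sample_correspondences(flow_fwd, flow_bwd, top_k=2000):
    """Keep matches where forward and backward flow agree (non-occluded,
    mostly static pixels), then take the top_k most consistent ones.

    flow_fwd, flow_bwd: (H, W, 2) dense flows between frames t and t+1.
    Returns two (top_k, 2) arrays of (x, y) pixel coordinates.
    """
    h, w = flow_fwd.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Endpoint of the forward flow for every pixel.
    xs2 = np.clip(xs + flow_fwd[..., 0], 0, w - 1)
    ys2 = np.clip(ys + flow_fwd[..., 1], 0, h - 1)
    # Backward flow sampled (nearest neighbour) at the forward endpoints.
    bwd = flow_bwd[ys2.round().astype(int), xs2.round().astype(int)]
    # Forward-backward error: a perfect cycle returns to the start pixel.
    err = np.linalg.norm(flow_fwd + bwd, axis=-1)
    idx = np.argsort(err.ravel())[:top_k]
    pts1 = np.stack([xs.ravel()[idx], ys.ravel()[idx]], axis=-1)
    pts2 = np.stack([xs2.ravel()[idx], ys2.ravel()[idx]], axis=-1)
    return pts1.astype(np.float64), pts2.astype(np.float64)

def align_translation_scale(depth_tri, depth_cnn, valid_mask):
    """Triangulated depths are only defined up to scale; the CNN depths are
    scale-consistent over time. A robust median ratio over static, valid
    pixels recovers the missing scale factor for the translation."""
    ratios = depth_cnn[valid_mask] / depth_tri[valid_mask]
    return float(np.median(ratios))
```

The median ratio (rather than a least-squares fit) keeps the alignment robust to residual dynamic pixels that survive the consistency mask.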
Related papers
- Robot Localization and Mapping Final Report -- Sequential Adversarial Learning for Self-Supervised Deep Visual Odometry [2.512491726995032]
Visual odometry (VO) and SLAM have relied on multi-view geometry via local structure-from-motion for decades.
Using deep neural networks to extract high-level features is now ubiquitous in computer vision.
The goal of this work is to tackle the limitations of past approaches and to develop a method that provides better depth and pose estimates.
arXiv Detail & Related papers (2023-09-08T06:24:17Z)
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z)
- Constraining Depth Map Geometry for Multi-View Stereo: A Dual-Depth Approach with Saddle-shaped Depth Cells [23.345139129458122]
We show that different depth geometries exhibit significant performance gaps, even under the same depth prediction error.
We introduce an ideal depth geometry composed of Saddle-Shaped Cells, whose predicted depth map oscillates upward and downward around the ground-truth surface.
Our method also points to a new research direction for considering depth geometry in MVS.
arXiv Detail & Related papers (2023-07-18T11:37:53Z)
- Probabilistic and Geometric Depth: Detecting Objects in Perspective [78.00922683083776]
3D object detection is an important capability needed in various practical applications such as driver assistance systems.
Monocular 3D detection, as an economical solution compared to conventional settings relying on binocular vision or LiDAR, has drawn increasing attention recently but still yields unsatisfactory results.
This paper first presents a systematic study on this problem and observes that the current monocular 3D detection problem can be simplified as an instance depth estimation problem.
arXiv Detail & Related papers (2021-07-29T16:30:33Z)
- Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection [70.71934539556916]
We learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection.
Specifically, a principled geometry formula with projective modeling of 2D and 3D depth predictions in the monocular 3D object detection network is devised.
Our method remarkably improves the detection performance of the state-of-the-art monocular-based method without extra data by 2.80% on the moderate test setting.
arXiv Detail & Related papers (2021-07-29T12:30:39Z)
- Virtual Normal: Enforcing Geometric Constraints for Accurate and Robust Depth Prediction [87.08227378010874]
We show the importance of the high-order 3D geometric constraints for depth prediction.
By designing a loss term that enforces a simple geometric constraint, we significantly improve the accuracy and robustness of monocular depth estimation.
We show state-of-the-art results of learning metric depth on NYU Depth-V2 and KITTI.
arXiv Detail & Related papers (2021-03-07T00:08:21Z)
- CodeVIO: Visual-Inertial Odometry with Learned Optimizable Dense Depth [83.77839773394106]
We present a lightweight, tightly-coupled deep depth network and visual-inertial odometry system.
We provide the network with previously marginalized sparse features from VIO to increase the accuracy of initial depth prediction.
We show that it can run in real-time with single-thread execution while utilizing GPU acceleration only for the network and code Jacobian.
arXiv Detail & Related papers (2020-12-18T09:42:54Z)
- Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv Detail & Related papers (2020-11-26T04:04:21Z)
- Towards Better Generalization: Joint Depth-Pose Learning without PoseNet [36.414471128890284]
We tackle the essential problem of scale inconsistency for self-supervised joint depth-pose learning.
Most existing methods assume that a consistent scale of depth and pose can be learned across all input samples.
We propose a novel system that explicitly disentangles scale from the network estimation.
arXiv Detail & Related papers (2020-04-03T00:28:09Z)
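The scale-disentanglement idea described above can be sketched as normalizing the predicted depth to unit mean, so the network never commits to a global scale and a geometric module recovers the per-frame scale instead. This is a minimal illustrative sketch under assumed names, not the paper's exact formulation.

```python
import numpy as np

def normalize_depth(depth_pred):
    """Strip the arbitrary global scale from a depth prediction.

    Returning unit-mean depth means the network only has to learn relative
    scene structure; the per-frame scale is left for a geometric module to
    recover, which avoids assuming one consistent scale across all samples.
    """
    scale = float(depth_pred.mean())
    return depth_pred / scale, scale
```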
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.