Learned Monocular Depth Priors in Visual-Inertial Initialization
- URL: http://arxiv.org/abs/2204.09171v1
- Date: Wed, 20 Apr 2022 00:30:04 GMT
- Title: Learned Monocular Depth Priors in Visual-Inertial Initialization
- Authors: Yunwen Zhou, Abhishek Kar, Eric Turner, Adarsh Kowdle, Chao X. Guo,
Ryan C. DuToit, Konstantine Tsotsos
- Abstract summary: Visual-inertial odometry (VIO) is the pose estimation backbone for most AR/VR and autonomous robotic systems today.
We propose to circumvent the limitations of classical visual-inertial structure-from-motion (SfM)
We leverage learned monocular depth images (mono-depth) to constrain the relative depth of features, and upgrade the mono-depth to metric scale by jointly optimizing for its scale and shift.
- Score: 4.99761983273316
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual-inertial odometry (VIO) is the pose estimation backbone for most AR/VR
and autonomous robotic systems today, in both academia and industry. However,
these systems are highly sensitive to the initialization of key parameters such
as sensor biases, gravity direction, and metric scale. In practical scenarios
where high-parallax or variable acceleration assumptions are rarely met (e.g.
hovering aerial robot, smartphone AR user not gesticulating with phone),
classical visual-inertial initialization formulations often become
ill-conditioned and/or fail to meaningfully converge. In this paper we target
visual-inertial initialization specifically for these low-excitation scenarios
critical to in-the-wild usage. We propose to circumvent the limitations of
classical visual-inertial structure-from-motion (SfM) initialization by
incorporating a new learning-based measurement as a higher-level input. We
leverage learned monocular depth images (mono-depth) to constrain the relative
depth of features, and upgrade the mono-depth to metric scale by jointly
optimizing for its scale and shift. Our experiments show a significant
improvement in problem conditioning compared to a classical formulation for
visual-inertial initialization, and demonstrate significant accuracy and
robustness improvements relative to the state-of-the-art on public benchmarks,
particularly under motion-restricted scenarios. We further extend this
improvement to implementation within an existing odometry system to illustrate
the impact of our improved initialization method on resulting tracking
trajectories.
Related papers
- Self-Supervised Geometry-Guided Initialization for Robust Monocular Visual Odometry [9.79428015716139]
In this paper, we analyze major failure cases on outdoor benchmarks and expose shortcomings of a learning-based SLAM model (DROID-SLAM)
We propose the use of self-supervised priors leveraging a frozen large-scale pre-trained monocular depth estimation to initialize the dense bundle adjustment process.
Despite its simplicity, our proposed method demonstrates significant improvements on KITTI odometry, as well as the challenging DDAD benchmark.
arXiv Detail & Related papers (2024-06-03T01:59:29Z) - InfRS: Incremental Few-Shot Object Detection in Remote Sensing Images [11.916941756499435]
In this paper, we explore the intricate task of incremental few-shot object detection in remote sensing images.
We introduce a pioneering fine-tuning-based technique, termed InfRS, designed to facilitate the incremental learning of novel classes.
We develop a prototypical calibration strategy based on the Wasserstein distance to mitigate the catastrophic forgetting problem.
arXiv Detail & Related papers (2024-05-18T13:39:50Z) - Efficient Prompt Tuning of Large Vision-Language Model for Fine-Grained
Ship Classification [62.425462136772666]
Fine-grained ship classification in remote sensing (RS-FGSC) poses a significant challenge due to the high similarity between classes and the limited availability of labeled data.
Recent advancements in large pre-trained Vision-Language Models (VLMs) have demonstrated impressive capabilities in few-shot or zero-shot learning.
This study delves into harnessing the potential of VLMs to enhance classification accuracy for unseen ship categories.
arXiv Detail & Related papers (2024-03-13T05:48:58Z) - Towards Theoretically Inspired Neural Initialization Optimization [66.04735385415427]
We propose a differentiable quantity, named GradCosine, with theoretical insights to evaluate the initial state of a neural network.
We show that both the training and test performance of a network can be improved by maximizing GradCosine under norm constraint.
Generalized from the sample-wise analysis into the real batch setting, NIO is able to automatically look for a better initialization with negligible cost.
arXiv Detail & Related papers (2022-10-12T06:49:16Z) - Towards Scale-Aware, Robust, and Generalizable Unsupervised Monocular
Depth Estimation by Integrating IMU Motion Dynamics [74.1720528573331]
Unsupervised monocular depth and ego-motion estimation has drawn extensive research attention in recent years.
We propose DynaDepth, a novel scale-aware framework that integrates information from vision and IMU motion dynamics.
We validate the effectiveness of DynaDepth by conducting extensive experiments and simulations on the KITTI and Make3D datasets.
arXiv Detail & Related papers (2022-07-11T07:50:22Z) - Benchmarking Detection Transfer Learning with Vision Transformers [60.97703494764904]
complexity of object detection methods can make benchmarking non-trivial when new architectures, such as Vision Transformer (ViT) models, arrive.
We present training techniques that overcome these challenges, enabling the use of standard ViT models as the backbone of Mask R-CNN.
Our results show that recent masking-based unsupervised learning methods may, for the first time, provide convincing transfer learning improvements on COCO.
arXiv Detail & Related papers (2021-11-22T18:59:15Z) - Robust Visual Odometry Using Position-Aware Flow and Geometric Bundle
Adjustment [16.04240592057438]
A novel optical flow network (PANet) built on a position-aware mechanism is proposed first.
Then, a novel system that jointly estimates depth, optical flow, and ego-motion without a typical network to learning ego-motion is proposed.
Experiments show that the proposed system not only outperforms other state-of-the-art methods in terms of depth, flow, and VO estimation.
arXiv Detail & Related papers (2021-11-22T12:05:27Z) - Which priors matter? Benchmarking models for learning latent dynamics [70.88999063639146]
Several methods have proposed to integrate priors from classical mechanics into machine learning models.
We take a sober look at the current capabilities of these models.
We find that the use of continuous and time-reversible dynamics benefits models of all classes.
arXiv Detail & Related papers (2021-11-09T23:48:21Z) - Multi-Object Tracking with Deep Learning Ensemble for Unmanned Aerial
System Applications [0.0]
Multi-object tracking (MOT) is a crucial component of situational awareness in military defense applications.
We present a robust object tracking architecture aimed to accommodate for the noise in real-time situations.
We propose a kinematic prediction model, called Deep Extended Kalman Filter (DeepEKF), in which a sequence-to-sequence architecture is used to predict entity trajectories in latent space.
arXiv Detail & Related papers (2021-10-05T13:50:38Z) - Pushing the Envelope of Rotation Averaging for Visual SLAM [69.7375052440794]
We propose a novel optimization backbone for visual SLAM systems.
We leverage averaging to improve the accuracy, efficiency and robustness of conventional monocular SLAM systems.
Our approach can exhibit up to 10x faster with comparable accuracy against the state-art on public benchmarks.
arXiv Detail & Related papers (2020-11-02T18:02:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.