Towards Scale Consistent Monocular Visual Odometry by Learning from the
Virtual World
- URL: http://arxiv.org/abs/2203.05712v1
- Date: Fri, 11 Mar 2022 01:51:54 GMT
- Title: Towards Scale Consistent Monocular Visual Odometry by Learning from the
Virtual World
- Authors: Sen Zhang, Jing Zhang, Dacheng Tao
- Abstract summary: We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
- Score: 83.36195426897768
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular visual odometry (VO) has attracted extensive research attention by
providing real-time vehicle motion from cost-effective camera images. However,
state-of-the-art optimization-based monocular VO methods suffer from the scale
inconsistency problem for long-term predictions. Deep learning has recently
been introduced to address this issue by leveraging stereo sequences or
ground-truth motions in the training dataset. However, it comes at an
additional cost for data collection, and such training data may not be
available in all datasets. In this work, we propose VRVO, a novel framework for
retrieving the absolute scale from virtual data that can be easily obtained
from modern simulation environments, whereas in the real domain no stereo or
ground-truth data are required in either the training or inference phases.
Specifically, we first train a scale-aware disparity network using both
monocular real images and stereo virtual data. The virtual-to-real domain gap
is bridged by using an adversarial training strategy to map images from both
domains into a shared feature space. The resulting scale-consistent disparities
are then integrated with a direct VO system by constructing a virtual stereo
objective that ensures the scale consistency over long trajectories.
Additionally, to address the suboptimality issue caused by the separate
optimization backend and the learning process, we further propose a mutual
reinforcement pipeline that allows bidirectional information flow between
learning and optimization, which boosts the robustness and accuracy of each
other. We demonstrate the effectiveness of our framework on the KITTI and
vKITTI2 datasets.
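The abstract's "virtual stereo objective" couples the learned, scale-consistent disparity to the VO backend. As a rough illustration of the idea (not the paper's actual formulation), the sketch below warps a right-view image into the left view using a predicted disparity map and returns the photometric residual; all names and the grayscale-image assumption are hypothetical.

```python
import numpy as np

def virtual_stereo_residual(left_img, right_img, disparity):
    """Photometric residual for a hypothetical virtual stereo objective.

    Warps the right image toward the left view using the predicted
    disparity (in pixels) and returns the per-pixel intensity
    difference. A VO backend could penalize this residual to pin the
    trajectory to the absolute scale implied by the disparities.
    Assumes single-channel (grayscale) images of shape (h, w).
    """
    h, w = left_img.shape
    xs = np.tile(np.arange(w, dtype=np.float64), (h, 1))
    src_x = xs - disparity                       # matching column in the right view
    x0 = np.clip(np.floor(src_x).astype(int), 0, w - 2)
    frac = np.clip(src_x - x0, 0.0, 1.0)
    rows = np.tile(np.arange(h)[:, None], (1, w))
    # Linear interpolation along the horizontal (epipolar) direction.
    warped = (1 - frac) * right_img[rows, x0] + frac * right_img[rows, x0 + 1]
    return left_img - warped
```

With a correct disparity the residual vanishes wherever the warp stays inside the image, which is the property the scale-consistency objective exploits.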
Related papers
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments consistently demonstrates our method's superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
- Deep Domain Adaptation: A Sim2Real Neural Approach for Improving Eye-Tracking Systems [80.62854148838359]
Eye image segmentation is a critical step in eye tracking that has great influence over the final gaze estimate.
We use dimensionality-reduction techniques to measure the overlap between the target eye images and synthetic training data.
Our methods result in robust, improved performance when tackling the discrepancy between simulation and real-world data samples.
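The entry above measures sim-to-real overlap via dimensionality reduction. The exact measure is not given here, so the sketch below is only a toy proxy: it projects real and synthetic samples into a shared PCA space fit on their union and scores overlap from the distance between the projected means relative to their pooled spread. Function and parameter names are hypothetical.

```python
import numpy as np

def embedding_overlap(real, synth, n_components=2):
    """Toy overlap proxy between two sample sets (rows = flattened images).

    Fits a PCA basis on the union of both sets, projects each set into
    that low-dimensional space, and returns a score in (0, 1]: closer
    to 1 means the projected distributions sit nearly on top of each
    other, smaller values mean they are well separated.
    """
    data = np.vstack([real, synth])
    mean = data.mean(axis=0)
    # Principal directions from the SVD of the centered union.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    basis = vt[:n_components].T
    r = (real - mean) @ basis
    s = (synth - mean) @ basis
    spread = np.sqrt(r.var(axis=0).sum() + s.var(axis=0).sum())
    dist = np.linalg.norm(r.mean(axis=0) - s.mean(axis=0))
    return 1.0 / (1.0 + dist / (spread + 1e-12))
```

A measure like this can be tracked while adapting the synthetic data: rising overlap suggests the domain gap is shrinking.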
arXiv Detail & Related papers (2024-03-23T22:32:06Z)
- Augmented Reality based Simulated Data (ARSim) with multi-view consistency for AV perception networks [47.07188762367792]
We present ARSim, a framework designed to enhance real multi-view image data with 3D synthetic objects of interest.
We construct a simplified virtual scene using real data and strategically place 3D synthetic assets within it.
The resulting augmented multi-view consistent dataset is used to train a multi-camera perception network for autonomous vehicles.
arXiv Detail & Related papers (2024-03-22T17:49:11Z)
- Federated Multi-View Synthesizing for Metaverse [52.59476179535153]
The metaverse is expected to provide immersive entertainment, education, and business applications.
Virtual reality (VR) transmission over wireless networks is data- and computation-intensive.
We have developed a novel multi-view synthesizing framework that can efficiently provide synthesizing, storage, and communication resources for wireless content delivery in the metaverse.
arXiv Detail & Related papers (2023-12-18T13:51:56Z)
- XVO: Generalized Visual Odometry via Cross-Modal Self-Training [11.70220331540621]
XVO is a semi-supervised learning method for training generalized monocular Visual Odometry (VO) models.
In contrast to standard monocular VO approaches which often study a known calibration within a single dataset, XVO efficiently learns to recover relative pose with real-world scale.
We optimize the motion estimation model via self-training from large amounts of unconstrained and heterogeneous dash camera videos available on YouTube.
arXiv Detail & Related papers (2023-09-28T18:09:40Z)
- Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
- Progressive Transformation Learning for Leveraging Virtual Images in Training [21.590496842692744]
We introduce Progressive Transformation Learning (PTL) to augment a training dataset by adding transformed virtual images with enhanced realism.
PTL takes a novel approach that progressively iterates the following three steps: 1) select a subset from a pool of virtual images according to the domain gap, 2) transform the selected virtual images to enhance realism, and 3) add the transformed virtual images to the training set while removing them from the pool.
Experiments show that PTL results in a substantial performance increase over the baseline, especially in the small data and the cross-domain regime.
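The three iterated PTL steps above have a simple loop structure, sketched below with hypothetical stand-ins for the paper's actual components: `domain_gap` scores a virtual image's distance from the real domain and `transform` enhances its realism.

```python
def progressive_transformation_learning(train_set, virtual_pool,
                                        domain_gap, transform,
                                        subset_size=4, n_iters=3):
    """Sketch of the PTL loop (helpers are illustrative stand-ins).

    Each iteration: 1) select the virtual images closest to the real
    domain (smallest domain_gap), 2) transform them to enhance realism,
    3) add the transformed images to the training set while removing
    the originals from the pool.
    """
    train_set = list(train_set)
    virtual_pool = list(virtual_pool)
    for _ in range(n_iters):
        if not virtual_pool:
            break
        # Step 1: pick the subset with the smallest domain gap.
        virtual_pool.sort(key=domain_gap)
        selected = virtual_pool[:subset_size]
        virtual_pool = virtual_pool[subset_size:]  # Step 3 (removal from pool)
        # Step 2: enhance realism of the selected virtual images.
        transformed = [transform(img) for img in selected]
        # Step 3: grow the training set with the transformed images.
        train_set.extend(transformed)
    return train_set, virtual_pool
```

Because the easiest-to-adapt images are consumed first, each iteration faces a progressively larger domain gap, which is the "progressive" aspect of the method.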
arXiv Detail & Related papers (2022-11-03T13:04:15Z)
- Learning Collision-Free Space Detection from Stereo Images: Homography Matrix Brings Better Data Augmentation [16.99302954185652]
It remains an open challenge to train deep convolutional neural networks (DCNNs) using only a small quantity of training samples.
This paper explores an effective training data augmentation approach that can be employed to improve the overall DCNN performance.
arXiv Detail & Related papers (2020-12-14T19:14:35Z)
- Deflating Dataset Bias Using Synthetic Data Augmentation [8.509201763744246]
State-of-the-art methods for most vision tasks for Autonomous Vehicles (AVs) rely on supervised learning.
The goal of this paper is to investigate the use of targeted synthetic data augmentation for filling gaps in real datasets for vision tasks.
Empirical studies on three different computer vision tasks of practical use to AVs consistently show that having synthetic data in the training mix provides a significant boost in cross-dataset generalization performance.
arXiv Detail & Related papers (2020-04-28T21:56:10Z)
- Two-shot Spatially-varying BRDF and Shape Estimation [89.29020624201708]
We propose a novel deep learning architecture with a stage-wise estimation of shape and SVBRDF.
We create a large-scale synthetic training dataset with domain-randomized geometry and realistic materials.
Experiments on both synthetic and real-world datasets show that our network trained on a synthetic dataset can generalize well to real-world images.
arXiv Detail & Related papers (2020-04-01T12:56:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.