Learning Stereo from Single Images
- URL: http://arxiv.org/abs/2008.01484v2
- Date: Thu, 20 Aug 2020 18:11:25 GMT
- Title: Learning Stereo from Single Images
- Authors: Jamie Watson, Oisin Mac Aodha, Daniyar Turmukhambetov, Gabriel J. Brostow, Michael Firman
- Abstract summary: Supervised deep networks are among the best methods for finding correspondences in stereo image pairs.
We propose that it is unnecessary to have such a high reliance on ground truth depths or even corresponding stereo pairs.
Inspired by recent progress in monocular depth estimation, we generate plausible disparity maps from single images. In turn, we use those flawed disparity maps in a carefully designed pipeline to generate stereo training pairs.
- Score: 41.32821954097483
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Supervised deep networks are among the best methods for finding
correspondences in stereo image pairs. Like all supervised approaches, these
networks require ground truth data during training. However, collecting large
quantities of accurate dense correspondence data is very challenging. We
propose that it is unnecessary to have such a high reliance on ground truth
depths or even corresponding stereo pairs. Inspired by recent progress in
monocular depth estimation, we generate plausible disparity maps from single
images. In turn, we use those flawed disparity maps in a carefully designed
pipeline to generate stereo training pairs. Training in this manner makes it
possible to convert any collection of single RGB images into stereo training
data. This results in a significant reduction in human effort, with no need to
collect real depths or to hand-design synthetic data. We can consequently train
a stereo matching network from scratch on datasets like COCO, which were
previously hard to exploit for stereo. Through extensive experiments we show
that our approach outperforms stereo networks trained with standard synthetic
datasets, when evaluated on KITTI, ETH3D, and Middlebury.
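The abstract does not spell out the steps of the pipeline, so the following is only a minimal sketch of the general idea: turn a (possibly flawed) monocular depth prediction into a disparity map, then forward-warp the input image to synthesize a second view. The function names `depth_to_disparity` and `synthesize_right_view`, the inverse-depth normalization, and the z-buffer collision rule are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def depth_to_disparity(depth, max_disp):
    """Map a depth map to disparities in [0, max_disp] pixels.

    Assumption: disparity is taken proportional to normalized inverse depth.
    The depth map is expected to come from some monocular depth estimator.
    """
    inv = 1.0 / np.maximum(depth, 1e-6)          # nearer pixels get larger values
    span = max(float(inv.max() - inv.min()), 1e-12)
    return (inv - inv.min()) / span * max_disp


def synthesize_right_view(left, disparity):
    """Forward-warp a left image into a hypothetical right view.

    left:      (H, W, C) array.
    disparity: (H, W) array of non-negative disparities in pixels.
    Returns (right, holes), where holes is True for target pixels that
    received no source pixel (for example, disoccluded regions).
    """
    h, w = disparity.shape
    right = np.zeros_like(left)
    zbuf = np.full((h, w), -1.0)                 # largest disparity written so far
    for y in range(h):
        for x in range(w):
            d = float(disparity[y, x])
            tx = int(round(x - d))               # column x shifts left by d pixels
            # On collisions, keep the pixel with the larger disparity (the nearer one).
            if 0 <= tx < w and d > zbuf[y, tx]:
                zbuf[y, tx] = d
                right[y, tx] = left[y, x]
    holes = zbuf < 0
    return right, holes
```

Given an image `left` and a predicted depth map `depth`, calling `synthesize_right_view(left, depth_to_disparity(depth, max_disp))` would yield a candidate training triplet of left image, synthesized right image, and disparity. The returned `holes` mask marks pixels that would still need to be filled by some other means.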
Related papers
- An evaluation of Deep Learning based stereo dense matching dataset shift from aerial images and a large scale stereo dataset [2.048226951354646]
We present a method for generating ground-truth disparity maps directly from Light Detection and Ranging (LiDAR) and images.
We evaluate 11 dense matching methods across datasets with diverse scene types, image resolutions, and geometric configurations.
(2024-02-19T20:33:46Z)
- NeRF-Supervised Deep Stereo [33.54504171850584]
We introduce a novel framework for training deep stereo networks effortlessly and without any ground-truth.
By leveraging state-of-the-art neural rendering solutions, we generate stereo training data from image sequences collected with a single handheld camera.
On top of them, a NeRF-supervised training procedure is carried out, from which we exploit rendered stereo triplets to compensate for occlusions and depth maps as proxy labels.
(2023-03-30T17:59:58Z)
- UAVStereo: A Multiple Resolution Dataset for Stereo Matching in UAV Scenarios [0.6524460254566905]
This paper constructs a multi-resolution UAV scenario dataset, called UAVStereo, with over 34k stereo image pairs covering 3 typical scenes.
In this paper, we evaluate traditional and state-of-the-art deep learning methods, highlighting their limitations in addressing challenges in UAV scenarios.
(2023-02-20T16:45:27Z)
- Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
(2022-03-11T01:51:54Z)
- TriStereoNet: A Trinocular Framework for Multi-baseline Disparity Estimation [18.690105889241828]
We present an end-to-end network for processing the data from a trinocular setup.
In this design, two pairs of binocular data with a common reference image are treated with shared weights of the network.
We also propose a Guided Addition method for merging the 4D data of the two baselines.
(2021-11-24T13:58:17Z)
- Self-Supervised Depth Completion for Active Stereo [55.79929735390945]
Active stereo systems are widely used in the robotics industry due to their low cost and high quality depth maps.
These depth sensors suffer from stereo artefacts and do not provide dense depth estimates.
We present the first self-supervised depth completion method for active stereo systems that predicts accurate dense depth maps.
(2021-10-07T07:33:52Z)
- Stereo Matching by Self-supervision of Multiscopic Vision [65.38359887232025]
We propose a new self-supervised framework for stereo matching utilizing multiple images captured at aligned camera positions.
A cross photometric loss, an uncertainty-aware mutual-supervision loss, and a new smoothness loss are introduced to optimize the network.
Our model obtains better disparity maps than previous unsupervised methods on the KITTI dataset.
(2021-04-09T02:58:59Z)
- SMD-Nets: Stereo Mixture Density Networks [68.56947049719936]
We propose Stereo Mixture Density Networks (SMD-Nets), a simple yet effective learning framework compatible with a wide class of 2D and 3D architectures.
Specifically, we exploit bimodal mixture densities as output representation and show that this allows for sharp and precise disparity estimates near discontinuities.
We carry out comprehensive experiments on a new high-resolution and highly realistic synthetic stereo dataset, consisting of stereo pairs at 8Mpx resolution, as well as on real-world stereo datasets.
(2021-04-08T16:15:46Z)
- Improving Deep Stereo Network Generalization with Geometric Priors [93.09496073476275]
Large datasets of diverse real-world scenes with dense ground truth are difficult to obtain.
Many algorithms rely on small real-world datasets of similar scenes or synthetic datasets.
We propose to incorporate prior knowledge of scene geometry into an end-to-end stereo network to help networks generalize better.
(2020-08-25T15:24:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.