Improving Deep Stereo Network Generalization with Geometric Priors
- URL: http://arxiv.org/abs/2008.11098v1
- Date: Tue, 25 Aug 2020 15:24:02 GMT
- Title: Improving Deep Stereo Network Generalization with Geometric Priors
- Authors: Jialiang Wang, Varun Jampani, Deqing Sun, Charles Loop, Stan
Birchfield, Jan Kautz
- Abstract summary: Large datasets of diverse real-world scenes with dense ground truth are difficult to obtain.
Many algorithms rely on small real-world datasets of similar scenes or synthetic datasets.
We propose to incorporate prior knowledge of scene geometry into an end-to-end stereo network to help networks generalize better.
- Score: 93.09496073476275
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: End-to-end deep learning methods have advanced stereo vision in recent years
and obtained excellent results when the training and test data are similar.
However, large datasets of diverse real-world scenes with dense ground truth
are difficult to obtain and currently not publicly available to the research
community. As a result, many algorithms rely on small real-world datasets of
similar scenes or synthetic datasets, but end-to-end algorithms trained on such
datasets often generalize poorly to different images that arise in real-world
applications. As a step towards addressing this problem, we propose to
incorporate prior knowledge of scene geometry into an end-to-end stereo network
to help networks generalize better. For a given network, we explicitly add a
gradient-domain smoothness prior and occlusion reasoning into the network
training, while the architecture remains unchanged during inference.
Experimentally, we show consistent improvements if we train on synthetic
datasets and test on the Middlebury (real images) dataset. Notably, we
improve PSM-Net accuracy on Middlebury from 5.37 MAE to 3.21 MAE without
sacrificing speed.
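The gradient-domain smoothness prior and occlusion reasoning described in the abstract can be illustrated with a minimal sketch. The function names, the edge-aware weighting exp(-alpha * |dI|), and the left-right consistency threshold below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def smoothness_loss(disparity, image, alpha=10.0):
    # Gradient-domain smoothness: penalize disparity gradients,
    # downweighted where the image itself has strong gradients (edges).
    dx_d = np.abs(np.diff(disparity, axis=1))
    dy_d = np.abs(np.diff(disparity, axis=0))
    dx_i = np.abs(np.diff(image, axis=1))
    dy_i = np.abs(np.diff(image, axis=0))
    wx = np.exp(-alpha * dx_i)  # low weight across image edges
    wy = np.exp(-alpha * dy_i)
    return (dx_d * wx).mean() + (dy_d * wy).mean()

def occlusion_mask(disp_left, disp_right, thresh=1.0):
    # Left-right consistency check: a left pixel at column x maps to
    # x - d_L(x) in the right image; if the right-view disparity there
    # disagrees by more than thresh, mark the pixel as occluded.
    h, w = disp_left.shape
    xs = np.arange(w)[None, :].repeat(h, axis=0)
    xr = np.clip(np.round(xs - disp_left).astype(int), 0, w - 1)
    d_r = np.take_along_axis(disp_right, xr, axis=1)
    return np.abs(disp_left - d_r) > thresh
```

Both terms would be added to the training loss only; as the abstract notes, the network architecture is unchanged at inference time.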
Related papers
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments demonstrates consistent improvements over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
- A New Benchmark: On the Utility of Synthetic Data with Blender for Bare Supervised Learning and Downstream Domain Adaptation [42.2398858786125]
Deep learning in computer vision has achieved great success with the price of large-scale labeled training data.
The uncontrollable data collection process produces non-IID training and test data, where undesired duplication may exist.
To circumvent these issues, an alternative is to generate synthetic data via 3D rendering with domain randomization.
arXiv Detail & Related papers (2023-03-16T09:03:52Z)
- GraphCSPN: Geometry-Aware Depth Completion via Dynamic GCNs [49.55919802779889]
We propose a Graph Convolution based Spatial Propagation Network (GraphCSPN) as a general approach for depth completion.
In this work, we leverage convolutional neural networks as well as graph neural networks in a complementary way for geometric representation learning.
Our method achieves state-of-the-art performance, especially when compared in the case of using only a few propagation steps.
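A single spatial-propagation step of the kind such networks generalize can be sketched as follows. The fixed 4-neighbourhood, the uniform blend factor, the wrap-around borders, and the given (rather than learned) affinities are simplifying assumptions for illustration:

```python
import numpy as np

def propagate(depth, affinity, steps=3):
    # Illustrative spatial propagation: each pixel blends its depth
    # estimate with its 4-neighbours using per-pixel affinities.
    # affinity has shape (4, H, W), ordered up/down/left/right.
    for _ in range(steps):
        up    = np.roll(depth,  1, axis=0)   # np.roll wraps at borders
        down  = np.roll(depth, -1, axis=0)   # (a simplification)
        left  = np.roll(depth,  1, axis=1)
        right = np.roll(depth, -1, axis=1)
        neigh = np.stack([up, down, left, right])
        w = affinity / (affinity.sum(axis=0, keepdims=True) + 1e-8)
        blend = 0.5  # keep half the current estimate each step (assumed)
        depth = blend * depth + (1 - blend) * (w * neigh).sum(axis=0)
    return depth
```

In a learned system the affinities would be predicted by the network from image features rather than supplied as an input array.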
arXiv Detail & Related papers (2022-10-19T17:56:03Z)
- Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
arXiv Detail & Related papers (2022-03-11T01:51:54Z)
- Domain Adaptation for Real-World Single View 3D Reconstruction [1.611271868398988]
Unsupervised domain adaptation can be used to transfer knowledge from the labeled synthetic source domain to the unlabeled real target domain.
We propose a novel architecture which takes advantage of the fact that in this setting, target domain data is unsupervised with regards to the 3D model but supervised for class labels.
Experiments are performed with ShapeNet as the source domain and domains within the Object Domain Suite (ODDS) dataset as the target.
arXiv Detail & Related papers (2021-08-24T22:02:27Z)
- Stereo Matching by Self-supervision of Multiscopic Vision [65.38359887232025]
We propose a new self-supervised framework for stereo matching utilizing multiple images captured at aligned camera positions.
A cross photometric loss, an uncertainty-aware mutual-supervision loss, and a new smoothness loss are introduced to optimize the network.
Our model obtains better disparity maps than previous unsupervised methods on the KITTI dataset.
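The photometric loss at the heart of such self-supervised stereo training can be illustrated in a simplified single-channel, nearest-neighbour form. The function names and the rectified left/right setup below are assumptions for illustration, not the paper's exact losses:

```python
import numpy as np

def warp_right_to_left(right, disp):
    # For each left-view pixel at column x, sample the right image at
    # x - d(x) (nearest neighbour, clipped at the image border).
    h, w = right.shape
    xs = np.arange(w)[None, :].repeat(h, axis=0)
    xr = np.clip(np.round(xs - disp).astype(int), 0, w - 1)
    return np.take_along_axis(right, xr, axis=1)

def photometric_loss(left, right, disp):
    # L1 difference between the left image and the right image
    # warped into the left view using the predicted disparity.
    return np.abs(left - warp_right_to_left(right, disp)).mean()
```

A correct disparity map makes the warped right view match the left view, so minimizing this loss supervises disparity without ground truth.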
arXiv Detail & Related papers (2021-04-09T02:58:59Z)
- SMD-Nets: Stereo Mixture Density Networks [68.56947049719936]
We propose Stereo Mixture Density Networks (SMD-Nets), a simple yet effective learning framework compatible with a wide class of 2D and 3D architectures.
Specifically, we exploit bimodal mixture densities as output representation and show that this allows for sharp and precise disparity estimates near discontinuities.
We carry out comprehensive experiments on a new high-resolution and highly realistic synthetic stereo dataset, consisting of stereo pairs at 8Mpx resolution, as well as on real-world stereo datasets.
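The benefit of a bimodal output representation can be shown with a toy comparison: taking the mode of the dominant mixture component keeps disparity sharp at a depth discontinuity, whereas a single mean estimate blurs the two surfaces together. The helper names and the two-component weighting below are hypothetical simplifications:

```python
import numpy as np

def disparity_from_bimodal(pi1, mu1, mu2):
    # Pick the mode of the dominant component. Near a discontinuity
    # this snaps to one surface or the other instead of averaging.
    return np.where(np.asarray(pi1) >= 0.5, mu1, mu2)

def mixture_mean(pi1, mu1, mu2):
    # Unimodal-style estimate for comparison: blends the two surfaces,
    # producing "flying pixel" disparities between them.
    return pi1 * mu1 + (1 - pi1) * mu2
```

With pi1 = 0.7 and modes at 10 px and 30 px, the mode-based estimate returns 10 px (a real surface), while the mean returns 16 px, a disparity that belongs to neither surface.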
arXiv Detail & Related papers (2021-04-08T16:15:46Z)
- DmifNet: 3D Shape Reconstruction Based on Dynamic Multi-Branch Information Fusion [14.585272577456472]
3D object reconstruction from a single-view image is a long-standing challenging problem.
Previous work has struggled to accurately reconstruct 3D shapes with complex topology and rich details at the edges and corners.
We propose a Dynamic Multi-branch Information Fusion Network (DmifNet) which can recover a high-fidelity 3D shape of arbitrary topology from a 2D image.
arXiv Detail & Related papers (2020-11-21T11:31:27Z)
- Learning Stereo from Single Images [41.32821954097483]
Supervised deep networks are among the best methods for finding correspondences in stereo image pairs.
We propose that it is unnecessary to have such a high reliance on ground truth depths or even corresponding stereo pairs.
Inspired by recent progress in monocular depth estimation, we generate plausible disparity maps from single images. In turn, we use those flawed disparity maps in a carefully designed pipeline to generate stereo training pairs.
arXiv Detail & Related papers (2020-08-04T12:22:21Z)
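The core idea of that pipeline, forward-warping a single image with an estimated disparity map to synthesize the second view of a training pair, can be sketched as follows. The nearest-neighbour rounding, the simple overwrite rule for collisions, and NaN as a hole marker are simplifying assumptions, not the paper's exact procedure:

```python
import numpy as np

def synthesize_right_view(left, disp):
    # Forward-warp: each left pixel at column x lands at x - d(x) in the
    # synthesized right view. Pixels that no source maps to remain NaN,
    # which a real pipeline would need to inpaint or mask (occlusions).
    h, w = left.shape
    right = np.full((h, w), np.nan)
    for y in range(h):
        for x in range(w):
            xr = int(round(x - disp[y, x]))
            if 0 <= xr < w:
                right[y, xr] = left[y, x]  # later sources overwrite
    return right
```

The synthesized pair (left, right) with the known disparity map can then serve as supervised stereo training data, which is the key to removing the reliance on captured stereo pairs.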
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.