Stereo Matching by Self-supervision of Multiscopic Vision
- URL: http://arxiv.org/abs/2104.04170v1
- Date: Fri, 9 Apr 2021 02:58:59 GMT
- Title: Stereo Matching by Self-supervision of Multiscopic Vision
- Authors: Weihao Yuan, Yazhan Zhang, Bingkun Wu, Siyu Zhu, Ping Tan, Michael Yu Wang, Qifeng Chen
- Abstract summary: We propose a new self-supervised framework for stereo matching utilizing multiple images captured at aligned camera positions.
A cross photometric loss, an uncertainty-aware mutual-supervision loss, and a new smoothness loss are introduced to optimize the network.
Our model obtains better disparity maps than previous unsupervised methods on the KITTI dataset.
- Score: 65.38359887232025
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised learning for depth estimation possesses several
advantages over supervised learning: it requires no ground-truth depth, allows
online fine-tuning, and generalizes better with unlimited data, which attracts
researchers to seek self-supervised solutions. In this work, we propose a new
self-supervised framework for stereo matching utilizing multiple images
captured at aligned camera positions. A cross photometric loss, an
uncertainty-aware mutual-supervision loss, and a new smoothness loss are
introduced to optimize the network in learning disparity maps end-to-end
without ground-truth depth information. To train this framework, we build a new
multiscopic dataset consisting of synthetic images rendered by 3D engines and
real images captured by real cameras. After being trained with only the
synthetic images, our network can perform well in unseen outdoor scenes. Our
experiment shows that our model obtains better disparity maps than previous
unsupervised methods on the KITTI dataset and is comparable to supervised
methods when generalized to unseen data. Our source code and dataset will be
made public, and more results are provided in the supplement.
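The abstract names a cross photometric loss and a smoothness loss as the self-supervision signals. A minimal sketch of how such terms are commonly computed for a single-channel stereo pair is shown below; this is not the paper's implementation, and the function names, the nearest-neighbor warp, and the exponential edge weighting are simplifying assumptions:

```python
import numpy as np

def warp_right_to_left(right, disparity):
    """Reconstruct the left view by sampling the right image at x - d
    for each left pixel x, using nearest-neighbor sampling."""
    h, w = right.shape
    xs = np.tile(np.arange(w), (h, 1))
    src_x = np.clip(np.round(xs - disparity).astype(int), 0, w - 1)
    rows = np.repeat(np.arange(h)[:, None], w, axis=1)
    return right[rows, src_x]

def photometric_loss(left, right, disparity):
    """Mean absolute difference between the left image and the
    right image warped into the left view by the disparity map."""
    recon = warp_right_to_left(right, disparity)
    return float(np.mean(np.abs(left - recon)))

def smoothness_loss(disparity, image):
    """Edge-aware first-order smoothness: penalize disparity gradients,
    down-weighted where the image itself has strong gradients."""
    dx_d = np.abs(np.diff(disparity, axis=1))
    dy_d = np.abs(np.diff(disparity, axis=0))
    dx_i = np.abs(np.diff(image, axis=1))
    dy_i = np.abs(np.diff(image, axis=0))
    return float(np.mean(dx_d * np.exp(-dx_i)) +
                 np.mean(dy_d * np.exp(-dy_i)))

# Toy check: if the left image is exactly the warped right image,
# the photometric loss at the true disparity is zero.
rng = np.random.default_rng(0)
right = rng.random((4, 8))
true_disp = np.full((4, 8), 2.0)
left = warp_right_to_left(right, true_disp)
```

In a trainable network these losses would be differentiable (e.g. bilinear sampling instead of nearest-neighbor), and the paper additionally uses an uncertainty-aware mutual-supervision loss across the multiscopic views, which is not sketched here.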
Related papers
- Sparse Multi-baseline SAR Cross-modal 3D Reconstruction of Vehicle Targets [5.6680936716261705]
We propose a Cross-Modal Reconstruction Network (CMR-Net), which integrates differentiable rendering and cross-modal supervision with optical images.
CMR-Net, trained solely on simulated data, demonstrates high-resolution reconstruction capabilities on both publicly available simulation datasets and real measured datasets.
arXiv Detail & Related papers (2024-06-06T15:18:59Z)
- Deep Domain Adaptation: A Sim2Real Neural Approach for Improving Eye-Tracking Systems [80.62854148838359]
Eye image segmentation is a critical step in eye tracking that has great influence over the final gaze estimate.
We use dimensionality-reduction techniques to measure the overlap between the target eye images and synthetic training data.
Our methods result in robust, improved performance when tackling the discrepancy between simulation and real-world data samples.
arXiv Detail & Related papers (2024-03-23T22:32:06Z)
- An evaluation of Deep Learning based stereo dense matching dataset shift from aerial images and a large scale stereo dataset [2.048226951354646]
We present a method for generating ground-truth disparity maps directly from Light Detection and Ranging (LiDAR) and images.
We evaluate 11 dense matching methods across datasets with diverse scene types, image resolutions, and geometric configurations.
arXiv Detail & Related papers (2024-02-19T20:33:46Z)
- Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
arXiv Detail & Related papers (2022-03-11T01:51:54Z)
- Learning Collision-Free Space Detection from Stereo Images: Homography Matrix Brings Better Data Augmentation [16.99302954185652]
It remains an open challenge to train deep convolutional neural networks (DCNNs) using only a small quantity of training samples.
This paper explores an effective training data augmentation approach that can be employed to improve the overall DCNN performance.
arXiv Detail & Related papers (2020-12-14T19:14:35Z)
- Improving Deep Stereo Network Generalization with Geometric Priors [93.09496073476275]
Large datasets of diverse real-world scenes with dense ground truth are difficult to obtain.
Many algorithms rely on small real-world datasets of similar scenes or synthetic datasets.
We propose to incorporate prior knowledge of scene geometry into an end-to-end stereo network to help networks generalize better.
arXiv Detail & Related papers (2020-08-25T15:24:02Z)
- From Image Collections to Point Clouds with Self-supervised Shape and Pose Networks [53.71440550507745]
Reconstructing 3D models from 2D images is one of the fundamental problems in computer vision.
We propose a deep learning technique for 3D object reconstruction from a single image.
We learn both 3D point cloud reconstruction and pose estimation networks in a self-supervised manner.
arXiv Detail & Related papers (2020-05-05T04:25:16Z)
- Deep 3D Capture: Geometry and Reflectance from Sparse Multi-View Images [59.906948203578544]
We introduce a novel learning-based method to reconstruct the high-quality geometry and complex, spatially-varying BRDF of an arbitrary object.
We first estimate per-view depth maps using a deep multi-view stereo network.
These depth maps are used to coarsely align the different views.
We propose a novel multi-view reflectance estimation network architecture.
arXiv Detail & Related papers (2020-03-27T21:28:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.