H-Net: Unsupervised Attention-based Stereo Depth Estimation Leveraging
Epipolar Geometry
- URL: http://arxiv.org/abs/2104.11288v1
- Date: Thu, 22 Apr 2021 19:16:35 GMT
- Title: H-Net: Unsupervised Attention-based Stereo Depth Estimation Leveraging
Epipolar Geometry
- Authors: Baoru Huang, Jian-Qing Zheng, Stamatia Giannarou, Daniel S. Elson
- Abstract summary: We introduce the H-Net, a deep-learning framework for unsupervised stereo depth estimation.
For the first time, a Siamese autoencoder architecture is used for depth estimation.
Our method outperforms the state-of-the-art unsupervised stereo depth estimation methods.
- Score: 4.968452390132676
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Depth estimation from a stereo image pair has become one of the most explored
applications in computer vision, with most of the previous methods relying on
fully supervised learning settings. However, due to the difficulty in acquiring
accurate and scalable ground truth data, the training of fully supervised
methods is challenging. As an alternative, self-supervised methods are becoming
more popular to mitigate this challenge. In this paper, we introduce the H-Net,
a deep-learning framework for unsupervised stereo depth estimation that
leverages epipolar geometry to refine stereo matching. For the first time, a
Siamese autoencoder architecture is used for depth estimation which allows
mutual information between the rectified stereo images to be extracted. To
enforce the epipolar constraint, the mutual epipolar attention mechanism has
been designed which gives more emphasis to correspondences of features which
lie on the same epipolar line while learning mutual information between the
input stereo pair. Stereo correspondences are further enhanced by incorporating
semantic information into the proposed attention mechanism. More specifically,
the optimal transport algorithm is used to suppress attention and eliminate
outliers in areas not visible in both cameras. Extensive experiments on
KITTI2015 and Cityscapes show that our method outperforms the state-of-the-art
unsupervised stereo depth estimation methods while closing the gap with the
fully supervised approaches.
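The abstract does not give implementation details, so the following is only a minimal sketch of the general idea behind epipolar-constrained attention, not the authors' mutual epipolar attention module. It assumes a rectified stereo pair, in which the epipolar line of a pixel in row y is simply row y of the other image, so each left-image feature attends only to right-image features in the same row. The function names (`epipolar_attention`, `softmax`, `dot`) and the nested-list feature layout `[H][W][C]` are illustrative assumptions; the semantic cues and optimal-transport suppression mentioned in the abstract are not modeled here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def epipolar_attention(left, right):
    """Row-restricted cross-attention from left features to right features.

    left, right: nested lists of shape [H][W][C] (hypothetical layout).
    Because the pair is assumed rectified, a query in row y only sees
    keys in row y of the other image.
    Returns (attended, weights) with shapes [H][W][C] and [H][W][W].
    """
    attended, weights = [], []
    for left_row, right_row in zip(left, right):
        channels = len(right_row[0])
        scale = math.sqrt(channels)  # standard scaled dot-product attention
        row_out, row_weights = [], []
        for query in left_row:
            # Similarity of this left pixel to every right pixel in the same row.
            w = softmax([dot(query, key) / scale for key in right_row])
            # Weighted sum of the right-row features.
            out = [
                sum(w[j] * right_row[j][c] for j in range(len(right_row)))
                for c in range(channels)
            ]
            row_out.append(out)
            row_weights.append(w)
        attended.append(row_out)
        weights.append(row_weights)
    return attended, weights


# Toy example: one image row, three pixels, three feature channels.
left = [[[0.0, 4.0, 0.0], [0.0, 0.0, 4.0], [4.0, 0.0, 0.0]]]
right = [[[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]]
attended, weights = epipolar_attention(left, right)

# Column in the right row that each left pixel attends to most strongly.
best_match = [max(range(len(w)), key=lambda j: w[j]) for w in weights[0]]
print(best_match)
```

In this toy setup the difference between a left pixel's column and its best-matching right column plays the role of a disparity. A mutual variant would apply the same operation in the opposite direction (right to left) as well.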
Related papers
- SDGE: Stereo Guided Depth Estimation for 360$^\circ$ Camera Sets [65.64958606221069]
Multi-camera systems are often used in autonomous driving to achieve a 360$^\circ$ perception.
These 360$^\circ$ camera sets often have limited or low-quality overlap regions, making multi-view stereo methods infeasible for the entire image.
We propose the Stereo Guided Depth Estimation (SGDE) method, which enhances depth estimation of the full image by explicitly utilizing multi-view stereo results on the overlap.
arXiv Detail & Related papers (2024-02-19T02:41:37Z)
- Depth-aware Volume Attention for Texture-less Stereo Matching [67.46404479356896]
We propose a lightweight volume refinement scheme to tackle the texture deterioration in practical outdoor scenarios.
We introduce a depth volume supervised by the ground-truth depth map, capturing the relative hierarchy of image texture.
Local fine structure and context are emphasized to mitigate ambiguity and redundancy during volume aggregation.
arXiv Detail & Related papers (2024-02-14T04:07:44Z)
- Stereo-Matching Knowledge Distilled Monocular Depth Estimation Filtered by Multiple Disparity Consistency [31.261772846687297]
We propose a method to identify and filter errors in the pseudo-depth map using multiple disparity maps.
Experimental results show that the proposed method outperforms the previous methods.
arXiv Detail & Related papers (2024-01-22T15:05:05Z)
- Self-Supervised Depth Estimation in Laparoscopic Image using 3D Geometric Consistency [7.902636435901286]
We present M3Depth, a self-supervised depth estimator to leverage 3D geometric structural information hidden in stereo pairs.
Our method outperforms previous self-supervised approaches on both a public dataset and a newly acquired dataset by a large margin.
arXiv Detail & Related papers (2022-08-17T17:03:48Z)
- Learning Monocular Depth Estimation via Selective Distillation of Stereo Knowledge [34.380048111601894]
We design a decoder (MaskDecoder) that learns two binary masks which are trained to choose optimally between the proxy disparity maps and the estimated depth maps for each pixel.
The learned masks are then fed to another decoder (DepthDecoder) to enforce the estimated depths.
Experiments validate our methods achieve state-of-the-art performance for self- and proxy-supervised monocular depth estimation.
arXiv Detail & Related papers (2022-05-18T00:34:28Z)
- On the Sins of Image Synthesis Loss for Self-supervised Depth Estimation [60.780823530087446]
We show that improvements in image synthesis do not necessitate improvement in depth estimation.
We attribute this diverging phenomenon to aleatoric uncertainties, which originate from data.
This observed divergence has not been previously reported or studied in depth.
arXiv Detail & Related papers (2021-09-13T17:57:24Z)
- SMD-Nets: Stereo Mixture Density Networks [68.56947049719936]
We propose Stereo Mixture Density Networks (SMD-Nets), a simple yet effective learning framework compatible with a wide class of 2D and 3D architectures.
Specifically, we exploit bimodal mixture densities as output representation and show that this allows for sharp and precise disparity estimates near discontinuities.
We carry out comprehensive experiments on a new high-resolution and highly realistic synthetic stereo dataset, consisting of stereo pairs at 8Mpx resolution, as well as on real-world stereo datasets.
arXiv Detail & Related papers (2021-04-08T16:15:46Z)
- On the confidence of stereo matching in a deep-learning era: a quantitative evaluation [124.09613797008099]
We review more than ten years of developments in the field of confidence estimation for stereo matching.
We study the different behaviors of each measure when applied to a pool of different stereo algorithms and, for the first time in literature, when paired with a state-of-the-art deep stereo network.
arXiv Detail & Related papers (2021-01-02T11:40:17Z)
- Reversing the cycle: self-supervised deep stereo through enhanced monocular distillation [51.714092199995044]
In many fields, self-supervised learning solutions are rapidly evolving and filling the gap with supervised approaches.
We propose a novel self-supervised paradigm reversing the link between the two.
In order to train deep stereo networks, we distill knowledge through a monocular completion network.
arXiv Detail & Related papers (2020-08-17T07:40:22Z)
- Self-supervised Monocular Trained Depth Estimation using Self-attention and Discrete Disparity Volume [19.785343302320918]
We propose two new ideas to improve self-supervised monocular trained depth estimation: 1) self-attention, and 2) discrete disparity prediction.
We show that the extension of the state-of-the-art self-supervised monocular trained depth estimator Monodepth2 with these two ideas allows us to design a model that produces the best results in the field in KITTI 2015 and Make3D.
arXiv Detail & Related papers (2020-03-31T04:48:16Z)