Learning Monocular Depth Estimation via Selective Distillation of Stereo Knowledge
- URL: http://arxiv.org/abs/2205.08668v1
- Date: Wed, 18 May 2022 00:34:28 GMT
- Title: Learning Monocular Depth Estimation via Selective Distillation of Stereo Knowledge
- Authors: Kyeongseob Song and Kuk-Jin Yoon
- Abstract summary: We design a decoder (MaskDecoder) that learns two binary masks which are trained to choose optimally between the proxy disparity maps and the estimated depth maps for each pixel.
The learned masks are then fed to another decoder (DepthDecoder) to enforce the estimated depths to learn from only the masked area in the proxy disparity maps.
Experiments validate that our methods achieve state-of-the-art performance for self- and proxy-supervised monocular depth estimation on the KITTI dataset.
- Score: 34.380048111601894
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular depth estimation has been extensively explored based on deep
learning, yet its accuracy and generalization ability still lag far behind the
stereo-based methods. To tackle this, a few recent studies have proposed to
supervise the monocular depth estimation network by distilling disparity maps
as proxy ground-truths. However, these studies naively distill the stereo
knowledge without considering the comparative advantages of stereo-based and
monocular depth estimation methods. In this paper, we propose to selectively
distill the disparity maps for more reliable proxy supervision. Specifically,
we first design a decoder (MaskDecoder) that learns two binary masks which are
trained to choose optimally between the proxy disparity maps and the estimated
depth maps for each pixel. The learned masks are then fed to another decoder
(DepthDecoder) to enforce the estimated depths to learn from only the masked
area in the proxy disparity maps. Additionally, a Teacher-Student module is
designed to transfer the geometric knowledge of the StereoNet to the MonoNet.
Extensive experiments validate our methods achieve state-of-the-art performance
for self- and proxy-supervised monocular depth estimation on the KITTI dataset,
even surpassing some of the semi-supervised methods.
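The selective distillation idea can be illustrated with a small NumPy sketch. It is a simplification under stated assumptions, not the authors' implementation: it uses a single binary mask instead of the paper's two, and it assumes for illustration that the per-pixel choice comes from comparing per-pixel reconstruction errors of the two candidates, whereas in the paper the MaskDecoder learns the masks. All function names are hypothetical. The proxy disparity is converted to depth with the standard rectified-stereo relation depth = focal * baseline / disparity, and the distillation term is an L1 penalty averaged over the selected pixels only.

```python
import numpy as np

def disparity_to_depth(disp, focal_px, baseline_m, eps=1e-6):
    """Rectified stereo: depth = focal * baseline / disparity (disparity in pixels).
    Disparities are clamped to eps so zero disparity yields a large finite depth."""
    return focal_px * baseline_m / np.maximum(disp, eps)

def selection_mask(err_proxy, err_mono):
    """Hypothetical stand-in for a learned MaskDecoder output:
    1 where the proxy has the lower per-pixel error, 0 elsewhere."""
    return (err_proxy < err_mono).astype(np.float32)

def masked_distillation_loss(pred_depth, proxy_depth, mask, eps=1e-8):
    """L1 loss between predicted and proxy depth, averaged over masked pixels only."""
    diff = np.abs(pred_depth - proxy_depth) * mask
    return diff.sum() / (mask.sum() + eps)

# Usage on a toy 2x2 image (values are illustrative, not from the paper).
proxy_depth = disparity_to_depth(np.array([[10.0, 20.0], [40.0, 0.0]]), 720.0, 0.5)
mask = selection_mask(np.array([[0.1, 0.2], [0.1, 0.9]]),
                      np.array([[0.3, 0.1], [0.2, 0.1]]))
loss = masked_distillation_loss(np.array([[30.0, 18.0], [10.0, 5.0]]), proxy_depth, mask)
```

Normalizing by the mask sum keeps the loss scale independent of how many pixels are selected, and pixels where the proxy is rejected (such as the zero-disparity one above) contribute nothing to the gradient.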
Related papers
- Stereo-Matching Knowledge Distilled Monocular Depth Estimation Filtered by Multiple Disparity Consistency
We propose a method to identify and filter errors in the pseudo-depth map using multiple disparity maps.
Experimental results show that the proposed method outperforms the previous methods.
(2024-01-22T15:05:05Z)
- SwinDepth: Unsupervised Depth Estimation using Monocular Sequences via Swin Transformer and Densely Cascaded Network
It is challenging to acquire dense ground-truth depth labels for supervised training, so unsupervised depth estimation from monocular sequences has emerged as a promising alternative.
In this paper, we employ a convolution-free Swin Transformer as an image feature extractor so that the network can capture both local geometric features and global semantic features for depth estimation.
Also, we propose a Densely Cascaded Multi-scale Network (DCMNet) that connects every feature map directly with another from different scales via a top-down cascade pathway.
(2023-01-17T06:01:46Z)
- SelfTune: Metrically Scaled Monocular Depth Estimation through Self-Supervised Learning
We propose a self-supervised learning method for the pre-trained supervised monocular depth networks to enable metrically scaled depth estimation.
Our approach is useful for various applications such as mobile robot navigation and is applicable to diverse environments.
(2022-03-10T12:28:42Z)
- 360 Depth Estimation in the Wild -- The Depth360 Dataset and the SegFuse Network
Single-view depth estimation from omnidirectional images has gained popularity with its wide range of applications such as autonomous driving and scene reconstruction.
In this work, we first establish a large-scale dataset with varied settings called Depth360 to tackle the training data problem.
We then propose an end-to-end two-branch multi-task learning network, SegFuse, that mimics the human eye to effectively learn from the dataset.
(2022-02-16T11:56:31Z)
- X-Distill: Improving Self-Supervised Monocular Depth via Cross-Task Distillation
We propose a novel method to improve the self-supervised training of monocular depth via cross-task knowledge distillation.
During training, we utilize a pretrained semantic segmentation teacher network and transfer its semantic knowledge to the depth network.
We extensively evaluate the efficacy of our proposed approach on the KITTI benchmark and compare it with the latest state of the art.
(2021-10-24T19:47:14Z)
- Pseudo Supervised Monocular Depth Estimation with Teacher-Student Network
We propose a new unsupervised depth estimation method based on pseudo supervision mechanism.
It strategically integrates the advantages of supervised and unsupervised monocular depth estimation.
Our experimental results demonstrate that the proposed method outperforms the state-of-the-art on the KITTI benchmark.
(2021-10-22T01:08:36Z)
- H-Net: Unsupervised Attention-based Stereo Depth Estimation Leveraging Epipolar Geometry
We introduce the H-Net, a deep-learning framework for unsupervised stereo depth estimation.
For the first time, a Siamese autoencoder architecture is used for depth estimation.
Our method outperforms the state-of-the-art unsupervised stereo depth estimation methods.
(2021-04-22T19:16:35Z)
- Adaptive confidence thresholding for monocular depth estimation
We propose a new approach to leverage pseudo ground truth depth maps of stereo images generated from self-supervised stereo matching methods.
The confidence map of the pseudo ground-truth depth map is estimated to mitigate the performance degradation caused by inaccurate pseudo depth maps.
Experimental results demonstrate superior performance to state-of-the-art monocular depth estimation methods.
(2020-09-27T13:26:16Z)
- Reversing the cycle: self-supervised deep stereo through enhanced monocular distillation
In many fields, self-supervised learning solutions are rapidly evolving and closing the gap with supervised approaches.
We propose a novel self-supervised paradigm that reverses the usual link between stereo and monocular depth estimation.
In order to train deep stereo networks, we distill knowledge through a monocular completion network.
(2020-08-17T07:40:22Z)
- Single Image Depth Estimation Trained via Depth from Defocus Cues
Estimating depth from a single RGB image is a fundamental task in computer vision.
In this work, instead of relying on different views, we rely on depth-from-focus cues.
We present results that are on par with supervised methods on KITTI and Make3D datasets and outperform unsupervised learning approaches.
(2020-01-14T20:22:54Z)
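Several of the related works above filter unreliable pseudo labels before using them as supervision. The NumPy sketch below illustrates generic versions of two such filters under explicit assumptions: a fixed confidence threshold (the adaptive thresholding work updates its threshold, which is not reproduced here) and a consistency check that keeps a pixel only when every candidate disparity map lies within a tolerance of the per-pixel median (one plausible reading of "multiple disparity consistency", not the cited method). Function names, the threshold, and the tolerance are hypothetical.

```python
import numpy as np

def confidence_mask(confidence, threshold=0.5):
    """Keep pseudo-label pixels whose estimated confidence exceeds a fixed threshold.
    Simplification: an adaptive scheme would change the threshold during training."""
    return confidence > threshold

def consistency_mask(disparities, tol=1.0):
    """Keep pixels where all K candidate disparity maps lie within `tol` pixels
    of their per-pixel median. `disparities` has shape (K, H, W)."""
    stack = np.asarray(disparities, dtype=np.float64)
    median = np.median(stack, axis=0)
    return np.all(np.abs(stack - median) <= tol, axis=0)

def filtered_l1(pred, pseudo, valid):
    """Mean absolute error over valid pixels; returns 0.0 if no pixel is valid."""
    if not valid.any():
        return 0.0
    return float(np.abs(pred - pseudo)[valid].mean())

# Usage: three candidate disparity maps for a 1x2 image (illustrative values).
disps = np.array([[[10.0, 5.0]], [[10.5, 9.0]], [[9.8, 5.2]]])
valid = consistency_mask(disps) & confidence_mask(np.array([[0.9, 0.3]]))
loss = filtered_l1(np.array([[12.0, 0.0]]), np.median(disps, axis=0), valid)
```

Combining the two masks with a logical AND keeps only pixels that pass both checks; the second pixel here fails both because one candidate disagrees by 3.8 pixels and its confidence is low.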