Reversing the cycle: self-supervised deep stereo through enhanced
monocular distillation
- URL: http://arxiv.org/abs/2008.07130v1
- Date: Mon, 17 Aug 2020 07:40:22 GMT
- Title: Reversing the cycle: self-supervised deep stereo through enhanced
monocular distillation
- Authors: Filippo Aleotti, Fabio Tosi, Li Zhang, Matteo Poggi, Stefano Mattoccia
- Abstract summary: In many fields, self-supervised learning solutions are rapidly evolving and filling the gap with supervised approaches.
We propose a novel self-supervised paradigm that reverses the usual link between monocular and stereo depth estimation.
In order to train deep stereo networks, we distill knowledge through a monocular completion network.
- Score: 51.714092199995044
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many fields, self-supervised learning solutions are rapidly evolving and
filling the gap with supervised approaches. This is the case for depth
estimation from either monocular or stereo images, with the latter often providing
a valid source of self-supervision for the former. In contrast, to soften
typical stereo artefacts, we propose a novel self-supervised paradigm that reverses
the link between the two. To this end, we train deep stereo networks by
distilling knowledge through a monocular completion network. This architecture
exploits single-image clues and a few sparse points, sourced from traditional
stereo algorithms, to estimate dense yet accurate disparity maps by means of a
consensus mechanism over multiple estimations. We thoroughly evaluate the impact
of different supervisory signals on popular stereo datasets, showing how
stereo networks trained with our paradigm outperform existing self-supervised
frameworks. Finally, our proposal achieves notable generalization capabilities
when dealing with domain shift issues. Code is available at
https://github.com/FilippoAleotti/Reversing
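As a rough illustration of the consensus mechanism mentioned in the abstract, the sketch below fuses several disparity estimations produced by the completion network into a single proxy label plus a validity mask, and uses only the agreeing pixels to supervise a stereo network. The agreement criterion, the threshold, and the function names are illustrative assumptions, not the paper's exact implementation.

```python
# Hedged sketch of a consensus filter over multiple disparity estimations.
import torch

def consensus_disparity(disparities: torch.Tensor, max_deviation: float = 1.0):
    """Fuse N disparity maps of shape (N, H, W) into one proxy label and a mask.

    A pixel is kept only if every estimate lies within `max_deviation` pixels
    of the per-pixel median; disagreeing pixels are masked out so they never
    supervise the stereo network. Threshold is an illustrative assumption.
    """
    median = disparities.median(dim=0).values            # (H, W)
    deviation = (disparities - median).abs().max(dim=0).values
    valid = deviation < max_deviation                     # consensus mask
    proxy = torch.where(valid, median, torch.zeros_like(median))
    return proxy, valid

def distillation_loss(pred: torch.Tensor, proxy: torch.Tensor, valid: torch.Tensor):
    """L1 loss between the stereo prediction and the proxy label (all (H, W)),
    computed only on pixels that passed the consensus check."""
    if valid.sum() == 0:
        return pred.new_zeros(())
    return (pred - proxy).abs()[valid].mean()
```

The intuition is that pixels where the multiple estimations disagree are the ones most likely to carry artefacts, so excluding them keeps the distilled supervision sparse but clean.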
Related papers
- Learning Monocular Depth Estimation via Selective Distillation of Stereo
Knowledge [34.380048111601894]
We design a decoder (MaskDecoder) that learns two binary masks, trained to choose optimally, for each pixel, between the proxy disparity maps and the estimated depth maps.
The learned masks are then fed to another decoder (DepthDecoder) to enforce the estimated depths.
Experiments validate that our method achieves state-of-the-art performance for self- and proxy-supervised monocular depth estimation.
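For intuition only, here is a minimal sketch of per-pixel selective supervision in the spirit of the MaskDecoder described above; the soft sigmoid relaxation, the tiny convolutional head, and the blending rule are assumptions made for illustration and do not reproduce the paper's actual decoders.

```python
# Hedged sketch: a learned soft mask decides, per pixel, whether the proxy
# disparity or the network's own estimate drives the supervision.
import torch
import torch.nn as nn

class SelectiveSupervision(nn.Module):
    def __init__(self, feat_channels: int = 64):
        super().__init__()
        # Tiny mask head standing in for a MaskDecoder-like module.
        self.mask_head = nn.Sequential(
            nn.Conv2d(feat_channels, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 1), nn.Sigmoid(),
        )

    def forward(self, feats, pred_depth, proxy_depth):
        mask = self.mask_head(feats)                    # (B, 1, H, W) in [0, 1]
        # mask -> 1: pixel is supervised by the proxy disparity;
        # mask -> 0: pixel falls back to the current (detached) estimate.
        target = mask * proxy_depth + (1.0 - mask) * pred_depth.detach()
        loss = (pred_depth - target).abs().mean()
        return loss, mask
```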
arXiv Detail & Related papers (2022-05-18T00:34:28Z)
- SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation [101.55622133406446]
We propose SurroundDepth, a method that incorporates information from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views.
In experiments, our method achieves state-of-the-art performance on challenging multi-camera depth estimation datasets.
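As a loose sketch of what attention-based cross-view fusion can look like (not SurroundDepth's actual cross-view transformer), the snippet below lets every camera's feature tokens attend to tokens from all surrounding views; the token layout and layer sizes are illustrative assumptions.

```python
# Hedged sketch of cross-view feature fusion with multi-head attention.
import torch
import torch.nn as nn

class CrossViewFusion(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, V, C, H, W) feature maps from V surrounding cameras.
        b, v, c, h, w = feats.shape
        tokens = feats.flatten(3).permute(0, 1, 3, 2).reshape(b, v * h * w, c)
        fused, _ = self.attn(tokens, tokens, tokens)   # every view attends to all views
        tokens = self.norm(tokens + fused)             # residual + layer norm
        return tokens.reshape(b, v, h * w, c).permute(0, 1, 3, 2).reshape(b, v, c, h, w)
```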
arXiv Detail & Related papers (2022-04-07T17:58:47Z)
- Revisiting Domain Generalized Stereo Matching Networks from a Feature Consistency Perspective [65.37571681370096]
We propose a simple pixel-wise contrastive learning scheme across the two viewpoints.
A stereo selective whitening loss is introduced to better preserve the stereo feature consistency across domains.
Our method achieves superior performance over several state-of-the-art networks.
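A minimal sketch of a pixel-wise contrastive (InfoNCE) loss across the two stereo viewpoints follows; it assumes the right-view features have already been warped into the left view, and the pixel sampling and temperature are illustrative choices rather than the paper's settings. The selective whitening loss is not reproduced here.

```python
# Hedged sketch: corresponding left/right pixels are positives, all other
# sampled pixels act as negatives (InfoNCE).
import torch
import torch.nn.functional as F

def pixel_contrastive_loss(left_feat, right_feat_warped, temperature: float = 0.07):
    """left_feat, right_feat_warped: (B, C, H, W); the right-view features are
    assumed to be warped into the left view so that the same (b, :, y, x)
    location in both tensors depicts the same 3D point."""
    b, c, h, w = left_feat.shape
    q = F.normalize(left_feat.flatten(2).permute(0, 2, 1).reshape(-1, c), dim=1)
    k = F.normalize(right_feat_warped.flatten(2).permute(0, 2, 1).reshape(-1, c), dim=1)
    # Sample a subset of pixels to keep the similarity matrix small.
    idx = torch.randperm(q.shape[0], device=q.device)[:1024]
    q, k = q[idx], k[idx]
    logits = q @ k.t() / temperature               # (N, N) cosine similarities
    labels = torch.arange(q.shape[0], device=q.device)
    return F.cross_entropy(logits, labels)         # diagonal entries are positives
```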
arXiv Detail & Related papers (2022-03-21T11:21:41Z)
- SelfTune: Metrically Scaled Monocular Depth Estimation through Self-Supervised Learning [53.78813049373321]
We propose a self-supervised learning method that enables pre-trained supervised monocular depth networks to produce metrically scaled depth estimates.
Our approach is useful for various applications such as mobile robot navigation and is applicable to diverse environments.
arXiv Detail & Related papers (2022-03-10T12:28:42Z)
- Deep Multi-View Stereo gone wild [12.106051690920266]
Deep multi-view stereo (deep MVS) methods have been developed and extensively compared on simple datasets.
In this paper, we ask whether the conclusions reached in controlled scenarios are still valid when working with Internet photo collections.
We propose a methodology for evaluation and explore the influence of three aspects of deep MVS methods: network architecture, training data, and supervision.
arXiv Detail & Related papers (2021-04-30T17:07:17Z)
- The Temporal Opportunist: Self-Supervised Multi-Frame Monocular Depth [28.06671063873351]
ManyDepth is an adaptive approach to dense depth estimation.
We present a novel consistency loss that encourages the network to ignore the cost volume when it is deemed unreliable.
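As a hedged sketch of such a consistency term, the snippet below pulls the multi-frame prediction towards a single-frame teacher on pixels flagged as unreliable, so that the cost volume is effectively ignored there; the masking rule and the log-depth error are illustrative assumptions rather than ManyDepth's exact formulation.

```python
# Hedged sketch of a consistency loss applied only where the cost volume is
# considered untrustworthy (e.g. moving objects, occlusions).
import torch

def consistency_loss(multi_frame_depth, single_frame_depth, unreliable_mask):
    """All tensors are (B, 1, H, W) with positive depths; `unreliable_mask` is
    a float mask with 1 on unreliable pixels and 0 elsewhere."""
    target = single_frame_depth.detach()           # teacher is not updated by this loss
    diff = (torch.log(multi_frame_depth) - torch.log(target)).abs()
    denom = unreliable_mask.sum().clamp(min=1.0)   # avoid division by zero
    return (diff * unreliable_mask).sum() / denom
```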
arXiv Detail & Related papers (2021-04-29T17:53:42Z)
- H-Net: Unsupervised Attention-based Stereo Depth Estimation Leveraging Epipolar Geometry [4.968452390132676]
We introduce the H-Net, a deep-learning framework for unsupervised stereo depth estimation.
For the first time, a Siamese autoencoder architecture is used for depth estimation.
Our method outperforms state-of-the-art unsupervised stereo depth estimation methods.
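For illustration, a minimal Siamese (weight-sharing) autoencoder for stereo is sketched below: the same encoder and decoder process both views. The layer sizes are arbitrary, and the attention modules and epipolar constraints of H-Net are omitted.

```python
# Hedged sketch of a weight-sharing (Siamese) autoencoder over a stereo pair.
import torch
import torch.nn as nn

class SiameseStereoAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(                  # shared between both views
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(                  # also shared
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, left, right):
        # Weight sharing: the exact same modules are applied to both images.
        disp_left = self.decoder(self.encoder(left))
        disp_right = self.decoder(self.encoder(right))
        return disp_left, disp_right
```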
arXiv Detail & Related papers (2021-04-22T19:16:35Z)
- Self-supervised Learning of Depth Inference for Multi-view Stereo [36.320984882009775]
We propose a self-supervised learning framework for multi-view stereo networks.
We start by learning to estimate depth maps as initial pseudo labels under an unsupervised learning framework.
We refine the initial pseudo labels using a carefully designed pipeline.
arXiv Detail & Related papers (2021-04-07T07:45:02Z)
- On the confidence of stereo matching in a deep-learning era: a quantitative evaluation [124.09613797008099]
We review more than ten years of developments in the field of confidence estimation for stereo matching.
We study the different behaviors of each measure when applied to a pool of different stereo algorithms and, for the first time in the literature, when paired with a state-of-the-art deep stereo network.
arXiv Detail & Related papers (2021-01-02T11:40:17Z)
- Don't Forget The Past: Recurrent Depth Estimation from Monocular Video [92.84498980104424]
We put three different types of depth estimation into a common framework.
Our method produces a time series of depth maps.
It can be applied to monocular videos only or be combined with different types of sparse depth patterns.
arXiv Detail & Related papers (2020-01-08T16:50:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided (including all content) and is not responsible for any consequences arising from its use.