Self-supervised Learning of Depth Inference for Multi-view Stereo
- URL: http://arxiv.org/abs/2104.02972v1
- Date: Wed, 7 Apr 2021 07:45:02 GMT
- Title: Self-supervised Learning of Depth Inference for Multi-view Stereo
- Authors: Jiayu Yang, Jose M. Alvarez, Miaomiao Liu
- Abstract summary: We propose a self-supervised learning framework for multi-view stereo networks.
We start by learning to estimate depth maps as initial pseudo labels under an unsupervised learning framework.
We refine the initial pseudo labels using a carefully designed pipeline.
- Score: 36.320984882009775
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent supervised multi-view depth estimation networks have achieved
promising results. Similar to all supervised approaches, these networks require
ground-truth data during training. However, collecting a large amount of
multi-view depth data is very challenging. Here, we propose a self-supervised
learning framework for multi-view stereo that exploits pseudo labels from the
input data. We start by learning to estimate depth maps as initial pseudo
labels under an unsupervised learning framework relying on image reconstruction
loss as supervision. We then refine the initial pseudo labels using a carefully
designed pipeline leveraging depth information inferred from higher resolution
images and neighboring views. We use these high-quality pseudo labels as the
supervision signal to train the network and improve, iteratively, its
performance by self-training. Extensive experiments on the DTU dataset show
that our proposed self-supervised learning framework outperforms existing
unsupervised multi-view stereo networks by a large margin and performs on par
with its supervised counterpart. Code is available at
https://github.com/JiayuYANG/Self-supervised-CVP-MVSNet.
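The first-stage supervision described above is an image reconstruction (photometric reprojection) loss. The following is a minimal sketch, not the authors' released code: it assumes known camera intrinsics and relative poses, and all tensor names and shapes are illustrative. It shows how a source view can be warped into the reference view through a predicted depth map so that the reprojection error can supervise the depth network.

```python
# Minimal sketch of a photometric reconstruction loss for multi-view depth
# (illustrative only; tensor names, shapes, and conventions are assumptions).
import torch
import torch.nn.functional as F


def warp_source_to_reference(src_img, depth_ref, K_ref, K_src, T_src_from_ref):
    """Warp a source image into the reference view via the predicted depth.

    src_img:        (B, 3, H, W) source-view image
    depth_ref:      (B, 1, H, W) depth predicted for the reference view
    K_ref, K_src:   (B, 3, 3) camera intrinsics
    T_src_from_ref: (B, 4, 4) relative pose mapping reference -> source frame
    """
    B, _, H, W = src_img.shape
    device = src_img.device

    # Pixel grid of the reference view in homogeneous coordinates.
    y, x = torch.meshgrid(
        torch.arange(H, device=device, dtype=torch.float32),
        torch.arange(W, device=device, dtype=torch.float32),
        indexing="ij",
    )
    ones = torch.ones_like(x)
    pix = torch.stack([x, y, ones], dim=0).reshape(1, 3, -1).expand(B, -1, -1)

    # Back-project to 3D with the predicted depth, then move to the source frame.
    cam_ref = torch.linalg.inv(K_ref) @ pix * depth_ref.reshape(B, 1, -1)
    cam_ref_h = torch.cat([cam_ref, torch.ones(B, 1, H * W, device=device)], dim=1)
    cam_src = (T_src_from_ref @ cam_ref_h)[:, :3]

    # Project into the source image plane and normalise to [-1, 1] for grid_sample.
    proj = K_src @ cam_src
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)
    u = 2.0 * uv[:, 0] / (W - 1) - 1.0
    v = 2.0 * uv[:, 1] / (H - 1) - 1.0
    grid = torch.stack([u, v], dim=-1).reshape(B, H, W, 2)

    return F.grid_sample(src_img, grid, align_corners=True, padding_mode="zeros")


def photometric_loss(ref_img, src_img, depth_ref, K_ref, K_src, T_src_from_ref):
    """L1 reprojection error between the reference image and the warped source."""
    warped = warp_source_to_reference(src_img, depth_ref, K_ref, K_src, T_src_from_ref)
    return (ref_img - warped).abs().mean()
```

A reconstruction term of this kind is what makes the first training stage unsupervised; the depth maps it yields, after the refinement pipeline, serve as the pseudo labels for the iterative self-training described in the abstract.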
Related papers
- Unsupervised Stereo Matching Network For VHR Remote Sensing Images Based On Error Prediction [5.68487023151187]
We propose a novel unsupervised stereo matching network for VHR remote sensing images.
A lightweight module that bridges confidence with predicted error is introduced to refine the core model.
The experimental results on US3D and WHU-Stereo datasets demonstrate that the proposed network achieves superior accuracy compared to other unsupervised networks.
arXiv Detail & Related papers (2024-08-14T09:59:04Z) - SwinDepth: Unsupervised Depth Estimation using Monocular Sequences via
Swin Transformer and Densely Cascaded Network [29.798579906253696]
It is challenging to acquire dense ground-truth depth labels for supervised training, so unsupervised depth estimation from monocular sequences emerges as a promising alternative.
In this paper, we employ a convolution-free Swin Transformer as an image feature extractor so that the network can capture both local geometric features and global semantic features for depth estimation.
Also, we propose a Densely Cascaded Multi-scale Network (DCMNet) that connects every feature map directly with another from different scales via a top-down cascade pathway.
arXiv Detail & Related papers (2023-01-17T06:01:46Z) - Pretraining the Vision Transformer using self-supervised methods for
vision based Deep Reinforcement Learning [0.0]
We study pretraining a Vision Transformer using several state-of-the-art self-supervised methods and assess the quality of the learned representations.
Our results show that all methods are effective in learning useful representations and avoiding representational collapse.
The encoder pretrained with the temporal order verification task shows the best results across all experiments.
arXiv Detail & Related papers (2022-09-22T10:18:59Z) - Unsupervised Monocular Depth Learning with Integrated Intrinsics and
Spatio-Temporal Constraints [61.46323213702369]
This work presents an unsupervised learning framework that is able to predict at-scale depth maps and egomotion.
Our results demonstrate strong performance when compared to the current state-of-the-art on multiple sequences of the KITTI driving dataset.
arXiv Detail & Related papers (2020-11-02T22:26:58Z) - Reversing the cycle: self-supervised deep stereo through enhanced
monocular distillation [51.714092199995044]
In many fields, self-supervised learning solutions are rapidly evolving and filling the gap with supervised approaches.
We propose a novel self-supervised paradigm that reverses the usual link between monocular and stereo networks.
In order to train deep stereo networks, we distill knowledge through a monocular completion network.
arXiv Detail & Related papers (2020-08-17T07:40:22Z) - Self-supervised Object Tracking with Cycle-consistent Siamese Networks [55.040249900677225]
We exploit an end-to-end Siamese network in a cycle-consistent self-supervised framework for object tracking.
We propose to integrate a Siamese region proposal and mask regression network in our tracking framework so that a fast and more accurate tracker can be learned without the annotation of each frame.
arXiv Detail & Related papers (2020-08-03T04:10:38Z) - Self-Supervised Viewpoint Learning From Image Collections [116.56304441362994]
We propose a novel learning framework which incorporates an analysis-by-synthesis paradigm to reconstruct images in a viewpoint-aware manner.
We show that our approach performs competitively with fully-supervised approaches for several object categories like human faces, cars, buses, and trains.
arXiv Detail & Related papers (2020-04-03T22:01:41Z) - Laplacian Denoising Autoencoder [114.21219514831343]
We propose to learn data representations with a novel type of denoising autoencoder.
The noisy input data is generated by corrupting latent clean data in the gradient domain.
Experiments on several visual benchmarks demonstrate that better representations can be learned with the proposed approach.
arXiv Detail & Related papers (2020-03-30T16:52:39Z)