Weakly-Supervised Monocular Depth Estimation with Resolution-Mismatched Data
- URL: http://arxiv.org/abs/2109.11573v1
- Date: Thu, 23 Sep 2021 18:04:12 GMT
- Title: Weakly-Supervised Monocular Depth Estimation with Resolution-Mismatched Data
- Authors: Jialei Xu, Yuanchao Bai, Xianming Liu, Junjun Jiang and Xiangyang Ji
- Abstract summary: We propose a novel weakly-supervised framework to train a monocular depth estimation network.
The proposed framework is composed of a weight-sharing monocular depth estimation network and a depth reconstruction network for distillation.
Experimental results demonstrate that our method outperforms unsupervised and semi-supervised learning-based schemes.
- Score: 73.9872931307401
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Depth estimation from a single image is an active research topic in computer
vision. The most accurate approaches are based on fully supervised learning
models, which rely on a large amount of dense and high-resolution (HR)
ground-truth depth maps. However, in practice, color images are usually
captured with much higher resolution than depth maps, leading to the
resolution-mismatched effect. In this paper, we propose a novel
weakly-supervised framework to train a monocular depth estimation network to
generate HR depth maps with resolution-mismatched supervision, i.e., the inputs
are HR color images and the ground truth consists of low-resolution (LR) depth maps.
The proposed weakly-supervised framework is composed of a weight-sharing
monocular depth estimation network and a depth reconstruction network for
distillation. Specifically, for the monocular depth estimation network, the
input color image is first downsampled to obtain its LR version with the same
resolution as the ground-truth depth. Then, both HR and LR color images are fed
into the proposed monocular depth estimation network to obtain the
corresponding estimated depth maps. We introduce three losses to train the
network: 1) reconstruction loss between the estimated LR depth and the
ground-truth LR depth; 2) reconstruction loss between the downsampled estimated
HR depth and the ground-truth LR depth; 3) consistency loss between the
estimated LR depth and the downsampled estimated HR depth. In addition, we
design a depth-to-depth reconstruction network. Through a distillation loss,
the features of the two networks maintain structural consistency in affinity
space, which ultimately improves the performance of the estimation network.
Experimental results demonstrate that our method outperforms unsupervised and
semi-supervised learning-based schemes, and is competitive with or even better
than supervised ones.
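As a rough sketch of the training objective described above (assuming bicubic downsampling, L1 reconstruction losses, equal loss weights, and a cosine-similarity affinity, none of which the abstract specifies), the three losses and the affinity-space distillation could look like:

```python
import torch
import torch.nn.functional as F

def training_losses(net, hr_image, lr_gt_depth, w1=1.0, w2=1.0, w3=1.0):
    """Sketch of the three-loss objective with a weight-sharing network `net`.

    hr_image:    (B, 3, H, W) HR color image
    lr_gt_depth: (B, 1, h, w) LR ground-truth depth, h < H and w < W
    The loss weights w1..w3 are placeholders, not values from the paper.
    """
    h, w = lr_gt_depth.shape[-2:]
    # Downsample the HR image to the resolution of the ground-truth depth.
    lr_image = F.interpolate(hr_image, size=(h, w), mode='bicubic',
                             align_corners=False)
    # The same (weight-sharing) network processes both resolutions.
    hr_depth = net(hr_image)   # (B, 1, H, W)
    lr_depth = net(lr_image)   # (B, 1, h, w)
    # Downsample the estimated HR depth onto the LR grid.
    hr_depth_down = F.interpolate(hr_depth, size=(h, w), mode='bicubic',
                                  align_corners=False)
    loss1 = F.l1_loss(lr_depth, lr_gt_depth)       # 1) LR estimate vs. LR GT
    loss2 = F.l1_loss(hr_depth_down, lr_gt_depth)  # 2) downsampled HR vs. LR GT
    loss3 = F.l1_loss(lr_depth, hr_depth_down)     # 3) LR vs. downsampled HR
    return w1 * loss1 + w2 * loss2 + w3 * loss3

def affinity(feat):
    """Pairwise cosine-similarity (affinity) matrix of a feature map."""
    f = F.normalize(feat.flatten(2), dim=1)   # (B, C, HW)
    return torch.bmm(f.transpose(1, 2), f)    # (B, HW, HW)

def distillation_loss(est_feat, recon_feat):
    """Keep the two networks' features structurally consistent in affinity
    space; the exact affinity definition here is an assumption."""
    return F.l1_loss(affinity(est_feat), affinity(recon_feat))
```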
Related papers
- Unveiling the Depths: A Multi-Modal Fusion Framework for Challenging
Scenarios [103.72094710263656]
This paper presents a novel approach that identifies and integrates dominant cross-modality depth features with a learning-based framework.
We propose a novel confidence loss steering a confidence predictor network to yield a confidence map specifying latent potential depth areas.
With the resulting confidence map, we propose a multi-modal fusion network that produces the final fused depth in an end-to-end manner.
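Read as a generic recipe (the fusion network in the paper is learned end-to-end; the simple convex blend and the two-branch setup below are assumptions for illustration), the confidence map can gate two modality-specific depth estimates:

```python
import torch

def confidence_fusion(depth_a, depth_b, confidence):
    """Blend two depth estimates using a confidence map in [0, 1] (a sketch).

    depth_a, depth_b: (B, 1, H, W) depth maps from two modalities
    confidence:       (B, 1, H, W) output of the confidence predictor
    """
    return confidence * depth_a + (1.0 - confidence) * depth_b
```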
arXiv Detail & Related papers (2024-02-19T04:39:16Z)
- Depth-Relative Self Attention for Monocular Depth Estimation [23.174459018407003]
Deep neural networks rely on various visual hints such as size, shade, and texture extracted from RGB information.
We propose a novel depth estimation model named RElative Depth Transformer (RED-T) that uses relative depth as guidance in self-attention.
We show that the proposed model achieves competitive results in monocular depth estimation benchmarks and is less biased to RGB information.
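One plausible reading of "relative depth as guidance in self-attention" (the bias construction below is an assumption, not necessarily RED-T's exact formulation) is an additive relative-depth term in the attention logits:

```python
import torch
import torch.nn.functional as F

def depth_biased_attention(q, k, v, rel_depth_bias):
    """Self-attention with an additive relative-depth bias (a sketch).

    q, k, v:        (B, N, D) query, key, and value tokens
    rel_depth_bias: (B, N, N) bias derived from pairwise relative depth,
                    e.g. a function of |d_i - d_j| (an assumption here)
    """
    scale = q.shape[-1] ** -0.5
    logits = torch.matmul(q, k.transpose(-2, -1)) * scale + rel_depth_bias
    attn = F.softmax(logits, dim=-1)
    return torch.matmul(attn, v)
```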
arXiv Detail & Related papers (2023-04-25T14:20:31Z)
- Boosting Monocular 3D Object Detection with Object-Centric Auxiliary Depth Supervision [13.593246617391266]
We propose a method to boost the RGB image-based 3D detector by jointly training the detection network with a depth prediction loss analogous to the depth estimation task.
Our novel object-centric depth prediction loss focuses on depth around foreground objects, which is important for 3D object detection.
Our depth regression model is further trained to predict the uncertainty of depth to represent the 3D confidence of objects.
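A common way to realize such an uncertainty-aware, foreground-focused depth loss (the paper's exact formulation may differ; the Laplacian negative log-likelihood and the binary foreground mask below are assumptions) is:

```python
import torch

def object_centric_depth_loss(pred_depth, pred_log_b, gt_depth, fg_mask):
    """Foreground-weighted depth regression with predicted uncertainty.

    pred_depth: (B, 1, H, W) predicted depth
    pred_log_b: (B, 1, H, W) predicted log-scale of a Laplacian (uncertainty)
    gt_depth:   (B, 1, H, W) ground-truth depth
    fg_mask:    (B, 1, H, W) 1 on/around foreground objects, 0 elsewhere
    """
    # Laplacian NLL: |d - d*| / b + log b, a standard aleatoric form.
    nll = torch.abs(pred_depth - gt_depth) * torch.exp(-pred_log_b) + pred_log_b
    # Concentrate the loss around foreground objects.
    return (nll * fg_mask).sum() / fg_mask.sum().clamp(min=1.0)
```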
arXiv Detail & Related papers (2022-10-29T11:32:28Z)
- VolumeFusion: Deep Depth Fusion for 3D Scene Reconstruction [71.83308989022635]
In this paper, we advocate that replicating the traditional two-stage framework with deep neural networks improves both the interpretability and the accuracy of the results.
Our network operates in two steps: 1) the local computation of depth maps with a deep MVS technique, and 2) the fusion of depth maps and image features to build a single TSDF volume.
In order to improve the matching performance between images acquired from very different viewpoints, we introduce a rotation-invariant 3D convolution kernel called PosedConv.
arXiv Detail & Related papers (2021-08-19T11:33:58Z)
- Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging [14.279471205248534]
We show how a consistent scene structure and high-frequency details affect depth estimation performance.
We present a double estimation method that improves the whole-image depth estimation and a patch selection method that adds local details.
We demonstrate that by merging estimations at different resolutions with changing context, we can generate multi-megapixel depth maps with a high level of detail.
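A heavily simplified sketch of the double-estimation idea (the actual method selects patches and merges them with changing context; the fixed two-scale blend below is only illustrative):

```python
import torch.nn.functional as F

def double_estimation(net, image, low_size=(384, 384), blend=0.5):
    """Merge a structure-consistent low-res estimate with a detailed
    high-res estimate (a sketch; `net` is any monocular depth model).
    """
    H, W = image.shape[-2:]
    low = F.interpolate(image, size=low_size, mode='bilinear',
                        align_corners=False)
    base = net(low)      # consistent global scene structure
    detail = net(image)  # high-frequency local details
    base_up = F.interpolate(base, size=(H, W), mode='bilinear',
                            align_corners=False)
    return blend * base_up + (1.0 - blend) * detail
```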
arXiv Detail & Related papers (2021-05-28T17:55:15Z)
- Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline [48.69396457721544]
We build a large-scale dataset named "RGB-D-D" to promote the study of depth map super-resolution (SR).
We provide a fast depth map super-resolution (FDSR) baseline, in which the high-frequency component adaptively decomposed from the RGB image guides the depth map SR.
For real-world LR depth maps, our algorithm can produce more accurate HR depth maps with clearer boundaries and, to some extent, correct depth value errors.
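To illustrate the high-frequency guidance (FDSR learns its decomposition adaptively; the fixed blur and the `sr_net` refinement network below are stand-in assumptions):

```python
import torch
import torch.nn.functional as F

def high_frequency(rgb, kernel_size=5):
    """High-frequency component of an RGB image via a fixed blur
    (FDSR's adaptive, learned decomposition is replaced for brevity)."""
    blur = F.avg_pool2d(rgb, kernel_size, stride=1, padding=kernel_size // 2)
    return rgb - blur

def guided_depth_sr(sr_net, lr_depth, hr_rgb):
    """Upsample LR depth with RGB high-frequency guidance (a sketch)."""
    H, W = hr_rgb.shape[-2:]
    up_depth = F.interpolate(lr_depth, size=(H, W), mode='bicubic',
                             align_corners=False)
    # `sr_net` is a hypothetical refinement network taking both inputs.
    return sr_net(torch.cat([up_depth, high_frequency(hr_rgb)], dim=1))
```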
arXiv Detail & Related papers (2021-04-13T13:27:26Z)
- Deep Two-View Structure-from-Motion Revisited [83.93809929963969]
Two-view structure-from-motion (SfM) is the cornerstone of 3D reconstruction and visual SLAM.
We propose to revisit the problem of deep two-view SfM by leveraging the well-posedness of the classic pipeline.
Our method consists of 1) an optical flow estimation network that predicts dense correspondences between two frames; 2) a normalized pose estimation module that computes relative camera poses from the 2D optical flow correspondences, and 3) a scale-invariant depth estimation network that leverages epipolar geometry to reduce the search space, refine the dense correspondences, and estimate relative depth maps.
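The well-posed classical pipeline being revisited can be illustrated with OpenCV (this is the classical version only; the paper replaces the correspondence, pose, and depth components with learned networks):

```python
import cv2
import numpy as np

def classical_two_view_sfm(pts1, pts2, K):
    """Two-view SfM from 2D correspondences (a classical sketch).

    pts1, pts2: (N, 2) float32 matched points in the two frames
                (the paper predicts these densely via optical flow)
    K:          (3, 3) camera intrinsics
    Returns the relative pose (R, t) and 3D points, both up to scale.
    """
    E, _ = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    pts3d = (pts4d[:3] / pts4d[3]).T   # relative depth follows from these
    return R, t, pts3d
```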
arXiv Detail & Related papers (2021-04-01T15:31:20Z)
- Attention Aware Cost Volume Pyramid Based Multi-view Stereo Network for 3D Reconstruction [12.728154351588053]
We present an efficient multi-view stereo (MVS) network for 3D reconstruction from multi-view images.
We introduce a coarse-to-fine depth inference strategy to achieve high-resolution depth.
arXiv Detail & Related papers (2020-11-25T13:34:11Z)
- Depth Completion Using a View-constrained Deep Prior [73.21559000917554]
Recent work has shown that the structure of convolutional neural networks (CNNs) induces a strong prior that favors natural images.
This prior, known as a deep image prior (DIP), is an effective regularizer in inverse problems such as image denoising and inpainting.
We extend the concept of the DIP to depth images. Given color images and noisy, incomplete target depth maps, we reconstruct a restored depth map by using the CNN structure as a prior.
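A minimal deep-image-prior-style loop for depth (only the single-view data term is shown; the paper's view constraints from the color images are omitted here):

```python
import torch

def dip_depth_restoration(cnn, noise, target_depth, valid_mask, steps=2000):
    """Restore a noisy, incomplete depth map with a CNN prior (a sketch).

    cnn:          an untrained encoder-decoder mapping noise -> depth
    noise:        (1, C, H, W) fixed random input tensor
    target_depth: (1, 1, H, W) observed depth (noisy and incomplete)
    valid_mask:   (1, 1, H, W) 1 where an observation exists
    """
    opt = torch.optim.Adam(cnn.parameters(), lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        pred = cnn(noise)
        # Fit only valid observations; the network structure itself
        # regularizes the missing regions (the DIP effect).
        loss = ((pred - target_depth).abs() * valid_mask).mean()
        loss.backward()
        opt.step()
    return cnn(noise).detach()
```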
arXiv Detail & Related papers (2020-01-21T21:56:01Z)