Learning Occlusion-Aware Coarse-to-Fine Depth Map for Self-supervised
Monocular Depth Estimation
- URL: http://arxiv.org/abs/2203.10925v1
- Date: Mon, 21 Mar 2022 12:43:42 GMT
- Authors: Zhengming Zhou and Qiulei Dong
- Abstract summary: We propose a novel network, OCFD-Net, to learn an occlusion-aware coarse-to-fine depth map for self-supervised monocular depth estimation.
The proposed OCFD-Net not only employs a discrete depth constraint for learning a coarse-level depth map, but also employs a continuous depth constraint for learning a scene depth residual.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised monocular depth estimation, aiming to learn scene depths from
single images in a self-supervised manner, has received much attention
recently. Despite these efforts, how to learn accurate scene
depths and alleviate the negative influence of occlusions in self-supervised
depth estimation still remains an open problem. To address this problem, we
firstly empirically analyze the effects of both the continuous and discrete
depth constraints which are widely used in the training process of many
existing works. Then inspired by the above empirical analysis, we propose a
novel network to learn an Occlusion-aware Coarse-to-Fine Depth map for
self-supervised monocular depth estimation, called OCFD-Net. Given an arbitrary
training set of stereo image pairs, the proposed OCFD-Net not only employs
a discrete depth constraint for learning a coarse-level depth map, but also
employs a continuous depth constraint for learning a scene depth residual,
resulting in a fine-level depth map. In addition, an occlusion-aware module is
designed under the proposed OCFD-Net, which is able to improve the capability
of the learnt fine-level depth map for handling occlusions. Extensive
experimental results on the public KITTI and Make3D datasets demonstrate that
the proposed method outperforms 20 existing state-of-the-art methods in most
cases.
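The coarse-to-fine composition described in the abstract can be illustrated with a minimal sketch. The paper's exact formulation is not given here, so the bin layout, the probability-weighted coarse depth, and the additive residual below are assumptions for illustration; `coarse_to_fine_depth` and its parameters are hypothetical names, not the authors' API.

```python
import numpy as np

def coarse_to_fine_depth(logits, residual, d_min=0.1, d_max=100.0):
    """Illustrative coarse-to-fine depth composition (assumed form).

    logits:   (K, H, W) per-pixel scores over K discrete depth bins
    residual: (H, W) continuous depth residual predicted by the network
    """
    K = logits.shape[0]
    # Discretize [d_min, d_max] into K bin centers (uniform spacing here;
    # discrete-depth methods often use log-space or adaptive bins instead).
    bins = np.linspace(d_min, d_max, K).reshape(K, 1, 1)
    # Softmax over the bin dimension gives per-pixel bin probabilities
    # (the discrete depth constraint acts on these).
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    probs = e / e.sum(axis=0, keepdims=True)
    # Coarse-level depth: probability-weighted sum of bin centers.
    coarse = (probs * bins).sum(axis=0)
    # Fine-level depth: coarse depth refined by the continuous residual
    # (the continuous depth constraint acts on this).
    fine = coarse + residual
    return coarse, fine
```

With uniform logits and a zero residual, every pixel's coarse depth is the mean of the bin centers and the fine depth equals the coarse one; a non-zero residual then shifts the fine map per pixel.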
Related papers
- SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for
Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes.
It relies on the multi-view consistency assumption for training networks; however, this assumption is violated in dynamic object regions.
We introduce an external pretrained monocular depth estimation model for generating single-image depth prior.
Our model can predict sharp and accurate depth maps, even when training from monocular videos of highly-dynamic scenes.
arXiv Detail & Related papers (2022-11-07T16:17:47Z) - Densely Constrained Depth Estimator for Monocular 3D Object Detection [48.12271792836015]
Estimating accurate 3D locations of objects from monocular images is a challenging problem because of lacking depth.
We propose a method that utilizes dense projection constraints from edges of any direction.
The proposed method achieves state-of-the-art performance on the KITTI and WOD benchmarks.
arXiv Detail & Related papers (2022-07-20T17:24:22Z) - Learning Depth via Leveraging Semantics: Self-supervised Monocular Depth
Estimation with Both Implicit and Explicit Semantic Guidance [34.62415122883441]
We propose a Semantic-aware Spatial Feature Alignment scheme to align implicit semantic features with depth features for scene-aware depth estimation.
We also propose a semantic-guided ranking loss to explicitly constrain the estimated depth maps to be consistent with real scene contextual properties.
Our method produces high quality depth maps which are consistently superior either on complex scenes or diverse semantic categories.
arXiv Detail & Related papers (2021-02-11T14:29:51Z) - Adaptive confidence thresholding for monocular depth estimation [83.06265443599521]
We propose a new approach to leverage pseudo ground truth depth maps of stereo images generated from self-supervised stereo matching methods.
The confidence map of the pseudo ground truth depth map is estimated to mitigate performance degeneration by inaccurate pseudo depth maps.
Experimental results demonstrate superior performance to state-of-the-art monocular depth estimation methods.
arXiv Detail & Related papers (2020-09-27T13:26:16Z) - Self-Attention Dense Depth Estimation Network for Unrectified Video
Sequences [6.821598757786515]
LiDAR and radar sensors are common hardware solutions for real-time depth estimation.
Deep learning based self-supervised depth estimation methods have shown promising results.
We propose a self-attention based depth and ego-motion network for unrectified images.
arXiv Detail & Related papers (2020-05-28T21:53:53Z) - Guiding Monocular Depth Estimation Using Depth-Attention Volume [38.92495189498365]
We propose guiding depth estimation to favor planar structures that are ubiquitous especially in indoor environments.
Experiments on two popular indoor datasets, NYU-Depth-v2 and ScanNet, show that our method achieves state-of-the-art depth estimation results.
arXiv Detail & Related papers (2020-04-06T15:45:52Z) - Occlusion-Aware Depth Estimation with Adaptive Normal Constraints [85.44842683936471]
We present a new learning-based method for multi-frame depth estimation from a color video.
Our method outperforms the state-of-the-art in terms of depth estimation accuracy.
arXiv Detail & Related papers (2020-04-02T07:10:45Z) - Monocular Depth Estimation Based On Deep Learning: An Overview [16.2543991384566]
Inferring depth information from a single image (monocular depth estimation) is an ill-posed problem.
Deep learning has been widely studied recently and achieved promising performance in accuracy.
In order to improve the accuracy of depth estimation, different kinds of network frameworks, loss functions and training strategies are proposed.
arXiv Detail & Related papers (2020-03-14T12:35:34Z) - Single Image Depth Estimation Trained via Depth from Defocus Cues [105.67073923825842]
Estimating depth from a single RGB image is a fundamental task in computer vision.
In this work, we rely, instead of different views, on depth from focus cues.
We present results that are on par with supervised methods on KITTI and Make3D datasets and outperform unsupervised learning approaches.
arXiv Detail & Related papers (2020-01-14T20:22:54Z) - Don't Forget The Past: Recurrent Depth Estimation from Monocular Video [92.84498980104424]
We put three different types of depth estimation into a common framework.
Our method produces a time series of depth maps.
It can be applied to monocular videos only or be combined with different types of sparse depth patterns.
arXiv Detail & Related papers (2020-01-08T16:50:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.