DynOcc: Learning Single-View Depth from Dynamic Occlusion Cues
- URL: http://arxiv.org/abs/2103.16706v1
- Date: Tue, 30 Mar 2021 22:17:36 GMT
- Title: DynOcc: Learning Single-View Depth from Dynamic Occlusion Cues
- Authors: Yifan Wang, Linjie Luo, Xiaohui Shen, Xing Mei
- Abstract summary: We introduce the first depth dataset DynOcc consisting of dynamic in-the-wild scenes.
Our approach leverages the cues in these dynamic scenes to infer depth relationships between points of selected video frames.
In total, our DynOcc dataset contains 22M depth pairs across 91K frames from a diverse set of videos.
- Score: 37.837552043766166
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, significant progress has been made in single-view depth estimation
thanks to increasingly large and diverse depth datasets. However, these
datasets are largely limited to specific application domains (e.g. indoor,
autonomous driving) or static in-the-wild scenes due to hardware constraints or
technical limitations of 3D reconstruction. In this paper, we introduce the
first depth dataset DynOcc consisting of dynamic in-the-wild scenes. Our
approach leverages the occlusion cues in these dynamic scenes to infer depth
relationships between points of selected video frames. To achieve accurate
occlusion detection and depth order estimation, we employ a novel occlusion
boundary detection, filtering and thinning scheme followed by a robust
foreground/background classification method. In total, our DynOcc dataset
contains 22M depth pairs across 91K frames from a diverse set of videos. Using
our dataset we achieved state-of-the-art results measured in weighted human
disagreement rate (WHDR). We also show that depth maps inferred by models
trained with DynOcc preserve sharper depth boundaries.
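WHDR scores each human-annotated ordinal pair (closer / farther / roughly equal) against the relation implied by the predicted depths, weighting each pair before averaging. A minimal sketch of the metric, assuming annotations of the form `(p1, p2, label, weight)` and a relative-depth threshold `tau` (the names and the ratio-based tie threshold are illustrative assumptions, not the paper's exact protocol):

```python
def whdr(pred_depth, annotations, tau=0.02):
    """Weighted Human Disagreement Rate over ordinal depth pairs.

    pred_depth: mapping from point id -> predicted depth (larger = farther)
    annotations: iterable of (p1, p2, label, weight), where label is
        '<' (p1 closer than p2), '>' (p1 farther), or '=' (roughly equal)
    tau: relative threshold separating '=' from ordered pairs (assumed)
    """
    disagree, total = 0.0, 0.0
    for p1, p2, label, weight in annotations:
        d1, d2 = pred_depth[p1], pred_depth[p2]
        # Derive the predicted ordinal relation from the depth ratio.
        if d1 / d2 > 1.0 + tau:
            pred = '>'   # p1 predicted farther than p2
        elif d2 / d1 > 1.0 + tau:
            pred = '<'   # p1 predicted closer than p2
        else:
            pred = '='   # depths within the tie threshold
        disagree += weight * (pred != label)
        total += weight
    return disagree / total if total else 0.0
```

A lower WHDR is better; a model that matches every weighted human judgment scores 0.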
Related papers
- Depth-Guided Semi-Supervised Instance Segmentation [62.80063539262021]
Semi-Supervised Instance Segmentation (SSIS) aims to leverage unlabeled data during training.
Previous frameworks primarily utilized the RGB information of unlabeled images to generate pseudo-labels.
We introduce a Depth-Guided (DG) framework to overcome this limitation.
arXiv Detail & Related papers (2024-06-25T09:36:50Z)
- Mind The Edge: Refining Depth Edges in Sparsely-Supervised Monocular Depth Estimation [42.19770683222846]
Monocular Depth Estimation (MDE) is a fundamental problem in computer vision with numerous applications.
In this paper we propose to learn to detect the location of depth edges from densely-supervised synthetic data.
We demonstrate significant gains in the accuracy of the depth edges with comparable per-pixel depth accuracy on several challenging datasets.
arXiv Detail & Related papers (2022-12-10T14:49:24Z)
- SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes.
It relies on the multi-view consistency assumption for training networks; however, that assumption is violated in dynamic object regions.
We introduce an external pretrained monocular depth estimation model for generating a single-image depth prior.
Our model can predict sharp and accurate depth maps, even when training from monocular videos of highly dynamic scenes.
arXiv Detail & Related papers (2022-11-07T16:17:47Z)
- Learning Occlusion-Aware Coarse-to-Fine Depth Map for Self-supervised Monocular Depth Estimation [11.929584800629673]
We propose a novel network to learn an Occlusion-aware Coarse-to-Fine Depth map for self-supervised monocular depth estimation.
The proposed OCFD-Net employs not only a discrete depth constraint for learning a coarse-level depth map, but also a continuous depth constraint for learning a scene depth residual.
arXiv Detail & Related papers (2022-03-21T12:43:42Z)
- DnD: Dense Depth Estimation in Crowded Dynamic Indoor Scenes [68.38952377590499]
We present a novel approach for estimating depth from a monocular camera as it moves through complex indoor environments.
Our approach predicts absolute-scale depth maps over the entire scene, consisting of a static background and multiple moving people.
arXiv Detail & Related papers (2021-08-12T09:12:39Z)
- Boundary-induced and scene-aggregated network for monocular depth prediction [20.358133522462513]
We propose the Boundary-induced and Scene-aggregated network (BS-Net) to predict the dense depth of a single RGB image.
Several experimental results on the NYUD v2 and iBims-1 datasets illustrate the state-of-the-art performance of the proposed approach.
arXiv Detail & Related papers (2021-02-26T01:43:17Z)
- Guiding Monocular Depth Estimation Using Depth-Attention Volume [38.92495189498365]
We propose guiding depth estimation to favor planar structures that are ubiquitous, especially in indoor environments.
Experiments on two popular indoor datasets, NYU-Depth-v2 and ScanNet, show that our method achieves state-of-the-art depth estimation results.
arXiv Detail & Related papers (2020-04-06T15:45:52Z)
- Occlusion-Aware Depth Estimation with Adaptive Normal Constraints [85.44842683936471]
We present a new learning-based method for multi-frame depth estimation from a color video.
Our method outperforms the state of the art in terms of depth estimation accuracy.
arXiv Detail & Related papers (2020-04-02T07:10:45Z)
- Single Image Depth Estimation Trained via Depth from Defocus Cues [105.67073923825842]
Estimating depth from a single RGB image is a fundamental task in computer vision.
In this work, we rely on depth-from-focus cues instead of different views.
We present results that are on par with supervised methods on the KITTI and Make3D datasets and outperform unsupervised learning approaches.
arXiv Detail & Related papers (2020-01-14T20:22:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.