Guiding Monocular Depth Estimation Using Depth-Attention Volume
- URL: http://arxiv.org/abs/2004.02760v2
- Date: Sun, 16 Aug 2020 16:22:27 GMT
- Title: Guiding Monocular Depth Estimation Using Depth-Attention Volume
- Authors: Lam Huynh, Phong Nguyen-Ha, Jiri Matas, Esa Rahtu, Janne Heikkila
- Abstract summary: We propose guiding depth estimation to favor planar structures that are ubiquitous especially in indoor environments.
Experiments on two popular indoor datasets, NYU-Depth-v2 and ScanNet, show that our method achieves state-of-the-art depth estimation results.
- Score: 38.92495189498365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recovering the scene depth from a single image is an ill-posed problem that
requires additional priors, often referred to as monocular depth cues, to
disambiguate different 3D interpretations. In recent works, those priors have
been learned in an end-to-end manner from large datasets by using deep neural
networks. In this paper, we propose guiding depth estimation to favor planar
structures that are ubiquitous especially in indoor environments. This is
achieved by incorporating a non-local coplanarity constraint to the network
with a novel attention mechanism called depth-attention volume (DAV).
Experiments on two popular indoor datasets, namely NYU-Depth-v2 and ScanNet,
show that our method achieves state-of-the-art depth estimation results while
using only a fraction of the number of parameters needed by the competing
methods.
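As a rough illustration of the attention idea mentioned in the abstract, the following is a minimal sketch of generic non-local attention used to aggregate depth values. It is not the DAV architecture from the paper: the per-pixel embeddings, the dot-product scoring, and the function names `attention_volume` and `aggregate_depth` are all assumptions made for this example. For N pixels, the attention weights form an N x N volume, where each row says how strongly one query pixel attends to every other pixel.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_volume(features):
    """Return an N x N list of attention weights, one row per query pixel.

    features: list of N per-pixel embedding vectors (hypothetical; in a real
    network these would come from a learned encoder).
    """
    volume = []
    for query in features:
        # Dot-product similarity between the query pixel and every pixel.
        scores = [sum(q * k for q, k in zip(query, key)) for key in features]
        volume.append(softmax(scores))
    return volume


def aggregate_depth(volume, depths):
    """Attention-weighted average of depth values, one result per query pixel."""
    return [sum(w * d for w, d in zip(row, depths)) for row in volume]


# Toy example with 4 pixels: the first two share an embedding (think of them
# as lying on the same plane), and so do the last two.
features = [[4.0, 0.0], [4.0, 0.0], [0.0, 4.0], [0.0, 4.0]]
noisy_depths = [1.0, 1.2, 3.0, 3.4]

volume = attention_volume(features)
refined = aggregate_depth(volume, noisy_depths)
print(refined)
```

In this toy setup each pixel attends almost exclusively to the pixels that share its embedding, so its refined depth is pulled toward the average depth of its own group. A learned model would have to infer such groupings from the image rather than receive them as input.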
Related papers
- MonoCD: Monocular 3D Object Detection with Complementary Depths [9.186673054867866]
Depth estimation is an essential but challenging subtask of monocular 3D object detection.
We propose to increase the complementarity of depths with two novel designs.
Experiments on the KITTI benchmark demonstrate that our method achieves state-of-the-art performance without introducing extra data.
arXiv Detail & Related papers (2024-04-04T03:30:49Z)
- NDDepth: Normal-Distance Assisted Monocular Depth Estimation [22.37113584192617]
We propose a novel physics (geometry)-driven deep learning framework for monocular depth estimation.
We introduce a new normal-distance head that outputs pixel-level surface normal and plane-to-origin distance for deriving depth at each position.
We develop an effective contrastive iterative refinement module that refines depth in a complementary manner according to the depth uncertainty.
arXiv Detail & Related papers (2023-09-19T13:05:57Z)
- GraphCSPN: Geometry-Aware Depth Completion via Dynamic GCNs [49.55919802779889]
We propose a Graph Convolution based Spatial Propagation Network (GraphCSPN) as a general approach for depth completion.
In this work, we leverage convolutional neural networks as well as graph neural networks in a complementary way for geometric representation learning.
Our method achieves state-of-the-art performance, especially when only a few propagation steps are used.
arXiv Detail & Related papers (2022-10-19T17:56:03Z)
- Learning Occlusion-Aware Coarse-to-Fine Depth Map for Self-supervised Monocular Depth Estimation [11.929584800629673]
We propose a novel network to learn an Occlusion-aware Coarse-to-Fine Depth map for self-supervised monocular depth estimation.
The proposed OCFD-Net not only employs a discrete depth constraint for learning a coarse-level depth map, but also employs a continuous depth constraint for learning a scene depth residual.
arXiv Detail & Related papers (2022-03-21T12:43:42Z)
- 3DVNet: Multi-View Depth Prediction and Volumetric Refinement [68.68537312256144]
3DVNet is a novel multi-view stereo (MVS) depth-prediction method.
Our key idea is the use of a 3D scene-modeling network that iteratively updates a set of coarse depth predictions.
We show that our method exceeds state-of-the-art accuracy in both depth prediction and 3D reconstruction metrics.
arXiv Detail & Related papers (2021-12-01T00:52:42Z)
- Monocular Depth Estimation Primed by Salient Point Detection and Normalized Hessian Loss [43.950140695759764]
We propose an accurate and lightweight framework for monocular depth estimation based on a self-attention mechanism stemming from salient point detection.
We introduce a normalized Hessian loss term invariant to scaling and shear along the depth direction, which is shown to substantially improve the accuracy.
The proposed method achieves state-of-the-art results on NYU-Depth-v2 and KITTI while using a model with 3.1-38.4 times fewer parameters than baseline approaches.
arXiv Detail & Related papers (2021-08-25T07:51:09Z)
- VolumeFusion: Deep Depth Fusion for 3D Scene Reconstruction [71.83308989022635]
In this paper, we advocate that replicating the traditional two-stage framework with deep neural networks improves both the interpretability and the accuracy of the results.
Our network operates in two steps: 1) local computation of depth maps with a deep MVS technique, and 2) fusion of the depth maps and image features to build a single TSDF volume.
In order to improve the matching performance between images acquired from very different viewpoints, we introduce a rotation-invariant 3D convolution kernel called PosedConv.
arXiv Detail & Related papers (2021-08-19T11:33:58Z)
- Sparse Auxiliary Networks for Unified Monocular Depth Prediction and Completion [56.85837052421469]
Estimating scene geometry from data obtained with cost-effective sensors is key for robots and self-driving cars.
In this paper, we study the problem of predicting dense depth from a single RGB image with optional sparse measurements from low-cost active depth sensors.
We introduce Sparse Auxiliary Networks (SANs), a new module enabling monodepth networks to perform both depth prediction and depth completion.
arXiv Detail & Related papers (2021-03-30T21:22:26Z)
- Deep Multi-view Depth Estimation with Predicted Uncertainty [11.012201499666503]
We employ a dense-optical-flow network to compute correspondences and then triangulate the point cloud to obtain an initial depth map.
To further increase the triangulation accuracy, we introduce a depth-refinement network (DRN) that optimizes the initial depth map based on the image's contextual cues.
arXiv Detail & Related papers (2020-11-19T00:22:09Z)
- Occlusion-Aware Depth Estimation with Adaptive Normal Constraints [85.44842683936471]
We present a new learning-based method for multi-frame depth estimation from a color video.
Our method outperforms the state-of-the-art in terms of depth estimation accuracy.
arXiv Detail & Related papers (2020-04-02T07:10:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.