Virtual Normal: Enforcing Geometric Constraints for Accurate and Robust
Depth Prediction
- URL: http://arxiv.org/abs/2103.04216v2
- Date: Tue, 9 Mar 2021 12:34:46 GMT
- Title: Virtual Normal: Enforcing Geometric Constraints for Accurate and Robust
Depth Prediction
- Authors: Wei Yin and Yifan Liu and Chunhua Shen
- Abstract summary: We show the importance of high-order 3D geometric constraints for depth prediction.
By designing a loss term that enforces a simple geometric constraint, we significantly improve the accuracy and robustness of monocular depth estimation.
We show state-of-the-art results of learning metric depth on NYU Depth-V2 and KITTI.
- Score: 87.08227378010874
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Monocular depth prediction plays a crucial role in understanding 3D scene
geometry. Although recent methods have achieved impressive progress in terms of
evaluation metrics such as the pixel-wise relative error, most methods neglect
the geometric constraints in 3D space. In this work, we show the importance
of high-order 3D geometric constraints for depth prediction. By designing a
loss term that enforces a simple geometric constraint, namely, the virtual
normal directions determined by three randomly sampled points in the
reconstructed 3D space, we significantly improve the accuracy and robustness
of monocular depth estimation. Notably, the virtual normal loss not only
improves the performance of learning metric depth, but also disentangles the
scale information and enriches the model with better shape information.
Therefore, when absolute metric depth training data is unavailable, we can
use the virtual normal loss to learn robust affine-invariant depth on
diverse scenes. In experiments, we show state-of-the-art results of learning
metric depth on NYU Depth-V2 and KITTI. From the high-quality predicted
depth, we can directly recover good 3D structures of the scene, such as the
point cloud and surface normals, eliminating the need for additional models
as in prior work. To demonstrate the strong generalizability of learning
affine-invariant depth on diverse data with the virtual normal loss, we
construct a large-scale and diverse dataset for training affine-invariant
depth, termed the Diverse Scene Depth dataset (DiverseDepth), and test on five
datasets under the zero-shot test setting. Code is available at:
https://git.io/Depth
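To make the geometric constraint concrete, below is a minimal NumPy sketch of a virtual-normal-style loss. It is not the authors' released implementation: the function names, the pinhole intrinsics (fx, fy, cx, cy), the triplet count, and the colinearity threshold are illustrative assumptions, and the released code applies additional filtering of degenerate triplets. The sketch back-projects the predicted and ground-truth depth maps into point clouds, samples random point triplets, and penalizes the L1 difference between the unit normals of the planes the triplets span:

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Lift a depth map of shape (H, W) to a point cloud of shape (H, W, 3)
    using a pinhole camera model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)

def plane_normals(pts, ia, ib, ic, eps=1e-8):
    """Unit normals of the planes spanned by point triplets (the virtual normals)."""
    n = np.cross(pts[ib] - pts[ia], pts[ic] - pts[ia])
    return n / (np.linalg.norm(n, axis=-1, keepdims=True) + eps)

def virtual_normal_loss(pred_depth, gt_depth, fx, fy, cx, cy,
                        num_triplets=10000, seed=0):
    """L1 difference between the virtual normals of the predicted and
    ground-truth point clouds, computed over shared random triplets."""
    rng = np.random.default_rng(seed)
    pred_pts = backproject(pred_depth, fx, fy, cx, cy).reshape(-1, 3)
    gt_pts = backproject(gt_depth, fx, fy, cx, cy).reshape(-1, 3)
    n = gt_pts.shape[0]
    ia, ib, ic = (rng.integers(0, n, num_triplets) for _ in range(3))
    # Drop triplets whose ground-truth points are near-colinear:
    # their plane normal is ill-defined.
    raw = np.cross(gt_pts[ib] - gt_pts[ia], gt_pts[ic] - gt_pts[ia])
    keep = np.linalg.norm(raw, axis=-1) > 1e-4
    ia, ib, ic = ia[keep], ib[keep], ic[keep]
    return np.abs(plane_normals(pred_pts, ia, ib, ic)
                  - plane_normals(gt_pts, ia, ib, ic)).sum(axis=-1).mean()
```

Because triplets are sampled across the whole image, the loss couples distant regions and so penalizes long-range structural errors that pixel-wise depth losses miss.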
Related papers
- Towards Accurate Reconstruction of 3D Scene Shape from A Single
Monocular Image [91.71077190961688]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image.
We then exploit 3D point cloud data to predict the depth shift and the camera's focal length, which allow us to recover 3D scene shapes.
We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot evaluation.
arXiv Detail & Related papers (2022-08-28T16:20:14Z)
- MonoJSG: Joint Semantic and Geometric Cost Volume for Monocular 3D
Object Detection [10.377424252002792]
Monocular 3D object detection lacks accurate depth recovery.
Deep neural networks (DNNs) enable monocular depth sensing from high-level learned features.
We propose a joint semantic and geometric cost volume to model the depth error.
arXiv Detail & Related papers (2022-03-16T11:54:10Z)
- Learning to Recover 3D Scene Shape from a Single Image [98.20106822614392]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image.
We then use 3D point cloud encoders to predict the missing depth shift and focal length that allow us to recover a realistic 3D scene shape.
arXiv Detail & Related papers (2020-12-17T02:35:13Z)
- GeoNet++: Iterative Geometric Neural Network with Edge-Aware Refinement
for Joint Depth and Surface Normal Estimation [204.13451624763735]
We propose a geometric neural network with edge-aware refinement (GeoNet++) to jointly predict both depth and surface normal maps from a single image.
GeoNet++ effectively predicts depth and surface normals with strong 3D consistency and sharp boundaries.
In contrast to current metrics that evaluate pixel-wise error/accuracy, the proposed 3D geometric metric (3DGM) measures whether the predicted depth can reconstruct high-quality 3D surface normals.
arXiv Detail & Related papers (2020-12-13T06:48:01Z)
- Occlusion-Aware Depth Estimation with Adaptive Normal Constraints [85.44842683936471]
We present a new learning-based method for multi-frame depth estimation from a color video.
Our method outperforms the state-of-the-art in terms of depth estimation accuracy.
arXiv Detail & Related papers (2020-04-02T07:10:45Z)
- DiverseDepth: Affine-invariant Depth Prediction Using Diverse Data [110.29043712400912]
We present a method for depth estimation from monocular images that predicts high-quality depth on diverse scenes up to an affine transformation (a sketch of the corresponding scale-and-shift alignment follows this list).
Experiments show that our method outperforms previous methods on 8 datasets by a large margin in the zero-shot test setting.
arXiv Detail & Related papers (2020-02-03T05:38:33Z)
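Several entries above predict depth only up to an unknown scale and shift; zero-shot evaluation of such affine-invariant predictions typically first recovers those two parameters by least squares against the ground truth. Below is a minimal sketch of that alignment step (the function name and validity mask are assumptions; individual papers may use variants such as median scaling or point-cloud-based shift recovery):

```python
import numpy as np

def align_scale_shift(pred, gt):
    """Closed-form least squares for (s, t) minimizing ||s * pred + t - gt||^2
    over pixels with valid ground truth."""
    mask = np.isfinite(gt) & (gt > 0)
    p, g = pred[mask], gt[mask]
    A = np.stack([p, np.ones_like(p)], axis=1)  # design matrix: [pred, 1]
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)
    return s, t

# Example: align an affine-invariant prediction to metric ground truth,
# then compute a metric error such as absolute relative error.
# s, t = align_scale_shift(pred, gt)
# mask = np.isfinite(gt) & (gt > 0)
# abs_rel = np.mean(np.abs(s * pred[mask] + t - gt[mask]) / gt[mask])
```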