DiverseDepth: Affine-invariant Depth Prediction Using Diverse Data
- URL: http://arxiv.org/abs/2002.00569v3
- Date: Sat, 28 Mar 2020 08:26:57 GMT
- Title: DiverseDepth: Affine-invariant Depth Prediction Using Diverse Data
- Authors: Wei Yin, Xinlong Wang, Chunhua Shen, Yifan Liu, Zhi Tian, Songcen Xu,
Changming Sun, Dou Renyin
- Abstract summary: We present a method for depth estimation with monocular images, which can predict high-quality depth on diverse scenes up to an affine transformation.
Experiments show that our method outperforms previous methods on 8 datasets by a large margin with the zero-shot test setting.
- Score: 110.29043712400912
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We present a method for depth estimation with monocular images, which can
predict high-quality depth on diverse scenes up to an affine transformation,
thus preserving accurate shapes of a scene. Previous methods that predict
metric depth often work well only for a specific scene. In contrast, learning
relative depth (i.e., which points are closer or farther) generalizes better,
at the price of failing to recover the accurate geometric
shape of the scene. In this work, we propose a dataset and methods to tackle
this dilemma, aiming to predict accurate depth up to an affine transformation
with good generalization to diverse scenes. First we construct a large-scale
and diverse dataset, termed Diverse Scene Depth dataset (DiverseDepth), which
has a broad range of scenes and foreground contents. Compared with previous
learning objectives, i.e., learning metric depth or relative depth, we propose
to learn the affine-invariant depth using our diverse dataset to ensure both
generalization and high-quality geometric shapes of scenes. Furthermore, in
order to train the model on the complex dataset effectively, we propose a
multi-curriculum learning method. Experiments show that our method outperforms
previous methods on 8 datasets by a large margin with the zero-shot test
setting, demonstrating the excellent generalization capacity of the learned
model to diverse scenes. The reconstructed point clouds with the predicted
depth show that our method can recover high-quality 3D shapes. Code and dataset
are available at: https://tinyurl.com/DiverseDepth
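Concretely, "up to an affine transformation" means a prediction is treated as correct if it matches the ground-truth depth after some per-image scale and shift. A common way to evaluate (or supervise) such predictions is to first align them to the ground truth with a closed-form least-squares scale and shift, then measure the remaining error. The snippet below is a minimal NumPy sketch of that alignment and of an affine-invariant error metric; it illustrates the general idea only, is not the paper's exact loss, and the function names are placeholders.

```python
import numpy as np

def align_scale_shift(pred, gt, mask):
    """Least-squares scale s and shift t so that s * pred + t best matches gt.

    pred, gt: (H, W) depth maps; mask: (H, W) boolean map of valid ground-truth pixels.
    A generic affine alignment sketch, not necessarily DiverseDepth's exact procedure.
    """
    p = pred[mask].astype(np.float64)
    g = gt[mask].astype(np.float64)
    A = np.stack([p, np.ones_like(p)], axis=1)      # (N, 2) design matrix
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)  # closed-form least squares
    return s * pred + t

def affine_invariant_abs_rel(pred, gt, mask):
    """Absolute relative error after affine alignment, a common evaluation metric."""
    aligned = align_scale_shift(pred, gt, mask)
    return float(np.mean(np.abs(aligned[mask] - gt[mask]) / gt[mask]))
```

During training, a pixel-wise loss can be applied between the aligned prediction and the ground truth in the same spirit, so that the network is only penalized for errors that no single affine transformation can explain.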
Related papers
- Learning to Adapt CLIP for Few-Shot Monocular Depth Estimation [31.34615135846137]
We propose a few-shot-based method which learns to adapt Vision-Language Models for monocular depth estimation.
Specifically, it assigns different depth bins for different scenes, which can be selected by the model during inference.
With only one image per scene for training, our extensive experimental results on the NYU V2 and KITTI datasets demonstrate that our method outperforms the previous state-of-the-art method by up to 10.6% in terms of MARE.
arXiv Detail & Related papers (2023-11-02T06:56:50Z)
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z)
- Towards Accurate Reconstruction of 3D Scene Shape from A Single Monocular Image [91.71077190961688]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image.
We then exploit 3D point cloud data to predict the depth shift and the camera's focal length that allow us to recover 3D scene shapes.
We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot evaluation.
arXiv Detail & Related papers (2022-08-28T16:20:14Z)
- P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior [133.76192155312182]
We propose a method that learns to selectively leverage information from coplanar pixels to improve the predicted depth.
An extensive evaluation of our method shows that we set the new state of the art in supervised monocular depth estimation.
arXiv Detail & Related papers (2022-04-05T10:03:52Z)
- 360 Depth Estimation in the Wild -- The Depth360 Dataset and the SegFuse Network [35.03201732370496]
Single-view depth estimation from omnidirectional images has gained popularity owing to its wide range of applications, such as autonomous driving and scene reconstruction.
In this work, we first establish a large-scale dataset with varied settings called Depth360 to tackle the training data problem.
We then propose an end-to-end two-branch multi-task learning network, SegFuse, that mimics the human eye to effectively learn from the dataset.
arXiv Detail & Related papers (2022-02-16T11:56:31Z)
- Virtual Normal: Enforcing Geometric Constraints for Accurate and Robust Depth Prediction [87.08227378010874]
We show the importance of high-order 3D geometric constraints for depth prediction.
By designing a loss term that enforces a simple geometric constraint (a minimal sketch of such a constraint appears after this list), we significantly improve the accuracy and robustness of monocular depth estimation.
We show state-of-the-art results for metric depth learning on NYU Depth-V2 and KITTI.
arXiv Detail & Related papers (2021-03-07T00:08:21Z)
- Learning to Recover 3D Scene Shape from a Single Image [98.20106822614392]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image.
We then use 3D point cloud encoders to predict the missing depth shift and focal length that allow us to recover a realistic 3D scene shape.
arXiv Detail & Related papers (2020-12-17T02:35:13Z)
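The virtual-normal entry above describes a loss that enforces a high-order geometric constraint: a point cloud is reconstructed from the depth map, and the normals of planes spanned by randomly sampled point triplets are compared between the predicted and ground-truth clouds. The sketch below illustrates that idea with NumPy under an assumed pinhole camera model; the intrinsics (fx, fy, cx, cy) are placeholders, and, unlike the published formulation, it does not reject nearly collinear triplets.

```python
import numpy as np

def unproject(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map to an (H, W, 3) point cloud (pinhole model)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)

def virtual_normal_loss(pred_depth, gt_depth, intrinsics, num_triplets=1000, seed=0):
    """Mean L1 difference between normals of planes spanned by random point triplets.

    Simplified sketch: degenerate (nearly collinear) triplets are not filtered out.
    """
    fx, fy, cx, cy = intrinsics
    pts_pred = unproject(pred_depth, fx, fy, cx, cy).reshape(-1, 3)
    pts_gt = unproject(gt_depth, fx, fy, cx, cy).reshape(-1, 3)

    rng = np.random.default_rng(seed)
    idx = rng.integers(0, pts_pred.shape[0], size=(num_triplets, 3))

    def normals(pts):
        a, b, c = pts[idx[:, 0]], pts[idx[:, 1]], pts[idx[:, 2]]
        n = np.cross(b - a, c - a)  # plane normal per triplet
        return n / (np.linalg.norm(n, axis=1, keepdims=True) + 1e-8)

    return float(np.mean(np.abs(normals(pts_pred) - normals(pts_gt))))
```

Because normals depend on the relative geometry of points rather than on absolute depth values, such a term complements pixel-wise depth losses by penalizing distorted surface shapes.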