UniDepth: Universal Monocular Metric Depth Estimation
- URL: http://arxiv.org/abs/2403.18913v1
- Date: Wed, 27 Mar 2024 18:06:31 GMT
- Title: UniDepth: Universal Monocular Metric Depth Estimation
- Authors: Luigi Piccinelli, Yung-Hsu Yang, Christos Sakaridis, Mattia Segu, Siyuan Li, Luc Van Gool, Fisher Yu,
- Abstract summary: We propose a new model, UniDepth, capable of reconstructing metric 3D scenes from solely single images across domains.
Our model exploits a pseudo-spherical output representation, which disentangles camera and depth representations.
Thorough evaluations on ten datasets in a zero-shot regime consistently demonstrate the superior performance of UniDepth.
- Score: 81.80512457953903
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Accurate monocular metric depth estimation (MMDE) is crucial to solving downstream tasks in 3D perception and modeling. However, the remarkable accuracy of recent MMDE methods is confined to their training domains. These methods fail to generalize to unseen domains even in the presence of moderate domain gaps, which hinders their practical applicability. We propose a new model, UniDepth, capable of reconstructing metric 3D scenes from solely single images across domains. Departing from the existing MMDE methods, UniDepth directly predicts metric 3D points from the input image at inference time without any additional information, striving for a universal and flexible MMDE solution. In particular, UniDepth implements a self-promptable camera module predicting dense camera representation to condition depth features. Our model exploits a pseudo-spherical output representation, which disentangles camera and depth representations. In addition, we propose a geometric invariance loss that promotes the invariance of camera-prompted depth features. Thorough evaluations on ten datasets in a zero-shot regime consistently demonstrate the superior performance of UniDepth, even when compared with methods directly trained on the testing domains. Code and models are available at: https://github.com/lpiccinelli-eth/unidepth
Related papers
- Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation [74.28509379811084]
Metric3D v2 is a geometric foundation model for zero-shot metric depth and surface normal estimation from a single image.
We propose solutions for both metric depth estimation and surface normal estimation.
Our method enables the accurate recovery of metric 3D structures on randomly collected internet images.
arXiv Detail & Related papers (2024-03-22T02:30:46Z) - SM4Depth: Seamless Monocular Metric Depth Estimation across Multiple
Cameras and Scenes by One Model [23.95095404136943]
This paper proposes SM4Depth, a seamless MMDE method to address all the issues above within a single network.
First, we reveal that a consistent field of view (FOV) is the key to resolve metric ambiguity'' across cameras.
Second, to achieve consistently high accuracy across scenes, we explicitly model the metric scale determination as discretizing the depth interval into bins.
Third, to reduce the reliance on massive training data, we propose a divide and conquer" solution.
arXiv Detail & Related papers (2024-03-13T14:08:25Z) - Depth-discriminative Metric Learning for Monocular 3D Object Detection [14.554132525651868]
We introduce a novel metric learning scheme that encourages the model to extract depth-discriminative features regardless of the visual attributes.
Our method consistently improves the performance of various baselines by 23.51% and 5.78% on average.
arXiv Detail & Related papers (2024-01-02T07:34:09Z) - Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image [85.91935485902708]
We show that the key to a zero-shot single-view metric depth model lies in the combination of large-scale data training and resolving the metric ambiguity from various camera models.
We propose a canonical camera space transformation module, which explicitly addresses the ambiguity problems and can be effortlessly plugged into existing monocular models.
Our method enables the accurate recovery of metric 3D structures on randomly collected internet images.
arXiv Detail & Related papers (2023-07-20T16:14:23Z) - Single Image Depth Prediction Made Better: A Multivariate Gaussian Take [163.14849753700682]
We introduce an approach that performs continuous modeling of per-pixel depth.
Our method's accuracy (named MG) is among the top on the KITTI depth-prediction benchmark leaderboard.
arXiv Detail & Related papers (2023-03-31T16:01:03Z) - DONet: Learning Category-Level 6D Object Pose and Size Estimation from
Depth Observation [53.55300278592281]
We propose a method of Category-level 6D Object Pose and Size Estimation (COPSE) from a single depth image.
Our framework makes inferences based on the rich geometric information of the object in the depth channel alone.
Our framework competes with state-of-the-art approaches that require labeled real-world images.
arXiv Detail & Related papers (2021-06-27T10:41:50Z) - Manifold Topology Divergence: a Framework for Comparing Data Manifolds [109.0784952256104]
We develop a framework for comparing data manifold, aimed at the evaluation of deep generative models.
Based on the Cross-Barcode, we introduce the Manifold Topology Divergence score (MTop-Divergence)
We demonstrate that the MTop-Divergence accurately detects various degrees of mode-dropping, intra-mode collapse, mode invention, and image disturbance.
arXiv Detail & Related papers (2021-06-08T00:30:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.