Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
- URL: http://arxiv.org/abs/2410.02073v1
- Date: Wed, 2 Oct 2024 22:42:20 GMT
- Title: Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
- Authors: Aleksei Bochkovskii, Amaƫl Delaunoy, Hugo Germain, Marcel Santos, Yichao Zhou, Stephan R. Richter, Vladlen Koltun,
- Abstract summary: We present a foundation model for zero-shot metric monocular depth estimation.
Our model, Depth Pro, synthesizes high-resolution depth maps with unparalleled sharpness and high-frequency details.
It produces a 2.25-megapixel depth map in 0.3 seconds on a standard GPU.
- Score: 45.6690958201871
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a foundation model for zero-shot metric monocular depth estimation. Our model, Depth Pro, synthesizes high-resolution depth maps with unparalleled sharpness and high-frequency details. The predictions are metric, with absolute scale, without relying on the availability of metadata such as camera intrinsics. And the model is fast, producing a 2.25-megapixel depth map in 0.3 seconds on a standard GPU. These characteristics are enabled by a number of technical contributions, including an efficient multi-scale vision transformer for dense prediction, a training protocol that combines real and synthetic datasets to achieve high metric accuracy alongside fine boundary tracing, dedicated evaluation metrics for boundary accuracy in estimated depth maps, and state-of-the-art focal length estimation from a single image. Extensive experiments analyze specific design choices and demonstrate that Depth Pro outperforms prior work along multiple dimensions. We release code and weights at https://github.com/apple/ml-depth-pro
Related papers
- Self-supervised Monocular Depth Estimation with Large Kernel Attention [30.44895226042849]
We propose a self-supervised monocular depth estimation network to get finer details.
Specifically, we propose a decoder based on large kernel attention, which can model long-distance dependencies.
Our method achieves competitive results on the KITTI dataset.
arXiv Detail & Related papers (2024-09-26T14:44:41Z) - ScaleDepth: Decomposing Metric Depth Estimation into Scale Prediction and Relative Depth Estimation [62.600382533322325]
We propose a novel monocular depth estimation method called ScaleDepth.
Our method decomposes metric depth into scene scale and relative depth, and predicts them through a semantic-aware scale prediction module.
Our method achieves metric depth estimation for both indoor and outdoor scenes in a unified framework.
arXiv Detail & Related papers (2024-07-11T05:11:56Z) - Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation [74.28509379811084]
Metric3D v2 is a geometric foundation model for zero-shot metric depth and surface normal estimation from a single image.
We propose solutions for both metric depth estimation and surface normal estimation.
Our method enables the accurate recovery of metric 3D structures on randomly collected internet images.
arXiv Detail & Related papers (2024-03-22T02:30:46Z) - Metrically Scaled Monocular Depth Estimation through Sparse Priors for
Underwater Robots [0.0]
We formulate a deep learning model that fuses sparse depth measurements from triangulated features to improve the depth predictions.
The network is trained in a supervised fashion on the forward-looking underwater dataset, FLSea.
The method achieves real-time performance, running at 160 FPS on a laptop GPU and 7 FPS on a single CPU core.
arXiv Detail & Related papers (2023-10-25T16:32:31Z) - Deep Neighbor Layer Aggregation for Lightweight Self-Supervised
Monocular Depth Estimation [1.6775954077761863]
We present a fully convolutional depth estimation network using contextual feature fusion.
Compared to UNet++ and HRNet, we use high-resolution and low-resolution features to reserve information on small targets and fast-moving objects.
Our method reduces the parameters without sacrificing accuracy.
arXiv Detail & Related papers (2023-09-17T13:40:15Z) - FS-Depth: Focal-and-Scale Depth Estimation from a Single Image in Unseen
Indoor Scene [57.26600120397529]
It has long been an ill-posed problem to predict absolute depth maps from single images in real (unseen) indoor scenes.
We develop a focal-and-scale depth estimation model to well learn absolute depth maps from single images in unseen indoor scenes.
arXiv Detail & Related papers (2023-07-27T04:49:36Z) - Single Image Depth Prediction Made Better: A Multivariate Gaussian Take [163.14849753700682]
We introduce an approach that performs continuous modeling of per-pixel depth.
Our method's accuracy (named MG) is among the top on the KITTI depth-prediction benchmark leaderboard.
arXiv Detail & Related papers (2023-03-31T16:01:03Z) - Towards Accurate Reconstruction of 3D Scene Shape from A Single
Monocular Image [91.71077190961688]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image.
We then exploits 3D point cloud data to predict the depth shift and the camera's focal length that allow us to recover 3D scene shapes.
We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot evaluation.
arXiv Detail & Related papers (2022-08-28T16:20:14Z) - Boosting Monocular Depth Estimation Models to High-Resolution via
Content-Adaptive Multi-Resolution Merging [14.279471205248534]
We show how a consistent scene structure and high-frequency details affect depth estimation performance.
We present a double estimation method that improves the whole-image depth estimation and a patch selection method that adds local details.
We demonstrate that by merging estimations at different resolutions with changing context, we can generate multi-megapixel depth maps with a high level of detail.
arXiv Detail & Related papers (2021-05-28T17:55:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.