Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
- URL: http://arxiv.org/abs/2410.02073v1
- Date: Wed, 2 Oct 2024 22:42:20 GMT
- Title: Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
- Authors: Aleksei Bochkovskii, Amaël Delaunoy, Hugo Germain, Marcel Santos, Yichao Zhou, Stephan R. Richter, Vladlen Koltun
- Abstract summary: We present a foundation model for zero-shot metric monocular depth estimation.
Our model, Depth Pro, synthesizes high-resolution depth maps with unparalleled sharpness and high-frequency details.
It produces a 2.25-megapixel depth map in 0.3 seconds on a standard GPU.
- Score: 45.6690958201871
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a foundation model for zero-shot metric monocular depth estimation. Our model, Depth Pro, synthesizes high-resolution depth maps with unparalleled sharpness and high-frequency details. The predictions are metric, with absolute scale, without relying on the availability of metadata such as camera intrinsics. And the model is fast, producing a 2.25-megapixel depth map in 0.3 seconds on a standard GPU. These characteristics are enabled by a number of technical contributions, including an efficient multi-scale vision transformer for dense prediction, a training protocol that combines real and synthetic datasets to achieve high metric accuracy alongside fine boundary tracing, dedicated evaluation metrics for boundary accuracy in estimated depth maps, and state-of-the-art focal length estimation from a single image. Extensive experiments analyze specific design choices and demonstrate that Depth Pro outperforms prior work along multiple dimensions. We release code and weights at https://github.com/apple/ml-depth-pro
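For orientation, here is a minimal inference sketch in the style of the usage example from the linked repository; the function names and returned dictionary keys follow that README and may differ between releases:
```python
import depth_pro

# Build the model and its preprocessing transform
# (assumes the released checkpoint has been downloaded per the repository instructions).
model, transform = depth_pro.create_model_and_transforms()
model.eval()

# Load an RGB image; f_px is the focal length read from EXIF metadata when available.
image, _, f_px = depth_pro.load_rgb("example.jpg")
image = transform(image)

# Inference returns metric depth plus the focal length estimated from the image itself.
prediction = model.infer(image, f_px=f_px)
depth = prediction["depth"]                    # metric depth in meters
focallength_px = prediction["focallength_px"]  # estimated focal length in pixels
```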
Related papers
- Fixing the Scale and Shift in Monocular Depth For Camera Pose Estimation [47.68705641608316]
We propose a novel framework for estimating the relative pose between two cameras from point correspondences with associated monocular depths.
We derive efficient solvers for three cases: (1) two calibrated cameras, (2) two uncalibrated cameras with an unknown but shared focal length, and (3) two uncalibrated cameras with unknown and different focal lengths.
Compared to prior work, our solvers achieve state-of-the-art results on two large-scale, real-world datasets.
arXiv Detail & Related papers (2025-01-13T23:13:33Z)
- Foundation Models Meet Low-Cost Sensors: Test-Time Adaptation for Rescaling Disparity for Zero-Shot Metric Depth Estimation [46.037640130193566]
We propose a new method to rescale Depth Anything predictions using 3D points provided by low-cost sensors or techniques such as low-resolution LiDAR.
Our experiments highlight improvements relative to other metric depth estimation methods and competitive results compared to fine-tuned approaches (a generic rescaling sketch follows this entry).
arXiv Detail & Related papers (2024-12-18T17:50:15Z)
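The rescaling idea in the entry above amounts to aligning a relative prediction with a handful of metric measurements. Below is a minimal least-squares sketch of that generic alignment, not the paper's specific test-time adaptation procedure; the array names are hypothetical:
```python
import numpy as np

def fit_scale_shift(pred, sparse_metric, mask):
    """Fit scale s and shift t so that s * pred + t matches sparse metric points.

    Generic illustration only: `pred` is a relative depth or disparity map,
    `sparse_metric` holds metric values at pixels where `mask` is True
    (e.g. projected low-resolution LiDAR returns).
    """
    x = pred[mask].astype(np.float64)
    y = sparse_metric[mask].astype(np.float64)
    # Closed-form least squares for min_{s,t} ||s * x + t - y||^2.
    A = np.stack([x, np.ones_like(x)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    return s, t

# Usage: metric_map = s * pred + t, applied to the full prediction.
```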
- Single-Shot Metric Depth from Focused Plenoptic Cameras [18.412662939667676]
Metric depth estimation from visual sensors is crucial for robots to perceive, navigate, and interact with their environment.
Light field imaging provides a promising solution for estimating metric depth by using a unique lens configuration through a single device.
Our work explores the potential of focused plenoptic cameras for dense metric depth.
arXiv Detail & Related papers (2024-12-03T11:21:17Z)
- SharpDepth: Sharpening Metric Depth Predictions Using Diffusion Distillation [17.88798247412388]
SharpDepth combines the metric accuracy of discriminative depth estimation methods with the fine-grained boundary sharpness typically achieved by generative methods.
Our approach addresses the limitations of each by integrating metric accuracy with detailed boundary preservation, resulting in depth predictions that are both metrically precise and visually sharp.
arXiv Detail & Related papers (2024-11-27T11:07:27Z)
- ScaleDepth: Decomposing Metric Depth Estimation into Scale Prediction and Relative Depth Estimation [62.600382533322325]
We propose a novel monocular depth estimation method called ScaleDepth.
Our method decomposes metric depth into scene scale and relative depth, and predicts them through a semantic-aware scale prediction module.
Our method achieves metric depth estimation for both indoor and outdoor scenes in a unified framework.
arXiv Detail & Related papers (2024-07-11T05:11:56Z)
- Deep Neighbor Layer Aggregation for Lightweight Self-Supervised Monocular Depth Estimation [1.6775954077761863]
We present a fully convolutional depth estimation network using contextual feature fusion.
Compared to UNet++ and HRNet, we use both high-resolution and low-resolution features to preserve information about small targets and fast-moving objects.
Our method reduces the parameters without sacrificing accuracy.
arXiv Detail & Related papers (2023-09-17T13:40:15Z)
- FS-Depth: Focal-and-Scale Depth Estimation from a Single Image in Unseen Indoor Scene [57.26600120397529]
It has long been an ill-posed problem to predict absolute depth maps from single images in real (unseen) indoor scenes.
We develop a focal-and-scale depth estimation model that learns absolute depth maps from single images in unseen indoor scenes.
arXiv Detail & Related papers (2023-07-27T04:49:36Z)
- Single Image Depth Prediction Made Better: A Multivariate Gaussian Take [163.14849753700682]
We introduce an approach that performs continuous modeling of per-pixel depth.
Our method (named MG) ranks among the top entries on the KITTI depth-prediction benchmark leaderboard (a per-pixel Gaussian likelihood sketch follows this entry).
arXiv Detail & Related papers (2023-03-31T16:01:03Z)
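To make "continuous modeling of per-pixel depth" concrete, here is a simplified per-pixel Gaussian negative log-likelihood; the cited paper uses a multivariate Gaussian with learned covariance, which this univariate sketch does not reproduce:
```python
import torch

def gaussian_depth_nll(mu, log_var, target):
    """Per-pixel Gaussian negative log-likelihood for depth regression.

    Simplified univariate illustration: the network predicts a mean `mu`
    and log-variance `log_var` per pixel; `target` is ground-truth depth.
    """
    var = torch.exp(log_var)
    nll = 0.5 * (log_var + (target - mu) ** 2 / var)
    return nll.mean()
```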
- Towards Accurate Reconstruction of 3D Scene Shape from A Single Monocular Image [91.71077190961688]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image.
We then exploit 3D point cloud data to predict the depth shift and the camera's focal length, which allow us to recover 3D scene shape (a generic unprojection sketch follows this entry).
We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot evaluation.
arXiv Detail & Related papers (2022-08-28T16:20:14Z)
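The role of the depth shift and focal length in the entry above can be seen from pinhole unprojection: both enter directly when a depth map is lifted to a 3D point cloud. A generic sketch with hypothetical parameter names, not the cited two-stage pipeline:
```python
import numpy as np

def unproject(depth, f_px, cx, cy):
    """Back-project a metric depth map into a 3D point cloud.

    depth: (H, W) depth map after any scale/shift correction, in meters.
    f_px:  focal length in pixels; cx, cy: principal point.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    x = (u - cx) * depth / f_px
    y = (v - cy) * depth / f_px
    return np.stack([x, y, depth], axis=-1)  # (H, W, 3) points in the camera frame
```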
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.