Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation
- URL: http://arxiv.org/abs/2412.14015v2
- Date: Tue, 22 Apr 2025 14:42:39 GMT
- Title: Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation
- Authors: Haotong Lin, Sida Peng, Jingxiao Chen, Songyou Peng, Jiaming Sun, Minghuan Liu, Hujun Bao, Jiashi Feng, Xiaowei Zhou, Bingyi Kang
- Abstract summary: We introduce prompting into depth foundation models, creating a new paradigm for metric depth estimation termed Prompt Depth Anything. We use a low-cost LiDAR as the prompt to guide the Depth Anything model for accurate metric depth output, achieving up to 4K resolution.
- Score: 108.04354143020886
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prompts play a critical role in unleashing the power of language and vision foundation models for specific tasks. For the first time, we introduce prompting into depth foundation models, creating a new paradigm for metric depth estimation termed Prompt Depth Anything. Specifically, we use a low-cost LiDAR as the prompt to guide the Depth Anything model for accurate metric depth output, achieving up to 4K resolution. Our approach centers on a concise prompt fusion design that integrates the LiDAR at multiple scales within the depth decoder. To address training challenges posed by limited datasets containing both LiDAR depth and precise GT depth, we propose a scalable data pipeline that includes synthetic data LiDAR simulation and real data pseudo GT depth generation. Our approach sets new state-of-the-arts on the ARKitScenes and ScanNet++ datasets and benefits downstream applications, including 3D reconstruction and generalized robotic grasping.
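The prompt-fusion idea, injecting the low-resolution LiDAR depth at multiple scales of the depth decoder, can be sketched as follows. This is an illustrative NumPy toy, not the paper's architecture: `resize_nearest`, `fuse_prompt`, and the per-channel projection weights are hypothetical stand-ins for the learned fusion blocks.

```python
import numpy as np

def resize_nearest(depth: np.ndarray, h: int, w: int) -> np.ndarray:
    """Nearest-neighbor resize of an (H, W) depth map to (h, w)."""
    rows = np.arange(h) * depth.shape[0] // h
    cols = np.arange(w) * depth.shape[1] // w
    return depth[rows][:, cols]

def fuse_prompt(feat: np.ndarray, lidar: np.ndarray, w_proj: np.ndarray) -> np.ndarray:
    """Add a projected copy of the resized LiDAR prompt to decoder features.

    feat:   (C, H, W) decoder feature map at one scale
    lidar:  (h, w) low-resolution metric depth prompt
    w_proj: (C,) per-channel weights, standing in for a learned projection conv
    """
    prompt = resize_nearest(lidar, feat.shape[1], feat.shape[2])
    return feat + w_proj[:, None, None] * prompt[None]

# Toy usage: one low-res LiDAR map fused at two decoder scales.
lidar = np.random.rand(24, 32)                       # metric depth prompt
feats = [np.random.rand(8, 96, 128), np.random.rand(8, 48, 64)]
fused = [fuse_prompt(f, lidar, np.ones(8)) for f in feats]
print([f.shape for f in fused])
```

The point of the sketch is only that the same metric prompt is resized to each decoder resolution and mixed into the features, so every scale of the decoder sees the LiDAR measurement.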
Related papers
- Distilling Monocular Foundation Model for Fine-grained Depth Completion [17.603217168518356]
We propose a two-stage knowledge distillation framework to provide dense supervision for depth completion.
In the first stage, we generate diverse training data from natural images, which distills geometric knowledge to depth completion.
In the second stage, we employ a scale- and shift-invariant loss to learn real-world scales when fine-tuning on real-world datasets.
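A scale- and shift-invariant loss of the kind mentioned above can be written in closed form: align the prediction to the ground truth by least-squares scale and shift, then penalize the residual. A minimal NumPy sketch, assuming a MiDaS-style formulation rather than this paper's exact loss:

```python
import numpy as np

def ssi_loss(pred: np.ndarray, gt: np.ndarray) -> float:
    """Scale- and shift-invariant loss (illustrative sketch).

    Solves least squares for scale s and shift t aligning pred to gt,
    then returns the mean squared residual of the aligned prediction.
    """
    p, g = pred.ravel(), gt.ravel()
    A = np.stack([p, np.ones_like(p)], axis=1)      # design matrix [pred, 1]
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)  # fit g ~ s * p + t
    return float(np.mean((s * p + t - g) ** 2))

# A prediction that is gt under any affine rescaling incurs ~zero loss.
gt = np.linspace(0.5, 5.0, 100)
pred = 3.0 * gt + 1.2
print(ssi_loss(pred, gt))  # ~0.0: exact affine alignment exists
```

Because any affine transform of the target is a perfect fit, such a loss supervises relative structure while leaving absolute scale to be learned elsewhere (here, in the second-stage fine-tuning on real data).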
arXiv Detail & Related papers (2025-03-21T09:34:01Z)
- DepthSplat: Connecting Gaussian Splatting and Depth [90.06180236292866]
We present DepthSplat to connect Gaussian splatting and depth estimation.
We show that Gaussian splatting can serve as an unsupervised pre-training objective for learning powerful depth models.
Our DepthSplat achieves state-of-the-art performance on ScanNet, RealEstate10K and DL3DV datasets.
arXiv Detail & Related papers (2024-10-17T17:59:58Z)
- Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering [93.94371335579321]
We propose a learning framework that trains models to predict geometry-preserving depth without requiring extra data or annotations.
Comprehensive experiments underscore our framework's superior generalization capabilities.
Our innovative loss functions empower the model to autonomously recover domain-specific scale-and-shift coefficients.
arXiv Detail & Related papers (2023-09-18T12:36:39Z)
- LiDAR Meta Depth Completion [47.99004789132264]
We propose a meta depth completion network that uses data patterns to learn a task network to solve a given depth completion task effectively.
While using a single model, our method yields significantly better results than a non-adaptive baseline trained on different LiDAR patterns.
These advantages allow flexible deployment of a single depth completion model on different sensors.
arXiv Detail & Related papers (2023-07-24T13:05:36Z)
- Mind The Edge: Refining Depth Edges in Sparsely-Supervised Monocular Depth Estimation [42.19770683222846]
Monocular Depth Estimation (MDE) is a fundamental problem in computer vision with numerous applications.
In this paper we propose to learn to detect the location of depth edges from densely-supervised synthetic data.
We demonstrate significant gains in the accuracy of the depth edges with comparable per-pixel depth accuracy on several challenging datasets.
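As a rough illustration of what a depth-edge map is (the paper learns an edge detector from densely-supervised synthetic data; this toy merely thresholds the depth gradient):

```python
import numpy as np

def depth_edges(depth: np.ndarray, thresh: float = 0.1) -> np.ndarray:
    """Binary depth-edge map from finite-difference gradient magnitude.

    A simple hand-crafted stand-in for the learned detector: a pixel is
    an edge where the local depth gradient exceeds a threshold.
    """
    gy, gx = np.gradient(depth)
    return np.hypot(gx, gy) > thresh

depth = np.ones((8, 8))
depth[:, 4:] = 2.0                 # a vertical depth discontinuity
edges = depth_edges(depth)
print(edges[:, 3:5].all(), edges[:, :3].any())  # True False
```

The learned version improves precisely on cases where such gradient thresholds fail, e.g. thin structures and gradual slopes, which is why dense synthetic supervision helps.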
arXiv Detail & Related papers (2022-12-10T14:49:24Z)
- Joint Learning of Salient Object Detection, Depth Estimation and Contour Extraction [91.43066633305662]
We propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD).
Specifically, we unify three complementary tasks: depth estimation, salient object detection and contour estimation. The multi-task mechanism encourages the model to learn task-aware features from the auxiliary tasks.
Experiments show that it not only significantly surpasses the depth-based RGB-D SOD methods on multiple datasets, but also precisely predicts a high-quality depth map and salient contour at the same time.
arXiv Detail & Related papers (2022-03-09T17:20:18Z)
- Consistent Depth Prediction under Various Illuminations using Dilated Cross Attention [1.332560004325655]
We propose to use internet 3D indoor scenes and manually tune their illuminations to render photo-realistic RGB photos and their corresponding depth and BRDF maps.
We perform cross attention on these dilated features to retain the consistency of depth prediction under different illuminations.
We evaluate our method against current state-of-the-art methods on the Vari dataset and observe a significant improvement.
arXiv Detail & Related papers (2021-12-15T10:02:46Z)
- DenseLiDAR: A Real-Time Pseudo Dense Depth Guided Depth Completion Network [3.1447111126464997]
We propose DenseLiDAR, a novel real-time pseudo-depth guided depth completion neural network.
We exploit a dense pseudo-depth map obtained from simple morphological operations to guide the network.
Our model achieves state-of-the-art performance at the highest frame rate of 50 Hz.
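A minimal sketch of the pseudo-dense guidance idea, using repeated 3x3 max dilation to fill holes in a sparse LiDAR map; the actual DenseLiDAR pipeline uses richer morphological operations, but the principle of cheaply densifying before feeding the network is the same:

```python
import numpy as np

def densify(sparse: np.ndarray, iters: int = 3) -> np.ndarray:
    """Fill holes in a sparse depth map by repeated 3x3 max dilation.

    Zeros mark missing pixels; each pass copies the largest valid
    neighbor into empty cells. A crude stand-in for the morphological
    pseudo-depth generation the paper describes.
    """
    d = sparse.copy()
    for _ in range(iters):
        padded = np.pad(d, 1)
        # 3x3 neighborhood maximum at every pixel
        neigh = np.max([padded[i:i + d.shape[0], j:j + d.shape[1]]
                        for i in range(3) for j in range(3)], axis=0)
        d = np.where(d == 0, neigh, d)   # only fill empty cells
    return d

sparse = np.zeros((6, 6))
sparse[2, 2] = 1.5                 # a single LiDAR return, in meters
dense = densify(sparse)
print(int((dense > 0).sum()))      # → 36: every cell filled
```

Valid measurements are never overwritten; only holes inherit neighboring values, which gives the completion network a dense, if blocky, initial guess.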
arXiv Detail & Related papers (2021-08-28T14:18:29Z)
- Virtual Normal: Enforcing Geometric Constraints for Accurate and Robust Depth Prediction [87.08227378010874]
We show the importance of the high-order 3D geometric constraints for depth prediction.
By designing a loss term that enforces a simple geometric constraint, we significantly improve the accuracy and robustness of monocular depth estimation.
We show state-of-the-art results of learning metric depth on NYU Depth-V2 and KITTI.
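The virtual-normal constraint can be illustrated by comparing plane normals over point triplets sampled from the predicted and ground-truth 3D point clouds; a simplified NumPy sketch (function names are illustrative, not the paper's code, and degenerate collinear triplets are not handled):

```python
import numpy as np

def virtual_normal(p0: np.ndarray, p1: np.ndarray, p2: np.ndarray) -> np.ndarray:
    """Unit normal of the plane through three 3D points."""
    n = np.cross(p1 - p0, p2 - p0)
    return n / np.linalg.norm(n)

def vn_loss(pred_pts: np.ndarray, gt_pts: np.ndarray, triplets) -> float:
    """Mean L1 difference between 'virtual normals' of the predicted and
    ground-truth point clouds, over sampled point triplets."""
    diffs = []
    for i, j, k in triplets:
        n_pred = virtual_normal(pred_pts[i], pred_pts[j], pred_pts[k])
        n_gt = virtual_normal(gt_pts[i], gt_pts[j], gt_pts[k])
        diffs.append(np.abs(n_pred - n_gt).sum())
    return float(np.mean(diffs))

rng = np.random.default_rng(0)
pts = rng.random((50, 3))
triplets = [tuple(rng.choice(50, 3, replace=False)) for _ in range(20)]
print(vn_loss(pts, pts, triplets))  # identical clouds → 0.0
```

Because the normals are computed from long-range triplets rather than adjacent pixels, the penalty captures high-order global geometry instead of only local surface smoothness.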
arXiv Detail & Related papers (2021-03-07T00:08:21Z)
- ADAADepth: Adapting Data Augmentation and Attention for Self-Supervised Monocular Depth Estimation [8.827921242078881]
We propose ADAA, utilising depth augmentation as depth supervision for learning accurate and robust depth.
We propose a relational self-attention module that learns rich contextual features and further enhances depth results.
We evaluate our predicted depth on the KITTI driving dataset and achieve state-of-the-art results.
arXiv Detail & Related papers (2021-03-01T09:06:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.