BenchDepth: Are We on the Right Way to Evaluate Depth Foundation Models?
- URL: http://arxiv.org/abs/2507.15321v1
- Date: Mon, 21 Jul 2025 07:23:14 GMT
- Title: BenchDepth: Are We on the Right Way to Evaluate Depth Foundation Models?
- Authors: Zhenyu Li, Haotong Lin, Jiashi Feng, Peter Wonka, Bingyi Kang,
- Abstract summary: Deep learning has led to powerful depth foundation models (DFMs). Traditional benchmarks rely on alignment-based metrics that introduce biases, favor certain depth representations, and complicate fair comparisons. We propose BenchDepth, a new benchmark that evaluates DFMs through five carefully selected downstream proxy tasks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Depth estimation is a fundamental task in computer vision with diverse applications. Recent advancements in deep learning have led to powerful depth foundation models (DFMs), yet their evaluation remains challenging due to inconsistencies in existing protocols. Traditional benchmarks rely on alignment-based metrics that introduce biases, favor certain depth representations, and complicate fair comparisons. In this work, we propose BenchDepth, a new benchmark that evaluates DFMs through five carefully selected downstream proxy tasks: depth completion, stereo matching, monocular feed-forward 3D scene reconstruction, SLAM, and vision-language spatial understanding. Unlike conventional evaluation protocols, our approach assesses DFMs based on their practical utility in real-world applications, bypassing problematic alignment procedures. We benchmark eight state-of-the-art DFMs and provide an in-depth analysis of key findings and observations. We hope our work sparks further discussion in the community on best practices for depth model evaluation and paves the way for future research and advancements in depth estimation.
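Alignment-based evaluation, which the abstract criticizes, usually fits a scale (and sometimes a shift) that maps each prediction onto the ground truth before computing error metrics such as AbsRel and the δ < 1.25 accuracy. What follows is a toy sketch under that assumption, not BenchDepth code. The helper names (`align_scale_shift`, `align_median_scale`, `abs_rel`, `delta1`) are hypothetical, alignment is done directly in depth space (some protocols align in inverse depth instead), and the depth values are made up. It shows how the choice of alignment alone can change the score of the same prediction.

```python
from statistics import mean, median


def align_scale_shift(pred, gt):
    """Hypothetical helper: least-squares s, t minimizing sum((s*p + t - g)^2)."""
    mp, mg = mean(pred), mean(gt)
    var_p = sum((p - mp) ** 2 for p in pred)
    if var_p == 0:
        raise ValueError("prediction is constant; scale is undetermined")
    s = sum((p - mp) * (g - mg) for p, g in zip(pred, gt)) / var_p
    t = mg - s * mp
    return [s * p + t for p in pred]


def align_median_scale(pred, gt):
    """Hypothetical helper: scale-only alignment by the ratio of medians."""
    s = median(gt) / median(pred)
    return [s * p for p in pred]


def abs_rel(pred, gt):
    """Mean absolute relative error; assumes every gt value is positive."""
    return mean(abs(p - g) / g for p, g in zip(pred, gt))


def delta1(pred, gt, thr=1.25):
    """Fraction of points with max(p/g, g/p) < thr; non-positive p counts as a miss."""
    hits = [p > 0 and max(p / g, g / p) < thr for p, g in zip(pred, gt)]
    return sum(hits) / len(hits)


# Toy data: the prediction is an affine (scale + shift) distortion of the ground truth.
gt = [1.0, 2.0, 4.0, 8.0]
pred = [0.5 * g + 1.0 for g in gt]

for name, align in [("scale+shift", align_scale_shift), ("median scale", align_median_scale)]:
    aligned = align(pred, gt)
    print(name, round(abs_rel(aligned, gt), 4), delta1(aligned, gt))
```

On this toy input the scale-and-shift fit recovers the ground truth exactly (AbsRel 0, δ1 = 1.0), while median scaling of the same prediction leaves AbsRel ≈ 0.34 and δ1 = 0.5. A model whose outputs carry an unknown shift can therefore score very differently depending on which alignment the benchmark applies.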
Related papers
- Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation (arXiv, 2025-07-15)
Depth estimation is a fundamental task in 3D computer vision, crucial for applications such as 3D reconstruction, free-viewpoint rendering, robotics, autonomous driving, and AR/VR technologies. Traditional methods that rely on hardware sensors such as LiDAR are often constrained by high cost, low resolution, and environmental sensitivity, which limits their applicability in real-world scenarios. Recent advances in vision-based methods offer a promising alternative, yet they face challenges in generalization and stability due to either low-capacity model architectures or reliance on small-scale, domain-specific datasets.
- Evaluating Robustness of Monocular Depth Estimation with Procedural Scene Perturbations (arXiv, 2025-07-01)
We introduce PDE, a new benchmark that enables systematic robustness evaluation. PDE uses procedural generation to create 3D scenes that test robustness to various controlled perturbations. Our analysis yields interesting findings on which perturbations are challenging for state-of-the-art depth models.
- Multi-view Reconstruction via SfM-guided Monocular Depth Estimation (arXiv, 2025-03-18)
We present a new method for multi-view geometric reconstruction. We incorporate SfM information, a strong multi-view prior, into the depth estimation process. Our method significantly improves the quality of depth estimation compared to previous monocular depth estimation works.
- Survey on Monocular Metric Depth Estimation (arXiv, 2025-01-21)
Deep learning methods typically estimate relative depth from a single image, but the lack of metric scale often leads to geometric inconsistencies. Monocular Metric Depth Estimation (MMDE) addresses this issue by producing depth maps with absolute scale. This paper presents a structured survey of depth estimation methods, tracing the evolution from traditional geometry-based approaches to modern deep learning models.
- Relative Pose Estimation through Affine Corrections of Monocular Depth Priors (arXiv, 2025-01-09)
We develop three solvers for relative pose estimation that explicitly account for independent affine (scale and shift) ambiguities. We propose a hybrid estimation pipeline that combines our proposed solvers with classic point-based solvers and epipolar constraints.
- From 2D to 3D: Re-thinking Benchmarking of Monocular Depth Prediction (arXiv, 2022-03-15)
We argue that monocular depth prediction (MDP) is currently witnessing benchmark over-fitting and relies on metrics that are only partially helpful for gauging the usefulness of predictions for 3D applications. This limits the design and development of novel methods that are truly aware of, and improving towards estimating, the 3D structure of the scene rather than optimizing 2D-based distances. We propose a set of metrics well suited to evaluating the 3D geometry of MDP approaches, and a novel indoor benchmark, RIO-D3D, crucial for the proposed evaluation methodology.
- Unsupervised Single-shot Depth Estimation using Perceptual Reconstruction (arXiv, 2022-01-28)
This study presents the most recent advances in the field of generative neural networks, leveraging them to perform fully unsupervised single-shot depth synthesis. Two generators for RGB-to-depth and depth-to-RGB transfer are implemented and simultaneously optimized using the Wasserstein-1 distance and a novel perceptual reconstruction term. The success observed in this study suggests great potential for unsupervised single-shot depth estimation in real-world applications.
- Self-Supervised Monocular Depth Estimation with Internal Feature Fusion (arXiv, 2021-10-18)
Self-supervised learning for depth estimation uses geometry in image sequences for supervision. We propose a novel depth estimation network, DIFFNet, which can make use of semantic information in its downsampling and upsampling procedures.
- Monocular Depth Estimation Based On Deep Learning: An Overview (arXiv, 2020-03-14)
Inferring depth information from a single image (monocular depth estimation) is an ill-posed problem. Deep learning has been widely studied for this task recently and has achieved promising accuracy. To improve the accuracy of depth estimation, various network frameworks, loss functions, and training strategies have been proposed.