Survey on Monocular Metric Depth Estimation
- URL: http://arxiv.org/abs/2501.11841v1
- Date: Tue, 21 Jan 2025 02:51:10 GMT
- Title: Survey on Monocular Metric Depth Estimation
- Authors: Jiuling Zhang
- Abstract summary: This paper provides a review of depth estimation technologies, highlighting the evolution from geometry-based methods to state-of-the-art deep learning approaches. Recent progress in zero-shot MMDE research is explored, focusing on challenges such as model generalization and the loss of detail at scene boundaries.
- Score: 0.9790236766474202
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular Depth Estimation (MDE) is a fundamental computer vision task underpinning applications such as spatial understanding, 3D reconstruction, and autonomous driving. While deep learning-based MDE methods can predict relative depth from a single image, their lack of metric scale information often results in scale inconsistencies, limiting their utility in downstream tasks like visual SLAM, 3D reconstruction, and novel view synthesis. Monocular Metric Depth Estimation (MMDE) addresses these challenges by enabling precise, scene-scale depth inference. MMDE improves depth consistency, enhances sequential task stability, simplifies integration into downstream applications, and broadens practical use cases. This paper provides a comprehensive review of depth estimation technologies, highlighting the evolution from geometry-based methods to state-of-the-art deep learning approaches. It emphasizes advancements in scale-agnostic methods, which are crucial for enabling zero-shot generalization as the foundational capability for MMDE. Recent progress in zero-shot MMDE research is explored, focusing on challenges such as model generalization and the loss of detail at scene boundaries. Innovative strategies to address these issues include unlabelled data augmentation, image patching, architectural optimization, and generative techniques. These advancements, analyzed in detail, demonstrate significant contributions to overcoming existing limitations. Finally, this paper synthesizes recent developments in zero-shot MMDE, identifies unresolved challenges, and outlines future research directions. By offering a clear roadmap and cutting-edge insights, this work aims to deepen understanding of MMDE, inspire novel applications, and drive technological innovation.
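As a minimal illustrative sketch (not taken from the survey), the gap between relative and metric depth described in the abstract can be shown by fitting a scale and a shift that map a relative prediction onto reference metric depths in the least-squares sense. The function name `align_scale_shift` and the sample values are hypothetical and chosen only for illustration.

```python
def align_scale_shift(relative, metric):
    """Closed-form least-squares fit of metric ~= scale * relative + shift.

    Both arguments are flat sequences of per-pixel depth values of equal length.
    This is a generic affine alignment, assumed here for illustration only.
    """
    n = len(relative)
    if n != len(metric) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")

    mean_r = sum(relative) / n
    mean_m = sum(metric) / n

    # Variance of the relative prediction and its covariance with the reference.
    var_r = sum((r - mean_r) ** 2 for r in relative)
    if var_r == 0:
        raise ValueError("relative depth is constant; scale is undetermined")
    cov_rm = sum((r - mean_r) * (m - mean_m) for r, m in zip(relative, metric))

    scale = cov_rm / var_r
    shift = mean_m - scale * mean_r
    return scale, shift


# Hypothetical data: a relative prediction in arbitrary units and
# reference depths in metres that differ from it by scale 2.0 and shift 0.5.
relative = [0.1, 0.4, 0.7, 1.0]
metric = [2.0 * r + 0.5 for r in relative]

scale, shift = align_scale_shift(relative, metric)
aligned = [scale * r + shift for r in relative]
```

A relative-depth model needs reference measurements to recover `scale` and `shift` for each image, whereas a metric model predicts scene-scale depth directly, which is why the abstract describes MMDE as simpler to integrate into downstream tasks.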
Related papers
- Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation [75.30238170051291]
Depth estimation is a fundamental task in 3D computer vision, crucial for applications such as 3D reconstruction, free-viewpoint rendering, robotics, autonomous driving, and AR/VR technologies. Traditional methods relying on hardware sensors like LiDAR are often limited by high costs, low resolution, and environmental sensitivity, limiting their applicability in real-world scenarios. Recent advances in vision-based methods offer a promising alternative, yet they face challenges in generalization and stability due to either the low-capacity model architectures or the reliance on domain-specific and small-scale datasets.
(2025-07-15T17:59:59Z)
- An Online Adaptation Method for Robust Depth Estimation and Visual Odometry in the Open World [16.387434563802532]
We develop a visual odometry system that can adapt to diverse novel environments in an online manner.
We construct an objective for self-supervised learning of the depth estimation module based on the output of the visual odometry system.
We demonstrate the robustness and generalization capability of the proposed method in comparison with state-of-the-art learning-based approaches on urban, in-house datasets and a robot platform.
(2025-04-16T01:48:10Z)
- Multi-view Reconstruction via SfM-guided Monocular Depth Estimation [92.89227629434316]
We present a new method for multi-view geometric reconstruction.
We incorporate SfM information, a strong multi-view prior, into the depth estimation process.
Our method significantly improves the quality of depth estimation compared to previous monocular depth estimation works.
(2025-03-18T17:54:06Z)
- Relative Pose Estimation through Affine Corrections of Monocular Depth Priors [69.59216331861437]
We develop three solvers for relative pose estimation that explicitly account for independent affine (scale and shift) ambiguities.
We propose a hybrid estimation pipeline that combines our proposed solvers with classic point-based solvers and epipolar constraints.
(2025-01-09T18:58:30Z)
- Self-Supervised Depth Completion Guided by 3D Perception and Geometry Consistency [17.68427514090938]
This paper explores the utilization of 3D perceptual features and multi-view geometry consistency to devise a high-precision self-supervised depth completion method.
Experiments on benchmark datasets of NYU-Depthv2 and VOID demonstrate that the proposed model achieves the state-of-the-art depth completion performance.
(2023-12-23T14:19:56Z)
- Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering [93.94371335579321]
We propose a learning framework that trains models to predict geometry-preserving depth without requiring extra data or annotations.
Comprehensive experiments underscore our framework's superior generalization capabilities.
Our innovative loss functions empower the model to autonomously recover domain-specific scale-and-shift coefficients.
(2023-09-18T12:36:39Z)
- Deep Depth Completion: A Survey [26.09557446012222]
We provide a comprehensive literature review that helps readers better grasp the research trends and clearly understand the current advances.
We investigate the related studies from the design aspects of network architectures, loss functions, benchmark datasets, and learning strategies.
We present a quantitative comparison of model performance on two widely used benchmark datasets, including an indoor and an outdoor dataset.
(2022-05-11T08:24:00Z)
- Recovering 3D Human Mesh from Monocular Images: A Survey [49.00136388529404]
Estimating human pose and shape from monocular images is a long-standing problem in computer vision.
This survey focuses on the task of monocular 3D human mesh recovery.
(2022-03-03T18:56:08Z)
- Unsupervised Single-shot Depth Estimation using Perceptual Reconstruction [0.0]
This study presents the most recent advances in the field of generative neural networks, leveraging them to perform fully unsupervised single-shot depth synthesis.
Two generators for RGB-to-depth and depth-to-RGB transfer are implemented and simultaneously optimized using the Wasserstein-1 distance and a novel perceptual reconstruction term.
The success observed in this study suggests the great potential for unsupervised single-shot depth estimation in real-world applications.
(2022-01-28T15:11:34Z)
- Probabilistic and Geometric Depth: Detecting Objects in Perspective [78.00922683083776]
3D object detection is an important capability needed in various practical applications such as driver assistance systems.
Monocular 3D detection, as an economical solution compared to conventional settings relying on binocular vision or LiDAR, has drawn increasing attention recently but still yields unsatisfactory results.
This paper first presents a systematic study on this problem and observes that the current monocular 3D detection problem can be simplified as an instance depth estimation problem.
(2021-07-29T16:30:33Z)
- Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection [70.71934539556916]
We learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection.
Specifically, a principled geometry formula with projective modeling of 2D and 3D depth predictions in the monocular 3D object detection network is devised.
Our method remarkably improves the detection performance of the state-of-the-art monocular-based method without extra data by 2.80% on the moderate test setting.
(2021-07-29T12:30:39Z)
- Recent Advances in Monocular 2D and 3D Human Pose Estimation: A Deep Learning Perspective [69.44384540002358]
We provide a comprehensive and holistic 2D-to-3D perspective to tackle this problem.
We categorize the mainstream and milestone approaches since the year 2014 under unified frameworks.
We also summarize the pose representation styles, benchmarks, evaluation metrics, and the quantitative performance of popular approaches.
(2021-04-23T11:07:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.