Survey on Monocular Metric Depth Estimation
- URL: http://arxiv.org/abs/2501.11841v4
- Date: Tue, 26 Aug 2025 08:10:41 GMT
- Title: Survey on Monocular Metric Depth Estimation
- Authors: Jiuling Zhang,
- Abstract summary: Monocular Metric Depth Estimation (MMDE) produces depth maps with absolute scale, ensuring geometric consistency.<n>This survey reviews the evolution of MMDE, from geometry-based methods to state-of-the-art deep models.<n>Methodological advances are analyzed, covering domain generalization, boundary preservation, and the integration of synthetic and real data.
- Score: 2.436681150766912
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular Depth Estimation (MDE) enables spatial understanding, 3D reconstruction, and autonomous navigation, yet deep learning approaches often predict only relative depth without a consistent metric scale. This limitation reduces reliability in applications such as visual SLAM, precise 3D modeling, and view synthesis. Monocular Metric Depth Estimation (MMDE) overcomes this challenge by producing depth maps with absolute scale, ensuring geometric consistency and enabling deployment without additional calibration. This survey reviews the evolution of MMDE, from geometry-based methods to state-of-the-art deep models, with emphasis on the datasets that drive progress. Key benchmarks, including KITTI, NYU-D, ApolloScape, and TartanAir, are examined in terms of modality, scene type, and application domain. Methodological advances are analyzed, covering domain generalization, boundary preservation, and the integration of synthetic and real data. Techniques such as unsupervised and semi-supervised learning, patch-based inference, architectural innovations, and generative modeling are evaluated for their strengths and limitations. By synthesizing current progress, highlighting the importance of high-quality datasets, and identifying open challenges, this survey provides a structured reference for advancing MMDE and supporting its adoption in real-world computer vision systems.
Related papers
- Visual Autoregressive Modelling for Monocular Depth Estimation [69.01449528371916]
We propose a monocular depth estimation method based on visual autoregressive ( VAR) priors.<n>Our method adapts a large-scale text-to-image VAR model and introduces a scale-wise conditional upsampling mechanism.<n>We report state-of-the-art performance in indoor benchmarks under constrained training conditions, and strong performance when applied to outdoor datasets.
arXiv Detail & Related papers (2025-12-27T17:08:03Z) - MoRE: 3D Visual Geometry Reconstruction Meets Mixture-of-Experts [50.37005070020306]
MoRE is a dense 3D visual foundation model based on a Mixture-of-Experts (MoE) architecture.<n>MoRE incorporates a confidence-based depth refinement module that stabilizes and refines geometric estimation.<n>It integrates dense semantic features with globally aligned 3D backbone representations for high-fidelity surface normal prediction.
arXiv Detail & Related papers (2025-10-31T06:54:27Z) - GeoDiff: Geometry-Guided Diffusion for Metric Depth Estimation [25.50613737995557]
We introduce a novel framework for metric depth estimation that enhances pretrained diffusion-based monocular depth estimation (DB-MDE) models with stereo vision guidance.<n>Our training-free solution seamlessly integrates into existing DB-MDE frameworks and generalizes across indoor, outdoor, and complex environments.
arXiv Detail & Related papers (2025-10-21T04:47:36Z) - UM-Depth : Uncertainty Masked Self-Supervised Monocular Depth Estimation with Visual Odometry [3.8323580808203785]
We introduce UM-Depth, a framework that combines motion- and uncertainty-aware refinement to enhance depth accuracy.<n>We develop a teacher training strategy that embeds uncertainty estimation into both the training pipeline and network architecture.<n> UM-Depth achieves state-of-the-art results in both self-supervised depth and pose estimation on the KITTI datasets.
arXiv Detail & Related papers (2025-09-17T05:51:07Z) - Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation [75.30238170051291]
Depth estimation is a fundamental task in 3D computer vision, crucial for applications such as 3D reconstruction, free-viewpoint rendering, robotics, autonomous driving, and AR/VR technologies.<n>Traditional methods relying on hardware sensors like LiDAR are often limited by high costs, low resolution, and environmental sensitivity, limiting their applicability in real-world scenarios.<n>Recent advances in vision-based methods offer a promising alternative, yet they face challenges in generalization and stability due to either the low-capacity model architectures or the reliance on domain-specific and small-scale datasets.
arXiv Detail & Related papers (2025-07-15T17:59:59Z) - An Online Adaptation Method for Robust Depth Estimation and Visual Odometry in the Open World [16.387434563802532]
We develop a visual odometry system that can adapt to diverse novel environments in an online manner.
We construct an objective for self-supervised learning of the depth estimation module based on the output of the visual odometry system.
We demonstrate the robustness and generalization capability of the proposed method in comparison with state-of-the-art learning-based approaches on urban, in-house datasets and a robot platform.
arXiv Detail & Related papers (2025-04-16T01:48:10Z) - Multi-view Reconstruction via SfM-guided Monocular Depth Estimation [92.89227629434316]
We present a new method for multi-view geometric reconstruction.
We incorporate SfM information, a strong multi-view prior, into the depth estimation process.
Our method significantly improves the quality of depth estimation compared to previous monocular depth estimation works.
arXiv Detail & Related papers (2025-03-18T17:54:06Z) - Relative Pose Estimation through Affine Corrections of Monocular Depth Priors [69.59216331861437]
We develop three solvers for relative pose estimation that explicitly account for independent affine (scale and shift) ambiguities.
We propose a hybrid estimation pipeline that combines our proposed solvers with classic point-based solvers and epipolar constraints.
arXiv Detail & Related papers (2025-01-09T18:58:30Z) - TanDepth: Leveraging Global DEMs for Metric Monocular Depth Estimation in UAVs [5.6168844664788855]
This work presents TanDepth, a practical scale recovery method for obtaining metric depth results from relative estimations at inference-time.<n>Our method leverages sparse measurements from Global Digital Elevation Models (GDEM) by projecting them to the camera view.<n>An adaptation to the Cloth Filter Simulation is presented, which allows selecting ground points from the estimated depth map to then correlate with the projected reference points.
arXiv Detail & Related papers (2024-09-08T15:54:43Z) - Self-Supervised Depth Completion Guided by 3D Perception and Geometry
Consistency [17.68427514090938]
This paper explores the utilization of 3D perceptual features and multi-view geometry consistency to devise a high-precision self-supervised depth completion method.
Experiments on benchmark datasets of NYU-Depthv2 and VOID demonstrate that the proposed model achieves the state-of-the-art depth completion performance.
arXiv Detail & Related papers (2023-12-23T14:19:56Z) - Robust Geometry-Preserving Depth Estimation Using Differentiable
Rendering [93.94371335579321]
We propose a learning framework that trains models to predict geometry-preserving depth without requiring extra data or annotations.
Comprehensive experiments underscore our framework's superior generalization capabilities.
Our innovative loss functions empower the model to autonomously recover domain-specific scale-and-shift coefficients.
arXiv Detail & Related papers (2023-09-18T12:36:39Z) - Geometric Deep Learning for Structure-Based Drug Design: A Survey [83.87489798671155]
Structure-based drug design (SBDD) leverages the three-dimensional geometry of proteins to identify potential drug candidates.
Recent advancements in geometric deep learning, which effectively integrate and process 3D geometric data, have significantly propelled the field forward.
arXiv Detail & Related papers (2023-06-20T14:21:58Z) - Deep Depth Completion: A Survey [26.09557446012222]
We provide a comprehensive literature review that helps readers better grasp the research trends and clearly understand the current advances.
We investigate the related studies from the design aspects of network architectures, loss functions, benchmark datasets, and learning strategies.
We present a quantitative comparison of model performance on two widely used benchmark datasets, including an indoor and an outdoor dataset.
arXiv Detail & Related papers (2022-05-11T08:24:00Z) - Unsupervised Domain Adaptation for Monocular 3D Object Detection via
Self-Training [57.25828870799331]
We propose STMono3D, a new self-teaching framework for unsupervised domain adaptation on Mono3D.
We develop a teacher-student paradigm to generate adaptive pseudo labels on the target domain.
STMono3D achieves remarkable performance on all evaluated datasets and even surpasses fully supervised results on the KITTI 3D object detection dataset.
arXiv Detail & Related papers (2022-04-25T12:23:07Z) - Recovering 3D Human Mesh from Monocular Images: A Survey [49.00136388529404]
Estimating human pose and shape from monocular images is a long-standing problem in computer vision.
This survey focuses on the task of monocular 3D human mesh recovery.
arXiv Detail & Related papers (2022-03-03T18:56:08Z) - Unsupervised Single-shot Depth Estimation using Perceptual
Reconstruction [0.0]
This study presents the most recent advances in the field of generative neural networks, leveraging them to perform fully unsupervised single-shot depth synthesis.
Two generators for RGB-to-depth and depth-to-RGB transfer are implemented and simultaneously optimized using the Wasserstein-1 distance and a novel perceptual reconstruction term.
The success observed in this study suggests the great potential for unsupervised single-shot depth estimation in real-world applications.
arXiv Detail & Related papers (2022-01-28T15:11:34Z) - Probabilistic and Geometric Depth: Detecting Objects in Perspective [78.00922683083776]
3D object detection is an important capability needed in various practical applications such as driver assistance systems.
Monocular 3D detection, as an economical solution compared to conventional settings relying on binocular vision or LiDAR, has drawn increasing attention recently but still yields unsatisfactory results.
This paper first presents a systematic study on this problem and observes that the current monocular 3D detection problem can be simplified as an instance depth estimation problem.
arXiv Detail & Related papers (2021-07-29T16:30:33Z) - Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection [70.71934539556916]
We learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection.
Specifically, a principled geometry formula with projective modeling of 2D and 3D depth predictions in the monocular 3D object detection network is devised.
Our method remarkably improves the detection performance of the state-of-the-art monocular-based method without extra data by 2.80% on the moderate test setting.
arXiv Detail & Related papers (2021-07-29T12:30:39Z) - Recent Advances in Monocular 2D and 3D Human Pose Estimation: A Deep
Learning Perspective [69.44384540002358]
We provide a comprehensive and holistic 2D-to-3D perspective to tackle this problem.
We categorize the mainstream and milestone approaches since the year 2014 under unified frameworks.
We also summarize the pose representation styles, benchmarks, evaluation metrics, and the quantitative performance of popular approaches.
arXiv Detail & Related papers (2021-04-23T11:07:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.