GeoDiff: Geometry-Guided Diffusion for Metric Depth Estimation
- URL: http://arxiv.org/abs/2510.18291v1
- Date: Tue, 21 Oct 2025 04:47:36 GMT
- Title: GeoDiff: Geometry-Guided Diffusion for Metric Depth Estimation
- Authors: Tuan Pham, Thanh-Tung Le, Xiaohui Xie, Stephan Mandt
- Abstract summary: We introduce a novel framework for metric depth estimation that enhances pretrained diffusion-based monocular depth estimation (DB-MDE) models with stereo vision guidance. Our training-free solution seamlessly integrates into existing DB-MDE frameworks and generalizes across indoor, outdoor, and complex environments.
- Score: 25.50613737995557
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a novel framework for metric depth estimation that enhances pretrained diffusion-based monocular depth estimation (DB-MDE) models with stereo vision guidance. While existing DB-MDE methods excel at predicting relative depth, estimating absolute metric depth remains challenging due to scale ambiguities in single-image scenarios. To address this, we reframe depth estimation as an inverse problem, leveraging pretrained latent diffusion models (LDMs) conditioned on RGB images, combined with stereo-based geometric constraints, to learn scale and shift for accurate depth recovery. Our training-free solution seamlessly integrates into existing DB-MDE frameworks and generalizes across indoor, outdoor, and complex environments. Extensive experiments demonstrate that our approach matches or surpasses state-of-the-art methods, particularly in challenging scenarios involving translucent and specular surfaces, all without requiring retraining.
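The abstract describes learning a scale and shift that maps relative depth to absolute metric depth. A minimal closed-form least-squares sketch of such an affine alignment is below; the function name, the toy data, and the use of a validity mask are illustrative assumptions, not details from the paper:

```python
import numpy as np

def fit_scale_shift(rel_depth, metric_depth, mask):
    """Least-squares fit of scale s and shift t so that
    s * rel_depth + t approximates metric_depth on valid pixels."""
    x = rel_depth[mask].ravel()
    y = metric_depth[mask].ravel()
    A = np.stack([x, np.ones_like(x)], axis=1)  # design matrix [x, 1]
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    return s, t

# Toy check: metric depth differs from relative depth by s=2.0, t=0.5.
rel = np.linspace(0.1, 1.0, 100)
metric = 2.0 * rel + 0.5
mask = np.ones_like(rel, dtype=bool)
s, t = fit_scale_shift(rel, metric, mask)
# s ≈ 2.0, t ≈ 0.5
```

In practice the metric anchor points would come from the stereo-based geometric constraints the abstract mentions, restricted by the mask to pixels where those constraints are reliable.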
Related papers
- MDE-VIO: Enhancing Visual-Inertial Odometry Using Learned Depth Priors [8.2208199207543]
We propose a novel framework that enforces affine-invariant depth consistency and pairwise ordinal constraints. This approach strictly adheres to the computational limits of edge devices while robustly recovering metric scale.
arXiv Detail & Related papers (2026-02-11T19:53:06Z)
- StarryGazer: Leveraging Monocular Depth Estimation Models for Domain-Agnostic Single Depth Image Completion [56.28564075246147]
StarryGazer is a framework that predicts dense depth images from a single sparse depth image and an RGB image. We employ a pre-trained MDE model to produce relative depth images. A refinement network is trained with the synthetic pairs, incorporating the relative depth maps and RGB images to improve the model's accuracy and robustness.
arXiv Detail & Related papers (2025-12-15T09:56:09Z)
- Dense-depth map guided deep Lidar-Visual Odometry with Sparse Point Clouds and Images [4.320220844287486]
Odometry is a critical task for the self-localization and navigation of autonomous systems. We propose a novel LiDAR-Visual odometry framework that integrates LiDAR point clouds and images for accurate pose estimation. Our approach achieves similar or superior accuracy and robustness compared to state-of-the-art visual and LiDAR odometry methods.
arXiv Detail & Related papers (2025-07-21T10:58:10Z)
- Intrinsic Image Decomposition for Robust Self-supervised Monocular Depth Estimation on Reflective Surfaces [10.557788087220509]
Self-supervised monocular depth estimation (SSMDE) has gained attention in the field of deep learning. We propose a novel framework that incorporates intrinsic image decomposition into SSMDE. Our method synergistically trains for both monocular depth estimation and intrinsic image decomposition.
arXiv Detail & Related papers (2025-03-28T07:56:59Z)
- Multi-view Reconstruction via SfM-guided Monocular Depth Estimation [92.89227629434316]
We present a new method for multi-view geometric reconstruction. We incorporate SfM information, a strong multi-view prior, into the depth estimation process. Our method significantly improves the quality of depth estimation compared to previous monocular depth estimation works.
arXiv Detail & Related papers (2025-03-18T17:54:06Z)
- Relative Pose Estimation through Affine Corrections of Monocular Depth Priors [69.59216331861437]
We develop three solvers for relative pose estimation that explicitly account for independent affine (scale and shift) ambiguities. We propose a hybrid estimation pipeline that combines our proposed solvers with classic point-based solvers and epipolar constraints.
arXiv Detail & Related papers (2025-01-09T18:58:30Z)
- Marigold-DC: Zero-Shot Monocular Depth Completion with Guided Diffusion [57.08169927189237]
Existing methods for depth completion operate in tightly constrained settings. Inspired by advances in monocular depth estimation, we reframe depth completion as image-conditional depth map generation. Marigold-DC builds on a pretrained latent diffusion model for monocular depth estimation and injects the depth observations as test-time guidance.
arXiv Detail & Related papers (2024-12-18T00:06:41Z)
- MetricGold: Leveraging Text-To-Image Latent Diffusion Models for Metric Depth Estimation [9.639797094021988]
MetricGold is a novel approach that harnesses a generative diffusion model's rich priors to improve metric depth estimation. Our experiments demonstrate robust generalization across diverse datasets, producing sharper and higher quality metric depth estimates.
arXiv Detail & Related papers (2024-11-16T20:59:01Z)
- TanDepth: Leveraging Global DEMs for Metric Monocular Depth Estimation in UAVs [5.6168844664788855]
This work presents TanDepth, a practical scale recovery method for obtaining metric depth results from relative estimations at inference-time. Our method leverages sparse measurements from Global Digital Elevation Models (GDEM) by projecting them to the camera view. An adaptation to the Cloth Filter Simulation is presented, which allows selecting ground points from the estimated depth map to then correlate with the projected reference points.
arXiv Detail & Related papers (2024-09-08T15:54:43Z)
- ScaleDepth: Decomposing Metric Depth Estimation into Scale Prediction and Relative Depth Estimation [62.600382533322325]
We propose a novel monocular depth estimation method called ScaleDepth.
Our method decomposes metric depth into scene scale and relative depth, and predicts them through a semantic-aware scale prediction module.
Our method achieves metric depth estimation for both indoor and outdoor scenes in a unified framework.
arXiv Detail & Related papers (2024-07-11T05:11:56Z)
- Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering [93.94371335579321]
We propose a learning framework that trains models to predict geometry-preserving depth without requiring extra data or annotations.
Comprehensive experiments underscore our framework's superior generalization capabilities.
Our innovative loss functions empower the model to autonomously recover domain-specific scale-and-shift coefficients.
arXiv Detail & Related papers (2023-09-18T12:36:39Z)
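Several entries above, TanDepth in particular, recover metric scale by projecting known 3D reference points into the camera view and comparing them with the estimated depth map. A minimal pinhole-projection sketch is below; the intrinsic matrix and sample points are illustrative values, not taken from any of the papers:

```python
import numpy as np

def project_points(points_cam, K):
    """Project 3D points (N, 3) in the camera frame to pixel coordinates
    with a pinhole intrinsic matrix K; returns (u, v) and per-point depth."""
    Z = points_cam[:, 2]                 # depth along the optical axis
    uvw = (K @ points_cam.T).T           # homogeneous image coordinates
    uv = uvw[:, :2] / Z[:, None]         # perspective divide
    return uv, Z

# Illustrative intrinsics: focal length 500 px, principal point (320, 240).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
pts = np.array([[0.0, 0.0, 2.0],        # on the optical axis, 2 m away
                [1.0, 0.5, 5.0]])
uv, depth = project_points(pts, K)
# first point projects to the principal point (320, 240)
```

The projected depths can then serve as sparse metric anchors, e.g. for the kind of scale/shift alignment that ScaleDepth and the main paper describe.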
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.