Marginalized Bundle Adjustment: Multi-View Camera Pose from Monocular Depth Estimates
- URL: http://arxiv.org/abs/2602.18906v1
- Date: Sat, 21 Feb 2026 17:01:32 GMT
- Title: Marginalized Bundle Adjustment: Multi-View Camera Pose from Monocular Depth Estimates
- Authors: Shengjie Zhu, Ahmed Abdelkader, Mark J. Matthews, Xiaoming Liu, Wen-Sheng Chu
- Abstract summary: We show that MDE depth maps are sufficiently accurate to yield SoTA or competitive results in SfM and camera relocalization tasks. Our method highlights the significant potential of MDE in multi-view 3D vision.
- Score: 19.574697033192436
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Structure-from-Motion (SfM) is a fundamental 3D vision task for recovering camera parameters and scene geometry from multi-view images. While recent deep learning advances enable accurate Monocular Depth Estimation (MDE) from single images without depending on camera motion, integrating MDE into SfM remains a challenge. Unlike conventional triangulated sparse point clouds, MDE produces dense depth maps with significantly higher error variance. Inspired by modern RANSAC estimators, we propose Marginalized Bundle Adjustment (MBA) to mitigate MDE error variance leveraging its density. With MBA, we show that MDE depth maps are sufficiently accurate to yield SoTA or competitive results in SfM and camera relocalization tasks. Through extensive evaluations, we demonstrate consistently robust performance across varying scales, ranging from few-frame setups to large multi-view systems with thousands of images. Our method highlights the significant potential of MDE in multi-view 3D vision.
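The abstract describes the idea only at a high level. Below is a minimal, hypothetical sketch of how a RANSAC-inspired, scale-marginalized robust cost might be combined with dense MDE depth in a simplified two-view pose objective. The function names, pinhole camera model, uniform grid of noise scales, and Gaussian kernel are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the paper's MBA implementation): use a dense MDE
# depth map to back-project pixels from image 1, reproject them into
# image 2, and score the reprojection error with a robust cost that is
# averaged (marginalized) over a grid of noise scales instead of a
# single hard inlier threshold.
import numpy as np
from scipy.spatial.transform import Rotation


def backproject(uv, depth, K):
    """Lift pixels (N,2) with MDE depths (N,) to camera-frame 3D points (N,3)."""
    ones = np.ones((uv.shape[0], 1))
    rays = np.linalg.inv(K) @ np.concatenate([uv, ones], axis=1).T  # (3,N)
    return (rays * depth).T


def project(P, R, t, K):
    """Project 3D points (N,3) into a camera with pose (R, t); return pixels (N,2)."""
    p = K @ (R @ P.T + t[:, None])  # (3,N)
    return (p[:2] / p[2]).T


def marginalized_loss(residuals, sigmas=np.linspace(0.5, 8.0, 16)):
    """Average a Gaussian robust cost over a grid of noise scales,
    a stand-in for marginalizing out the unknown MDE error variance."""
    r2 = np.sum(residuals ** 2, axis=1, keepdims=True)        # (N,1)
    costs = 1.0 - np.exp(-r2 / (2.0 * sigmas[None, :] ** 2))  # (N,S)
    return costs.mean(axis=1).sum()


def two_view_cost(params, uv1, uv2, depth1, K):
    """Cost of relative pose params = (rotvec, t) given a dense MDE depth map."""
    R = Rotation.from_rotvec(params[:3]).as_matrix()
    t = params[3:]
    P1 = backproject(uv1, depth1, K)  # 3D points from image-1 MDE depth
    uv2_hat = project(P1, R, t, K)    # reprojection into image 2
    return marginalized_loss(uv2_hat - uv2)
```

Such a cost could be minimized over the 6-DoF pose with a generic optimizer (e.g. scipy.optimize.minimize); per the abstract, the paper's MBA instead works jointly over many views, from few-frame setups to systems with thousands of images.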
Related papers
- StarryGazer: Leveraging Monocular Depth Estimation Models for Domain-Agnostic Single Depth Image Completion [56.28564075246147]
StarryGazer is a framework that predicts dense depth images from a single sparse depth image and an RGB image. We employ a pre-trained MDE model to produce relative depth images. A refinement network is trained with the synthetic pairs, incorporating the relative depth maps and RGB images to improve the model's accuracy and robustness.
arXiv Detail & Related papers (2025-12-15T09:56:09Z) - No Pose Estimation? No Problem: Pose-Agnostic and Instance-Aware Test-Time Adaptation for Monocular Depth Estimation [7.436063412302697]
Test-time (domain) adaptation (TTA) is one of the compelling and practical approaches to address the issue. We propose a novel and high-performing TTA framework for MDE, named PITTA. Our approach incorporates two key innovative strategies: (i) a pose-agnostic TTA paradigm for MDE and (ii) instance-aware image masking.
arXiv Detail & Related papers (2025-11-07T07:55:02Z) - UniDepthV2: Universal Monocular Metric Depth Estimation Made Simpler [62.06785782635153]
We propose a new model, UniDepthV2, capable of reconstructing metric 3D scenes solely from single images across domains. UniDepthV2 directly predicts metric 3D points from the input image at inference time without any additional information. Our model exploits a pseudo-spherical output representation, which disentangles the camera and depth representations.
arXiv Detail & Related papers (2025-02-27T14:03:15Z) - MultiDepth: Multi-Sample Priors for Refining Monocular Metric Depth Estimations in Indoor Scenes [0.0]
Existing models are sensitive to factors such as boundary frequency of objects in the scene and scene complexity.
We propose a solution by taking samples of the image along with the initial depth map prediction made by a pre-trained MMDE model.
Compared to existing iterative depth refinement techniques, MultiDepth does not employ normal map prediction as part of its architecture.
arXiv Detail & Related papers (2024-11-01T21:30:51Z) - BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation [25.047835960649167]
BetterDepth is a conditional diffusion-based refiner that takes the prediction from pre-trained MDE models as depth conditioning.
BetterDepth achieves state-of-the-art zero-shot MDE performance on diverse public datasets and on in-the-wild scenes.
arXiv Detail & Related papers (2024-07-25T11:16:37Z) - UniDepth: Universal Monocular Metric Depth Estimation [81.80512457953903]
We propose a new model, UniDepth, capable of reconstructing metric 3D scenes solely from single images across domains.
Our model exploits a pseudo-spherical output representation, which disentangles camera and depth representations.
Thorough evaluations on ten datasets in a zero-shot regime consistently demonstrate the superior performance of UniDepth.
arXiv Detail & Related papers (2024-03-27T18:06:31Z) - SM4Depth: Seamless Monocular Metric Depth Estimation across Multiple Cameras and Scenes by One Model [72.0795843450604]
Current approaches face challenges in maintaining consistent accuracy across diverse scenes.
These methods rely on extensive datasets comprising millions, if not tens of millions, of training samples.
This paper presents SM4Depth, a model that seamlessly works for both indoor and outdoor scenes.
arXiv Detail & Related papers (2024-03-13T14:08:25Z) - Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image [85.91935485902708]
We show that the key to a zero-shot single-view metric depth model lies in the combination of large-scale data training and resolving the metric ambiguity from various camera models.
We propose a canonical camera space transformation module, which explicitly addresses the ambiguity problems and can be effortlessly plugged into existing monocular models.
Our method enables the accurate recovery of metric 3D structures on randomly collected internet images.
arXiv Detail & Related papers (2023-07-20T16:14:23Z) - Video Depth Estimation by Fusing Flow-to-Depth Proposals [65.24533384679657]
We present an approach with a differentiable flow-to-depth layer for video depth estimation.
The model consists of a flow-to-depth layer, a camera pose refinement module, and a depth fusion network.
Our approach outperforms state-of-the-art depth estimation methods and has reasonable cross-dataset generalization capability.
arXiv Detail & Related papers (2019-12-30T10:45:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.