Fixing the Scale and Shift in Monocular Depth For Camera Pose Estimation
- URL: http://arxiv.org/abs/2501.07742v1
- Date: Mon, 13 Jan 2025 23:13:33 GMT
- Title: Fixing the Scale and Shift in Monocular Depth For Camera Pose Estimation
- Authors: Yaqing Ding, Václav Vávra, Viktor Kocur, Jian Yang, Torsten Sattler, Zuzana Kukelova
- Abstract summary: We propose a novel framework for estimating the relative pose between two cameras from point correspondences with associated monocular depths.
We derive efficient solvers for three cases: (1) two calibrated cameras, (2) two uncalibrated cameras with an unknown but shared focal length, and (3) two uncalibrated cameras with unknown and different focal lengths.
Compared to prior work, our solvers achieve state-of-the-art results on two large-scale, real-world datasets.
- Score: 47.68705641608316
- License:
- Abstract: Recent advances in monocular depth prediction have led to significantly improved depth prediction accuracy. In turn, this enables various applications to use such depth predictions. In this paper, we propose a novel framework for estimating the relative pose between two cameras from point correspondences with associated monocular depths. Since depth predictions are typically defined up to an unknown scale and shift parameter, our solvers jointly estimate both scale and shift parameters together with the camera pose. We derive efficient solvers for three cases: (1) two calibrated cameras, (2) two uncalibrated cameras with an unknown but shared focal length, and (3) two uncalibrated cameras with unknown and different focal lengths. Experiments on synthetic and real data, including experiments with depth maps estimated by 11 different depth predictors, show the practical viability of our solvers. Compared to prior work, our solvers achieve state-of-the-art results on two large-scale, real-world datasets. The source code is available at https://github.com/yaqding/pose_monodepth
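To make the affine depth model concrete, here is a minimal toy sketch (not the paper's minimal solver) of the relation d_true = s * d_pred + t that the solvers estimate jointly with pose. In this illustration the scale and shift are recovered by least squares against known depths, purely to show how corrected depths lift pixels back to 3D; the camera intrinsics and the distortion values (s* = 2.0, t* = -1.5) are assumed for the example.

```python
import numpy as np

# Toy illustration of the affine depth model d_true = s * d_pred + t
# (not the paper's joint pose solver). We fit (s, t) by least squares
# against known depths, then lift pixels to 3D with corrected depths.

rng = np.random.default_rng(0)
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])  # assumed intrinsics

# Synthetic scene: 50 points in front of the camera.
X = rng.uniform([-1, -1, 4], [1, 1, 8], size=(50, 3))
d_true = X[:, 2]
uv = (K @ (X / d_true[:, None]).T).T[:, :2]  # pixel projections

# Simulated monocular depth prediction: true depth distorted by an
# unknown scale s* = 2.0 and shift t* = -1.5 (assumed for the example).
d_pred = (d_true - (-1.5)) / 2.0

# Fit s, t by least squares: d_true ~ s * d_pred + t.
A = np.stack([d_pred, np.ones_like(d_pred)], axis=1)
s, t = np.linalg.lstsq(A, d_true, rcond=None)[0]

# Lift pixels back to 3D using the corrected depths.
d_fix = s * d_pred + t
rays = np.linalg.inv(K) @ np.concatenate([uv, np.ones((50, 1))], axis=1).T
X_rec = (rays * d_fix).T

print(s, t)                      # recovers s = 2.0, t = -1.5
print(np.allclose(X_rec, X))     # lifted points match the scene
```

In the paper's setting the true depths are of course unavailable, so scale, shift, and relative pose are estimated jointly from point correspondences between two views; this sketch only demonstrates why an unresolved scale and shift corrupts the lifted 3D geometry.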
Related papers
- Depth Pro: Sharp Monocular Metric Depth in Less Than a Second [45.6690958201871]
We present a foundation model for zero-shot metric monocular depth estimation.
Our model, Depth Pro, synthesizes high-resolution depth maps with unparalleled sharpness and high-frequency details.
It produces a 2.25-megapixel depth map in 0.3 seconds on a standard GPU.
arXiv Detail & Related papers (2024-10-02T22:42:20Z)
- FS-Depth: Focal-and-Scale Depth Estimation from a Single Image in Unseen Indoor Scene [57.26600120397529]
It has long been an ill-posed problem to predict absolute depth maps from single images in real (unseen) indoor scenes.
We develop a focal-and-scale depth estimation model to effectively learn absolute depth maps from single images in unseen indoor scenes.
arXiv Detail & Related papers (2023-07-27T04:49:36Z)
- DepthP+P: Metric Accurate Monocular Depth Estimation using Planar and Parallax [0.0]
Current self-supervised monocular depth estimation methods are mostly based on estimating a rigid-body motion representing camera motion.
We propose DepthP+P, a method that learns to estimate outputs in metric scale by following the traditional planar parallax paradigm.
arXiv Detail & Related papers (2023-01-05T14:53:21Z)
- Multi-Camera Collaborative Depth Prediction via Consistent Structure Estimation [75.99435808648784]
We propose a novel multi-camera collaborative depth prediction method.
It does not require large overlapping areas while maintaining structure consistency between cameras.
Experimental results on DDAD and NuScenes datasets demonstrate the superior performance of our method.
arXiv Detail & Related papers (2022-10-05T03:44:34Z)
- Towards Accurate Reconstruction of 3D Scene Shape from A Single Monocular Image [91.71077190961688]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image.
We then exploit 3D point cloud data to predict the depth shift and the camera's focal length, which allow us to recover the 3D scene shape.
We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot evaluation.
arXiv Detail & Related papers (2022-08-28T16:20:14Z)
- Monocular 3D Object Detection with Depth from Motion [74.29588921594853]
We take advantage of camera ego-motion for accurate object depth estimation and detection.
Our framework, named Depth from Motion (DfM), then uses the established geometry to lift 2D image features to the 3D space and detects 3D objects thereon.
Our framework outperforms state-of-the-art methods by a large margin on the KITTI benchmark.
arXiv Detail & Related papers (2022-07-26T15:48:46Z)
- On the role of depth predictions for 3D human pose estimation [0.04199844472131921]
We build a system that takes 2D joint locations as input, along with their estimated depth values, and predicts their 3D positions in camera coordinates.
Results are produced by a neural network that accepts a low-dimensional input and can be integrated into a real-time system.
Our system can be combined with an off-the-shelf 2D pose detector and a depth map predictor to perform 3D pose estimation in the wild.
arXiv Detail & Related papers (2021-03-03T16:51:38Z)
- Robust Consistent Video Depth Estimation [65.53308117778361]
We present an algorithm for estimating consistent dense depth maps and camera poses from a monocular video.
Our algorithm combines two complementary techniques: (1) flexible deformation-splines for low-frequency large-scale alignment and (2) geometry-aware depth filtering for high-frequency alignment of fine depth details.
In contrast to prior approaches, our method does not require camera poses as input and achieves robust reconstruction for challenging hand-held cell phone captures containing a significant amount of noise, shake, motion blur, and rolling shutter deformations.
arXiv Detail & Related papers (2020-12-10T18:59:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.