Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences
- URL: http://arxiv.org/abs/2404.06337v1
- Date: Tue, 9 Apr 2024 14:22:50 GMT
- Title: Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences
- Authors: Axel Barroso-Laguna, Sowmya Munukutla, Victor Adrian Prisacariu, Eric Brachmann
- Abstract summary: Given two images, we can estimate the relative camera pose between them by establishing image-to-image correspondences.
We present MicKey, a keypoint matching pipeline that is able to predict metric correspondences in 3D camera space.
- Score: 21.057940424318314
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Given two images, we can estimate the relative camera pose between them by establishing image-to-image correspondences. Usually, correspondences are 2D-to-2D and the pose we estimate is defined only up to scale. Some applications, aiming at instant augmented reality anywhere, require scale-metric pose estimates, and hence, they rely on external depth estimators to recover the scale. We present MicKey, a keypoint matching pipeline that is able to predict metric correspondences in 3D camera space. By learning to match 3D coordinates across images, we are able to infer the metric relative pose without depth measurements. Depth measurements are also not required for training, nor are scene reconstructions or image overlap information. MicKey is supervised only by pairs of images and their relative poses. MicKey achieves state-of-the-art performance on the Map-Free Relocalisation benchmark while requiring less supervision than competing approaches.
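To make the idea of "metric relative pose from metric correspondences" concrete: once matched 3D points are expressed in each camera's metric coordinate frame, a rigid transform between the two frames can be recovered in closed form. The sketch below uses the standard Kabsch (orthogonal Procrustes) solution on synthetic data. It is an illustration only; the abstract does not state which solver MicKey uses, and the function name `kabsch` and all values in the example are assumptions made here.

```python
import numpy as np


def kabsch(P, Q):
    """Estimate R (3x3) and t (3,) minimizing sum ||R @ P[i] + t - Q[i]||^2.

    P and Q are (N, 3) arrays of corresponding 3D points, e.g. keypoints
    expressed in the metric camera frames of image A and image B.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    cp, cq = P.mean(axis=0), Q.mean(axis=0)

    # Cross-covariance of the centered point sets.
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)

    # Guard against a reflection (determinant -1) in the SVD solution.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t


if __name__ == "__main__":
    # Hypothetical ground truth: 30 degree rotation about z, translation in meters.
    a = np.deg2rad(30.0)
    R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                       [np.sin(a),  np.cos(a), 0.0],
                       [0.0,        0.0,       1.0]])
    t_true = np.array([0.5, -0.2, 1.0])

    rng = np.random.default_rng(0)
    P = rng.uniform(-1.0, 1.0, size=(20, 3)) + np.array([0.0, 0.0, 3.0])
    Q = P @ R_true.T + t_true  # the same points seen from the second camera

    R, t = kabsch(P, Q)
    print("rotation error:", np.abs(R - R_true).max())
    print("translation error (m):", np.abs(t - t_true).max())
```

Because both point sets carry metric units, the recovered translation is in meters rather than being defined only up to scale, which is the distinction the abstract draws against 2D-to-2D matching. With noisy or partly wrong matches, such a solver would normally be wrapped in a robust estimation loop such as RANSAC.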
Related papers
- SRPose: Two-view Relative Pose Estimation with Sparse Keypoints [51.49105161103385]
SRPose is a sparse keypoint-based framework for two-view relative pose estimation in camera-to-world and object-to-camera scenarios.
It achieves competitive or superior performance compared to state-of-the-art methods in terms of accuracy and speed.
It is robust to different image sizes and camera intrinsics, and can be deployed with low computing resources.
arXiv Detail & Related papers (2024-07-11T05:46:35Z) - A Single 2D Pose with Context is Worth Hundreds for 3D Human Pose Estimation [18.72362803593654]
The dominant paradigm in 3D human pose estimation that lifts a 2D pose sequence to 3D heavily relies on long-term temporal clues.
This can be attributed to their inherent inability to perceive spatial context as plain 2D joint coordinates carry no visual cues.
We propose a straightforward yet powerful solution: leveraging the readily available intermediate visual representations produced by off-the-shelf (pre-trained) 2D pose detectors.
arXiv Detail & Related papers (2023-11-06T18:04:13Z) - Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image [85.91935485902708]
We show that the key to a zero-shot single-view metric depth model lies in the combination of large-scale data training and resolving the metric ambiguity from various camera models.
We propose a canonical camera space transformation module, which explicitly addresses the ambiguity problems and can be effortlessly plugged into existing monocular models.
Our method enables the accurate recovery of metric 3D structures on randomly collected internet images.
arXiv Detail & Related papers (2023-07-20T16:14:23Z) - PoseKernelLifter: Metric Lifting of 3D Human Pose using Sound [34.814669331418884]
Reconstructing the 3D pose of a person in metric scale from a single view image is a geometrically ill-posed problem.
We show that audio signals recorded along with an image provide complementary information to reconstruct the metric 3D pose of the person.
We design a multi-stage 3D CNN that fuses audio and visual signals and learns to reconstruct 3D pose in a metric scale.
arXiv Detail & Related papers (2021-12-01T01:34:56Z) - Category-Level Metric Scale Object Shape and Pose Estimation [73.92460712829188]
We propose a framework that jointly estimates a metric scale shape and pose from a single RGB image.
We validated our method on both synthetic and real-world datasets to evaluate category-level object pose and shape.
arXiv Detail & Related papers (2021-09-01T12:16:46Z) - Weakly-supervised Cross-view 3D Human Pose Estimation [16.045255544594625]
We propose a simple yet effective pipeline for weakly-supervised cross-view 3D human pose estimation.
Our method can achieve state-of-the-art performance in a weakly-supervised manner.
We evaluate our method on the standard benchmark dataset, Human3.6M.
arXiv Detail & Related papers (2021-05-23T08:16:25Z) - MeTRAbs: Metric-Scale Truncation-Robust Heatmaps for Absolute 3D Human Pose Estimation [16.463390330757132]
We propose metric-scale truncation-robust (MeTRo) volumetric heatmaps, whose dimensions are all defined in metric 3D space, instead of being aligned with image space.
This reinterpretation of heatmap dimensions allows us to directly estimate complete, metric-scale poses without test-time knowledge of distance or relying on anthropometrics, such as bone lengths.
We find that supervision via absolute pose loss is crucial for accurate non-root-relative localization.
arXiv Detail & Related papers (2020-07-12T11:52:09Z) - Leveraging Photometric Consistency over Time for Sparsely Supervised Hand-Object Reconstruction [118.21363599332493]
We present a method to leverage photometric consistency across time when annotations are only available for a sparse subset of frames in a video.
Our model is trained end-to-end on color images to jointly reconstruct hands and objects in 3D by inferring their poses.
We achieve state-of-the-art results on 3D hand-object reconstruction benchmarks and demonstrate that our approach allows us to improve the pose estimation accuracy.
arXiv Detail & Related papers (2020-04-28T12:03:14Z) - Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A Geometric Approach [76.10879433430466]
We propose to estimate 3D human pose from multi-view images and a few IMUs attached to a person's limbs.
It operates by first detecting 2D poses from the two signals, and then lifting them to 3D space.
The simple two-step approach reduces the error of the state-of-the-art by a large margin on a public dataset.
arXiv Detail & Related papers (2020-03-25T00:26:54Z) - Learning 2D-3D Correspondences To Solve The Blind Perspective-n-Point Problem [98.92148855291363]
This paper proposes a deep CNN model which simultaneously solves for both the 6-DoF absolute camera pose and 2D-3D correspondences.
Tests on both real and simulated data have shown that our method substantially outperforms existing approaches.
arXiv Detail & Related papers (2020-03-15T04:17:30Z) - Metric-Scale Truncation-Robust Heatmaps for 3D Human Pose Estimation [16.463390330757132]
We propose metric-scale truncation-robust volumetric heatmaps, whose dimensions are defined in metric 3D space near the subject.
We train a fully-convolutional network to estimate such heatmaps from monocular RGB in an end-to-end manner.
As our method is simple and fast, it can become a useful component for real-time top-down multi-person pose estimation systems.
arXiv Detail & Related papers (2020-03-05T22:38:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.