Leveraging Image Matching Toward End-to-End Relative Camera Pose Regression
- URL: http://arxiv.org/abs/2211.14950v2
- Date: Tue, 16 Apr 2024 12:40:41 GMT
- Title: Leveraging Image Matching Toward End-to-End Relative Camera Pose Regression
- Authors: Fadi Khatib, Yuval Margalit, Meirav Galun, Ronen Basri
- Abstract summary: This paper proposes a generalizable, end-to-end deep learning-based method for relative pose regression between two images.
Inspired by the classical pipeline, our method leverages Image Matching (IM) as a pre-trained task for relative pose regression.
We evaluate our method on several datasets and show that it outperforms previous end-to-end methods.
- Score: 13.233301155616616
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes a generalizable, end-to-end deep learning-based method for relative pose regression between two images. Given two images of the same scene captured from different viewpoints, our method predicts the relative rotation and translation (including direction and scale) between the two respective cameras. Inspired by the classical pipeline, our method leverages Image Matching (IM) as a pre-trained task for relative pose regression. Specifically, we use LoFTR, an architecture that utilizes an attention-based network pre-trained on ScanNet, to extract semi-dense feature maps, which are then warped and fed into a pose regression network. Notably, we use a loss function that utilizes separate terms to account for the translation direction and scale. We believe such a separation is important because translation direction is determined by point correspondences, while the scale is inferred from a prior on shape sizes. Our ablations further support this choice. We evaluate our method on several datasets and show that it outperforms previous end-to-end methods. The method also generalizes well to unseen datasets.
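The abstract's idea of penalizing translation direction and scale with separate loss terms can be illustrated with a minimal sketch. This is not the paper's loss: the abstract does not give the formula, so the function name `split_translation_loss`, the weights `w_dir` and `w_scale`, and the specific choice of terms (distance between unit vectors for direction, absolute difference of norms for scale) are all assumptions made for illustration.

```python
import numpy as np


def split_translation_loss(t_pred, t_true, w_dir=1.0, w_scale=1.0, eps=1e-8):
    """Hypothetical translation loss with separate direction and scale terms.

    Returns (total, direction_term, scale_term). The weighting and the exact
    form of each term are illustrative assumptions, not the paper's definition.
    """
    t_pred = np.asarray(t_pred, dtype=float)
    t_true = np.asarray(t_true, dtype=float)

    n_pred = np.linalg.norm(t_pred)
    n_true = np.linalg.norm(t_true)

    # Direction term: distance between the two unit vectors.
    # It ignores how long the translations are.
    dir_term = np.linalg.norm(t_pred / (n_pred + eps) - t_true / (n_true + eps))

    # Scale term: absolute difference of the translation magnitudes.
    # It ignores where the translations point.
    scale_term = abs(n_pred - n_true)

    total = w_dir * dir_term + w_scale * scale_term
    return total, dir_term, scale_term


if __name__ == "__main__":
    # Same direction, wrong length: only the scale term is non-zero.
    print(split_translation_loss([2.0, 0.0, 0.0], [1.0, 0.0, 0.0]))
    # Same length, wrong direction: only the direction term is non-zero.
    print(split_translation_loss([0.0, 1.0, 0.0], [1.0, 0.0, 0.0]))
```

Keeping the two terms apart means an error in magnitude does not leak into the direction penalty and vice versa, which matches the abstract's reasoning that the two quantities are inferred from different cues.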
Related papers
- DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses [59.51874686414509]
Current approaches approximate the continuous pose representation with a large number of discrete pose hypotheses.
We present a Deep Voxel Matching Network (DVMNet) that eliminates the need for pose hypotheses and computes the relative object pose in a single pass.
Our method delivers more accurate relative pose estimates for novel objects at a lower computational cost compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-03-20T15:41:32Z)
- FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation [30.710296843150832]
Estimating relative camera poses between images has been a central problem in computer vision.
We show how to combine the best of both methods; our approach yields results that are both precise and robust.
A comprehensive analysis supports our design choices and demonstrates that our method adapts flexibly to various feature extractors and correspondence estimators.
arXiv Detail & Related papers (2024-03-05T18:59:51Z)
- RelPose: Predicting Probabilistic Relative Rotation for Single Objects in the Wild [73.1276968007689]
We describe a data-driven method for inferring the camera viewpoints given multiple images of an arbitrary object.
We show that our approach outperforms state-of-the-art SfM and SLAM methods given sparse images on both seen and unseen categories.
arXiv Detail & Related papers (2022-08-11T17:59:59Z)
- Self-Supervised Learning of Image Scale and Orientation [35.94215211409985]
We study the problem of learning to assign a characteristic pose, i.e., scale and orientation, for an image region of interest.
It is hard to obtain a large-scale set of image regions with explicit pose annotations that a model directly learns from.
We propose a self-supervised learning framework with a histogram alignment technique.
arXiv Detail & Related papers (2022-06-15T02:43:39Z)
- Semantic keypoint-based pose estimation from single RGB frames [64.80395521735463]
We present an approach to estimating the continuous 6-DoF pose of an object from a single RGB image.
The approach combines semantic keypoints predicted by a convolutional network (convnet) with a deformable shape model.
We show that our approach can accurately recover the 6-DoF object pose for both instance- and class-based scenarios.
arXiv Detail & Related papers (2022-04-12T15:03:51Z)
- RELMOBNET: A Robust Two-Stage End-To-End Training Approach For MOBILENETV3 Based Relative Camera Pose Estimation [0.6193838300896449]
Relative camera pose estimation plays a pivotal role in dealing with 3D reconstruction and visual localization.
We propose a Siamese network based on MobileNetV3-Large for an end-to-end relative camera pose regression independent of camera parameters.
arXiv Detail & Related papers (2022-02-25T17:27:26Z)
- Poseur: Direct Human Pose Regression with Transformers [119.79232258661995]
We propose a direct, regression-based approach to 2D human pose estimation from single images.
Our framework is end-to-end differentiable, and naturally learns to exploit the dependencies between keypoints.
Ours is the first regression-based approach to perform favorably compared to the best heatmap-based pose estimation methods.
arXiv Detail & Related papers (2022-01-19T04:31:57Z)
- DeepI2P: Image-to-Point Cloud Registration via Deep Classification [71.3121124994105]
DeepI2P is a novel approach for cross-modality registration between an image and a point cloud.
Our method estimates the relative rigid transformation between the coordinate frames of the camera and Lidar.
We circumvent the difficulty by converting the registration problem into a classification and inverse camera projection optimization problem.
arXiv Detail & Related papers (2021-04-08T04:27:32Z)
- Visual Camera Re-Localization Using Graph Neural Networks and Relative Pose Supervision [31.947525258453584]
Visual re-localization means using a single image as input to estimate the camera's location and orientation relative to a pre-recorded environment.
Our proposed method makes few special assumptions, and is fairly lightweight in training and testing.
We validate the effectiveness of our approach on both standard indoor (7-Scenes) and outdoor (Cambridge Landmarks) camera re-localization benchmarks.
arXiv Detail & Related papers (2021-04-06T14:29:03Z)
- Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression [81.05772887221333]
We study the dense keypoint regression framework, which was previously inferior to the keypoint detection and grouping framework.
We present a simple yet effective approach, named disentangled keypoint regression (DEKR).
We empirically show that the proposed direct regression method outperforms keypoint detection and grouping methods.
arXiv Detail & Related papers (2021-04-06T05:54:46Z)
- Paying Attention to Activation Maps in Camera Pose Regression [4.232614032390374]
Camera pose regression methods apply a single forward pass to the query image to estimate the camera pose.
We propose an attention-based approach for pose regression, where the convolutional activation maps are used as sequential inputs.
Our proposed approach is shown to compare favorably to contemporary pose regressor schemes and achieves state-of-the-art accuracy across multiple benchmarks.
arXiv Detail & Related papers (2021-03-21T20:10:15Z)