Paying Attention to Activation Maps in Camera Pose Regression
- URL: http://arxiv.org/abs/2103.11477v1
- Date: Sun, 21 Mar 2021 20:10:15 GMT
- Title: Paying Attention to Activation Maps in Camera Pose Regression
- Authors: Yoli Shavit, Ron Ferens, Yosi Keller
- Abstract summary: Camera pose regression methods apply a single forward pass to the query image to estimate the camera pose.
We propose an attention-based approach for pose regression, where the convolutional activation maps are used as sequential inputs.
Our proposed approach is shown to compare favorably to contemporary pose regression schemes and achieves state-of-the-art accuracy across multiple benchmarks.
- Score: 4.232614032390374
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Camera pose regression methods apply a single forward pass to the query image
to estimate the camera pose. As such, they offer a fast and light-weight
alternative to traditional localization schemes based on image retrieval. Pose
regression approaches simultaneously learn two regression tasks, aiming to
jointly estimate the camera position and orientation using a single embedding
vector computed by a convolutional backbone. We propose an attention-based
approach for pose regression, where the convolutional activation maps are used
as sequential inputs. Transformers are applied to encode the sequential
activation maps as latent vectors, used for camera pose regression. This allows
us to pay attention to spatially-varying deep features. Using two Transformer
heads, we separately focus on the features for camera position and orientation,
based on how informative they are per task. Our proposed approach is shown to
compare favorably to contemporary pose regression schemes and achieves
state-of-the-art accuracy across multiple outdoor and indoor benchmarks. In
particular, to the best of our knowledge, our approach is the only method to
attain sub-meter average accuracy across outdoor scenes. We make our code
publicly available.
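For illustration, the architecture described above can be sketched in a few lines of PyTorch. This is a minimal sketch, not the authors' released code: a convolutional backbone produces activation maps that are flattened into a sequence and passed through two separate Transformer encoders, one read out for camera position and one for orientation. The backbone choice, layer sizes, and learned read-out tokens are assumptions, and positional encodings are omitted for brevity.
```python
# Minimal sketch (assumed layer sizes; not the authors' implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision


class TransformerPoseRegressor(nn.Module):
    def __init__(self, d_model=256, nhead=4, num_layers=4):
        super().__init__()
        # Truncated CNN backbone that keeps spatial activation maps (512 x H x W).
        resnet = torchvision.models.resnet34(weights=None)
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])
        self.proj = nn.Conv2d(512, d_model, kernel_size=1)  # project channels to d_model

        def encoder():
            layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                               batch_first=True)
            return nn.TransformerEncoder(layer, num_layers=num_layers)

        # Two Transformer heads: one attends to position-relevant features,
        # the other to orientation-relevant features.
        self.encoder_x, self.encoder_q = encoder(), encoder()
        self.token_x = nn.Parameter(torch.zeros(1, 1, d_model))  # learned read-out tokens
        self.token_q = nn.Parameter(torch.zeros(1, 1, d_model))
        self.head_x = nn.Linear(d_model, 3)  # camera position
        self.head_q = nn.Linear(d_model, 4)  # camera orientation (quaternion)

    def forward(self, img):
        feat = self.proj(self.backbone(img))       # B x d_model x H x W
        seq = feat.flatten(2).transpose(1, 2)      # B x (H*W) x d_model: sequential maps
        b = seq.size(0)
        seq_x = torch.cat([self.token_x.expand(b, -1, -1), seq], dim=1)
        seq_q = torch.cat([self.token_q.expand(b, -1, -1), seq], dim=1)
        x = self.head_x(self.encoder_x(seq_x)[:, 0])  # read out the position token
        q = self.head_q(self.encoder_q(seq_q)[:, 0])  # read out the orientation token
        return x, F.normalize(q, dim=-1)


pose_net = TransformerPoseRegressor()
x, q = pose_net(torch.randn(1, 3, 224, 224))
print(x.shape, q.shape)  # torch.Size([1, 3]) torch.Size([1, 4])
```
A single forward pass on a query image then returns a 3-D position and a unit quaternion, matching the single-pass, retrieval-free setting described in the abstract.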
Related papers
- SRPose: Two-view Relative Pose Estimation with Sparse Keypoints [51.49105161103385]
SRPose is a sparse keypoint-based framework for two-view relative pose estimation in camera-to-world and object-to-camera scenarios.
It achieves competitive or superior performance compared to state-of-the-art methods in terms of accuracy and speed.
It is robust to different image sizes and camera intrinsics, and can be deployed with low computing resources.
arXiv Detail & Related papers (2024-07-11T05:46:35Z)
- Cameras as Rays: Pose Estimation via Ray Diffusion [54.098613859015856]
Estimating camera poses is a fundamental task for 3D reconstruction and remains challenging given sparsely sampled views.
We propose a distributed representation of camera pose that treats a camera as a bundle of rays.
Our proposed methods, both regression- and diffusion-based, demonstrate state-of-the-art performance on camera pose estimation on CO3D.
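As an aside, the "camera as a bundle of rays" representation mentioned in this entry can be illustrated with a short, assumption-based sketch: under a pinhole model with intrinsics K and world-to-camera extrinsics (R, t), every pixel maps to a ray through the camera center. The exact ray parameterization used by the paper may differ; this only shows the back-projection step.
```python
# Illustrative pinhole back-projection (assumed convention: x_cam = R @ x_world + t).
import torch


def camera_to_rays(K, R, t, height, width):
    """Per-pixel ray origins and unit directions, expressed in world coordinates."""
    v, u = torch.meshgrid(torch.arange(height, dtype=torch.float32) + 0.5,
                          torch.arange(width, dtype=torch.float32) + 0.5,
                          indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1)   # H x W x 3 homogeneous pixels

    dirs_cam = pix @ torch.linalg.inv(K).T                   # back-project to camera frame
    dirs_world = dirs_cam @ R                                 # rotate: d_world = R^T d_cam
    dirs_world = dirs_world / dirs_world.norm(dim=-1, keepdim=True)

    origin = -(R.T @ t)                                       # camera center c = -R^T t
    return origin.expand_as(dirs_world), dirs_world


# Example: a 4x4 ray bundle for a camera at the world origin.
K = torch.tensor([[100.0, 0.0, 2.0], [0.0, 100.0, 2.0], [0.0, 0.0, 1.0]])
origins, dirs = camera_to_rays(K, torch.eye(3), torch.zeros(3), 4, 4)
print(origins.shape, dirs.shape)  # torch.Size([4, 4, 3]) torch.Size([4, 4, 3])
```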
arXiv Detail & Related papers (2024-02-22T18:59:56Z)
- Coarse-to-Fine Multi-Scene Pose Regression with Transformers [19.927662512903915]
A convolutional backbone with a multi-layer perceptron (MLP) head is trained using images and pose labels to embed a single reference scene at a time.
We propose to learn multi-scene absolute camera pose regression with Transformers, where encoders are used to aggregate activation maps with self-attention.
Our method is evaluated on commonly benchmarked indoor and outdoor datasets and is shown to exceed both multi-scene and state-of-the-art single-scene absolute pose regressors.
arXiv Detail & Related papers (2023-08-22T20:43:31Z)
- Leveraging Image Matching Toward End-to-End Relative Camera Pose Regression [13.233301155616616]
This paper proposes a generalizable, end-to-end deep learning-based method for relative pose regression between two images.
Inspired by the classical pipeline, our method leverages Image Matching (IM) as a pre-trained task for relative pose regression.
We evaluate our method on several datasets and show that it outperforms previous end-to-end methods.
arXiv Detail & Related papers (2022-11-27T22:01:47Z)
- Camera Pose Auto-Encoders for Improving Pose Regression [6.700873164609009]
We introduce Camera Pose Auto-Encoders (PAEs) to encode camera poses using absolute pose regressors (APRs) as their teachers.
We show that the resulting latent pose representations can closely reproduce APR performance and demonstrate their effectiveness for related tasks.
We also show that training images can be reconstructed from the learned pose encoding, paving the way for integrating visual information from the training set at a low memory cost.
arXiv Detail & Related papers (2022-07-12T13:47:36Z)
- ImPosIng: Implicit Pose Encoding for Efficient Camera Pose Estimation [2.6808541153140077]
Implicit Pose Encoding (ImPosing) embeds images and camera poses into a common latent representation with two separate neural networks.
By evaluating candidates through the latent space in a hierarchical manner, the camera position and orientation are not directly regressed but refined.
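The idea summarized above can also be illustrated with a small, assumption-based sketch (not the ImPosing architecture itself): one network embeds the query image and a second network embeds candidate camera poses into a shared latent space, and candidates are scored by latent similarity instead of regressing the pose directly; the winning candidate would then be refined at finer scales.
```python
# Assumption-based illustration of embedding images and poses into a shared latent space.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ImageEncoder(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim))

    def forward(self, img):
        return F.normalize(self.net(img), dim=-1)


class PoseEncoder(nn.Module):
    """Embeds a 7-D pose (3-D position + unit quaternion) into the same latent space."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(7, 256), nn.ReLU(), nn.Linear(256, dim))

    def forward(self, pose):
        return F.normalize(self.net(pose), dim=-1)


# Rank candidate poses for one query image by cosine similarity in the latent space.
img_enc, pose_enc = ImageEncoder(), PoseEncoder()
query = img_enc(torch.randn(1, 3, 224, 224))   # 1 x 128 image embedding
candidates = pose_enc(torch.randn(100, 7))     # 100 x 128 candidate-pose embeddings
scores = (candidates @ query.T).squeeze(1)     # cosine similarity (unit-norm embeddings)
best_idx = scores.argmax()                     # winner would be refined hierarchically
```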
arXiv Detail & Related papers (2022-05-05T13:33:25Z)
- Poseur: Direct Human Pose Regression with Transformers [119.79232258661995]
We propose a direct, regression-based approach to 2D human pose estimation from single images.
Our framework is end-to-end differentiable, and naturally learns to exploit the dependencies between keypoints.
Ours is the first regression-based approach to perform favorably compared to the best heatmap-based pose estimation methods.
arXiv Detail & Related papers (2022-01-19T04:31:57Z)
- DeepI2P: Image-to-Point Cloud Registration via Deep Classification [71.3121124994105]
DeepI2P is a novel approach for cross-modality registration between an image and a point cloud.
Our method estimates the relative rigid transformation between the coordinate frames of the camera and Lidar.
We circumvent the difficulty by converting the registration problem into a classification and inverse camera projection optimization problem.
arXiv Detail & Related papers (2021-04-08T04:27:32Z)
- Visual Camera Re-Localization Using Graph Neural Networks and Relative Pose Supervision [31.947525258453584]
Visual re-localization means using a single image as input to estimate the camera's location and orientation relative to a pre-recorded environment.
Our proposed method makes few special assumptions, and is fairly lightweight in training and testing.
We validate the effectiveness of our approach on both standard indoor (7-Scenes) and outdoor (Cambridge Landmarks) camera re-localization benchmarks.
arXiv Detail & Related papers (2021-04-06T14:29:03Z)
- Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression [81.05772887221333]
We study the dense keypoint regression framework, which has previously been inferior to the keypoint detection and grouping framework.
We present a simple yet effective approach, named disentangled keypoint regression (DEKR).
We empirically show that the proposed direct regression method outperforms keypoint detection and grouping methods.
arXiv Detail & Related papers (2021-04-06T05:54:46Z)
- Bottom-Up Human Pose Estimation by Ranking Heatmap-Guided Adaptive Keypoint Estimates [76.51095823248104]
We present several schemes, rarely or only superficially studied before, for improving keypoint detection and grouping (keypoint regression) performance.
First, we exploit the keypoint heatmaps for pixel-wise keypoint regression instead of separating them for improving keypoint regression.
Second, we adopt a pixel-wise spatial transformer network to learn adaptive representations for handling the scale and orientation variance.
Third, we present a joint shape and heat-value scoring scheme to promote the estimated poses that are more likely to be true poses.
arXiv Detail & Related papers (2020-06-28T01:14:59Z)