Learning to Localize in Unseen Scenes with Relative Pose Regressors
- URL: http://arxiv.org/abs/2303.02717v1
- Date: Sun, 5 Mar 2023 17:12:50 GMT
- Title: Learning to Localize in Unseen Scenes with Relative Pose Regressors
- Authors: Ofer Idan, Yoli Shavit, Yosi Keller
- Abstract summary: Relative pose regressors (RPRs) localize a camera by estimating its relative translation and rotation to a pose-labelled reference.
In practice, however, the performance of RPRs is significantly degraded in unseen scenes.
We implement aggregation with concatenation, projection, and attention operations (Transformers) and learn to regress the relative pose parameters from the resulting latent codes.
Compared to state-of-the-art RPRs, our model is shown to localize significantly better in unseen environments, across both indoor and outdoor benchmarks, while maintaining competitive performance in seen scenes.
- Score: 5.672132510411465
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Relative pose regressors (RPRs) localize a camera by estimating its relative
translation and rotation to a pose-labelled reference. Unlike scene coordinate
regression and absolute pose regression methods, which learn absolute scene
parameters, RPRs can (theoretically) localize in unseen environments, since
they only learn the residual pose between camera pairs. In practice, however,
the performance of RPRs is significantly degraded in unseen scenes. In this
work, we propose to aggregate paired feature maps into latent codes, instead of
operating on global image descriptors, in order to improve the generalization
of RPRs. We implement aggregation with concatenation, projection, and attention
operations (Transformer Encoders) and learn to regress the relative pose
parameters from the resulting latent codes. We further make use of a recently
proposed continuous representation of rotation matrices, which alleviates the
limitations of the commonly used quaternions. Compared to state-of-the-art
RPRs, our model is shown to localize significantly better in unseen
environments, across both indoor and outdoor benchmarks, while maintaining
competitive performance in seen scenes. We validate our findings and
architecture design through multiple ablations. Our code and pretrained models
are publicly available.
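The "continuous representation of rotation matrices" the abstract refers to is the 6D parameterization that maps two 3-vectors to a rotation matrix via Gram-Schmidt orthogonalization, avoiding the discontinuities of quaternions. A minimal sketch of that mapping (an illustration of the general technique, not the authors' implementation) looks like:

```python
import numpy as np

def rotation_from_6d(x):
    """Map a 6D vector to a 3x3 rotation matrix via Gram-Schmidt.

    The first three components give the first column direction; the
    second three are orthogonalized against it; the third column is
    their cross product, yielding a right-handed orthonormal basis.
    """
    a1, a2 = x[:3], x[3:]
    b1 = a1 / np.linalg.norm(a1)
    b2 = a2 - np.dot(b1, a2) * b1
    b2 = b2 / np.linalg.norm(b2)
    b3 = np.cross(b1, b2)
    return np.stack([b1, b2, b3], axis=1)

# Any 6D vector with non-degenerate halves yields a valid rotation:
R = rotation_from_6d(np.array([1.0, 0.2, 0.0, 0.0, 1.0, 0.3]))
```

Because every such output is a valid rotation and small changes in the 6D input produce small changes in the matrix, a regressor can be trained on this representation without the double-cover ambiguity that quaternions introduce.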
Related papers
- SRPose: Two-view Relative Pose Estimation with Sparse Keypoints [51.49105161103385]
SRPose is a sparse keypoint-based framework for two-view relative pose estimation in camera-to-world and object-to-camera scenarios.
It achieves competitive or superior performance compared to state-of-the-art methods in terms of accuracy and speed.
It is robust to different image sizes and camera intrinsics, and can be deployed with low computing resources.
arXiv Detail & Related papers (2024-07-11T05:46:35Z) - Cameras as Rays: Pose Estimation via Ray Diffusion [54.098613859015856]
Estimating camera poses is a fundamental task for 3D reconstruction and remains challenging given sparsely sampled views.
We propose a distributed representation of camera pose that treats a camera as a bundle of rays.
Our proposed methods, both regression- and diffusion-based, demonstrate state-of-the-art performance on camera pose estimation on CO3D.
arXiv Detail & Related papers (2024-02-22T18:59:56Z) - CoPR: Towards Accurate Visual Localization With Continuous Place-descriptor Regression [2.7393821783237184]
Visual Place Recognition (VPR) estimates the camera location of a query image by retrieving the most similar reference image from a map of geo-tagged reference images.
References for VPR are only available at sparse poses in a map, which enforces an upper bound on the maximum achievable localization accuracy.
We propose Continuous Place-descriptor Regression (CoPR) to densify the map and improve localization accuracy.
arXiv Detail & Related papers (2023-04-14T23:17:44Z) - Fusing Structure from Motion and Simulation-Augmented Pose Regression from Optical Flow for Challenging Indoor Environments [13.654208446015824]
The localization of objects is a crucial task in various applications such as robotics, virtual and augmented reality, and the transportation of goods in warehouses.
Recent advances in deep learning have enabled localization using monocular visual cameras, yet challenging indoor environments remain difficult.
This study addresses these challenges by incorporating additional information and regularizing the absolute pose using relative pose regression (RPR) methods.
arXiv Detail & Related papers (2023-04-14T16:58:23Z) - RelPose: Predicting Probabilistic Relative Rotation for Single Objects in the Wild [73.1276968007689]
We describe a data-driven method for inferring the camera viewpoints given multiple images of an arbitrary object.
We show that our approach outperforms state-of-the-art SfM and SLAM methods given sparse images on both seen and unseen categories.
arXiv Detail & Related papers (2022-08-11T17:59:59Z) - Camera Pose Auto-Encoders for Improving Pose Regression [6.700873164609009]
We introduce Camera Pose Auto-Encoders (PAEs) to encode camera poses using APRs as their teachers.
We show that the resulting latent pose representations can closely reproduce APR performance and demonstrate their effectiveness for related tasks.
We also show that training images can be reconstructed from the learned pose encoding, paving the way for integrating visual information from the training set at a low memory cost.
arXiv Detail & Related papers (2022-07-12T13:47:36Z) - ImPosIng: Implicit Pose Encoding for Efficient Camera Pose Estimation [2.6808541153140077]
Implicit Pose Encoding (ImPosing) embeds images and camera poses into a common latent representation using two separate neural networks.
By evaluating candidates in the latent space in a hierarchical manner, the camera position and orientation are refined rather than directly regressed.
arXiv Detail & Related papers (2022-05-05T13:33:25Z) - On the Limits of Pseudo Ground Truth in Visual Camera Re-localisation [83.29404673257328]
Re-localisation benchmarks measure how well each method replicates the results of a reference algorithm.
This raises the question of whether the choice of reference algorithm favours a certain family of re-localisation methods.
This paper analyzes two widely used re-localisation datasets and shows that evaluation outcomes indeed vary with the choice of the reference algorithm.
arXiv Detail & Related papers (2021-09-01T12:01:08Z) - Visual Camera Re-Localization Using Graph Neural Networks and Relative Pose Supervision [31.947525258453584]
Visual re-localization means using a single image as input to estimate the camera's location and orientation relative to a pre-recorded environment.
Our proposed method makes few special assumptions, and is fairly lightweight in training and testing.
We validate the effectiveness of our approach on both standard indoor (7-Scenes) and outdoor (Cambridge Landmarks) camera re-localization benchmarks.
arXiv Detail & Related papers (2021-04-06T14:29:03Z) - Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression [81.05772887221333]
We study the dense keypoint regression framework, which was previously inferior to the keypoint detection and grouping framework.
We present a simple yet effective approach, named disentangled keypoint regression (DEKR).
We empirically show that the proposed direct regression method outperforms keypoint detection and grouping methods.
arXiv Detail & Related papers (2021-04-06T05:54:46Z) - Cross-Scale Internal Graph Neural Network for Image Super-Resolution [147.77050877373674]
Non-local self-similarity in natural images has been well studied as an effective prior in image restoration.
For single image super-resolution (SISR), most existing deep non-local methods only exploit similar patches within the same scale of the low-resolution (LR) input image.
This work instead exploits patch similarity across scales, using a novel cross-scale internal graph neural network (IGNN).
arXiv Detail & Related papers (2020-06-30T10:48:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.