Visual Camera Re-Localization Using Graph Neural Networks and Relative
Pose Supervision
- URL: http://arxiv.org/abs/2104.02538v1
- Date: Tue, 6 Apr 2021 14:29:03 GMT
- Title: Visual Camera Re-Localization Using Graph Neural Networks and Relative
Pose Supervision
- Authors: Mehmet Ozgur Turkoglu, Eric Brachmann, Konrad Schindler, Gabriel
Brostow, Aron Monszpart
- Abstract summary: Visual re-localization means using a single image as input to estimate the camera's location and orientation relative to a pre-recorded environment.
Our proposed method makes few special assumptions, and is fairly lightweight in training and testing.
We validate the effectiveness of our approach on both standard indoor (7-Scenes) and outdoor (Cambridge Landmarks) camera re-localization benchmarks.
- Score: 31.947525258453584
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual re-localization means using a single image as input to estimate the
camera's location and orientation relative to a pre-recorded environment. The
highest-scoring methods are "structure based," and need the query camera's
intrinsics as an input to the model, with careful geometric optimization. When
intrinsics are absent, methods vie for accuracy by making various other
assumptions. This yields fairly good localization scores, but the models are
"narrow" in some way, eg., requiring costly test-time computations, or depth
sensors, or multiple query frames. In contrast, our proposed method makes few
special assumptions, and is fairly lightweight in training and testing.
Our pose regression network learns only from relative poses of training
scenes. For inference, it builds a graph connecting the query image to training
counterparts and uses a graph neural network (GNN) with image representations
on nodes and image-pair representations on edges. By efficiently passing
messages between them, both representation types are refined to produce a
consistent camera pose estimate. We validate the effectiveness of our approach
on both standard indoor (7-Scenes) and outdoor (Cambridge Landmarks) camera
re-localization benchmarks. Our relative pose regression method matches the
accuracy of absolute pose regression networks, while retaining the
relative-pose models' test-time speed and ability to generalize to non-training
scenes.
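As a rough illustration of the node/edge message-passing scheme described in the abstract, here is a minimal sketch assuming PyTorch. The layer widths, the single round of message passing, the toy graph of one query connected to four retrieved training images, and the translation-plus-quaternion pose parameterization are placeholder choices for illustration, not the paper's exact architecture.

```python
# A minimal sketch (assumed PyTorch) of message passing between image
# nodes and image-pair edges; dimensions and depth are illustrative,
# not the paper's exact design.
import torch
import torch.nn as nn

class PoseGNNLayer(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        # Edge update: combine the two endpoint node features with the edge feature.
        self.edge_mlp = nn.Sequential(nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        # Node update: combine a node feature with its aggregated incident edges.
        self.node_mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, nodes, edges, edge_index):
        # nodes: (N, dim) image representations
        # edges: (E, dim) image-pair representations
        # edge_index: (E, 2) endpoints (src, dst) of each edge
        src, dst = edge_index[:, 0], edge_index[:, 1]
        edges = edges + self.edge_mlp(torch.cat([nodes[src], nodes[dst], edges], dim=-1))
        # Aggregate messages from incident edges back onto destination nodes.
        agg = torch.zeros_like(nodes).index_add_(0, dst, edges)
        nodes = nodes + self.node_mlp(torch.cat([nodes, agg], dim=-1))
        return nodes, edges

class RelativePoseHead(nn.Module):
    """Regress a relative pose (3-D translation + unit quaternion) per edge."""
    def __init__(self, dim=256):
        super().__init__()
        self.fc = nn.Linear(dim, 7)

    def forward(self, edges):
        out = self.fc(edges)
        t, q = out[:, :3], out[:, 3:]
        return t, q / q.norm(dim=-1, keepdim=True)  # normalize the quaternion

# Toy usage: one query node (index 0) connected to 4 retrieved training images.
nodes = torch.randn(5, 256)
edge_index = torch.tensor([[0, i] for i in range(1, 5)])
edges = torch.randn(4, 256)
layer, head = PoseGNNLayer(), RelativePoseHead()
nodes, edges = layer(nodes, edges, edge_index)
t, q = head(edges)          # one relative pose per query-neighbor edge
print(t.shape, q.shape)     # torch.Size([4, 3]) torch.Size([4, 4])
```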
Related papers
- No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images [100.80376573969045]
NoPoSplat is a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from sparse, unposed multi-view images.
Our model achieves real-time 3D Gaussian reconstruction during inference.
This work makes significant advances in pose-free generalizable 3D reconstruction and demonstrates its applicability to real-world scenarios.
arXiv Detail & Related papers (2024-10-31T17:58:22Z)
- SRPose: Two-view Relative Pose Estimation with Sparse Keypoints [51.49105161103385]
SRPose is a sparse keypoint-based framework for two-view relative pose estimation in camera-to-world and object-to-camera scenarios.
It achieves competitive or superior performance compared to state-of-the-art methods in terms of accuracy and speed.
It is robust to different image sizes and camera intrinsics, and can be deployed with low computing resources.
arXiv Detail & Related papers (2024-07-11T05:46:35Z)
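For context on the task SRPose addresses, here is a classical sparse-keypoint baseline for two-view relative pose, not SRPose's learned framework: match keypoints, fit the essential matrix with RANSAC, and recover rotation and (scale-free) translation. The intrinsics matrix below is a placeholder; real values must come from calibration.

```python
# A classical two-view relative-pose baseline (not SRPose itself):
# ORB keypoints + essential matrix + pose recovery via OpenCV.
import cv2
import numpy as np

def relative_pose(img1, img2, K):
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    # Decompose E into rotation and unit-norm translation (scale is unobservable).
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t

# Placeholder intrinsics; e.g.:
# img1 = cv2.imread("a.png", cv2.IMREAD_GRAYSCALE)
# img2 = cv2.imread("b.png", cv2.IMREAD_GRAYSCALE)
K = np.array([[525.0, 0, 320.0], [0, 525.0, 240.0], [0, 0, 1]])
# R, t = relative_pose(img1, img2, K)
```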
- PoseMatcher: One-shot 6D Object Pose Estimation by Deep Feature Matching [51.142988196855484]
We propose PoseMatcher, an accurate, model-free, one-shot object pose estimator.
We create a new training pipeline for object-to-image matching based on a three-view system.
To enable PoseMatcher to attend to distinct input modalities, an image and a point cloud, we introduce IO-Layer.
arXiv Detail & Related papers (2023-04-03T21:14:59Z)
- Learning to Localize in Unseen Scenes with Relative Pose Regressors [5.672132510411465]
Relative pose regressors (RPRs) localize a camera by estimating its relative translation and rotation to a pose-labelled reference.
In practice, however, the performance of RPRs is significantly degraded in unseen scenes.
We implement aggregation with concatenation, projection, and attention operations (Transformers) and learn to regress the relative pose parameters from the resulting latent codes.
Compared to state-of-the-art RPRs, our model is shown to localize significantly better in unseen environments, across both indoor and outdoor benchmarks, while maintaining competitive performance in seen scenes.
arXiv Detail & Related papers (2023-03-05T17:12:50Z)
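A minimal sketch of the aggregation-and-regression idea in the relative pose regressor entry above, assuming PyTorch: query and reference feature tokens are concatenated and projected, refined with attention, and pooled into a latent code from which a head regresses the relative pose. Widths, a single attention layer, and mean pooling are illustrative assumptions, not the paper's exact model.

```python
# A minimal sketch (assumed PyTorch) of attention-based aggregation of
# query/reference features into a latent code mapped to a relative pose.
import torch
import torch.nn as nn

class AttentionRPR(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.proj = nn.Linear(2 * dim, dim)            # concatenation + projection
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, 7)                  # 3-D translation + quaternion

    def forward(self, q_feats, r_feats):
        # q_feats, r_feats: (B, S, dim) spatial feature tokens from the
        # query image and the pose-labelled reference image.
        paired = self.proj(torch.cat([q_feats, r_feats], dim=-1))
        latent, _ = self.attn(paired, paired, paired)  # self-attention over tokens
        out = self.head(latent.mean(dim=1))            # pool tokens, regress pose
        t, q = out[:, : 3], out[:, 3:]
        return t, q / q.norm(dim=-1, keepdim=True)

model = AttentionRPR()
t, q = model(torch.randn(2, 49, 256), torch.randn(2, 49, 256))
print(t.shape, q.shape)  # torch.Size([2, 3]) torch.Size([2, 4])
```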
- A Probabilistic Framework for Visual Localization in Ambiguous Scenes [64.13544430239267]
We propose a probabilistic framework that for a given image predicts the arbitrarily shaped posterior distribution of its camera pose.
We do this via a novel formulation of camera pose regression using variational inference, which allows sampling from the predicted distribution.
Our method outperforms existing methods on localization in ambiguous scenes.
arXiv Detail & Related papers (2023-01-05T14:46:54Z)
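A much simpler stand-in for the probabilistic pose entry above: predict a diagonal Gaussian over a 6-D pose vector and sample it with the reparameterization trick. The paper's variational formulation supports arbitrarily shaped posteriors; this sketch (assumed PyTorch, placeholder dimensions) only illustrates sampling poses instead of predicting a single point estimate.

```python
# A simplified stand-in (assumed PyTorch) for distributional pose
# regression: a diagonal Gaussian over pose, sampled differentiably.
import torch
import torch.nn as nn

class GaussianPoseHead(nn.Module):
    def __init__(self, feat_dim=512, pose_dim=6):
        super().__init__()
        self.mu = nn.Linear(feat_dim, pose_dim)
        self.log_var = nn.Linear(feat_dim, pose_dim)

    def forward(self, features, n_samples=10):
        mu, log_var = self.mu(features), self.log_var(features)
        std = torch.exp(0.5 * log_var)
        # Reparameterization trick: differentiable sampling from N(mu, std^2).
        eps = torch.randn(n_samples, *mu.shape)
        return mu + eps * std              # (n_samples, B, pose_dim)

head = GaussianPoseHead()
samples = head(torch.randn(4, 512))
print(samples.shape)  # torch.Size([10, 4, 6]) -- candidate poses per image
```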
- Camera Pose Auto-Encoders for Improving Pose Regression [6.700873164609009]
We introduce Camera Pose Auto-Encoders (PAEs) to encode camera poses using APRs as their teachers.
We show that the resulting latent pose representations can closely reproduce APR performance and demonstrate their effectiveness for related tasks.
We also show that training images can be reconstructed from the learned pose encoding, paving the way for integrating visual information from the training set at a low memory cost.
arXiv Detail & Related papers (2022-07-12T13:47:36Z)
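A bare-bones sketch of the pose auto-encoder entry above, assuming PyTorch: an MLP encodes a 7-D camera pose (translation plus quaternion) into a latent code and decodes it back. The teacher-student distillation against an APR's representation, which is central to the paper, is omitted here and only noted in comments.

```python
# A minimal sketch (assumed PyTorch) of a pose auto-encoder. In the
# paper the encoder's latents are additionally trained to mimic a
# teacher APR's representation; that loss is omitted here.
import torch
import torch.nn as nn

class PoseAutoEncoder(nn.Module):
    def __init__(self, latent_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(7, 128), nn.ReLU(), nn.Linear(128, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, 7))

    def forward(self, pose):
        z = self.encoder(pose)                       # latent pose representation
        return self.decoder(z), z

model = PoseAutoEncoder()
pose = torch.randn(8, 7)                             # (tx, ty, tz, qw, qx, qy, qz)
recon, z = model(pose)
loss = nn.functional.mse_loss(recon, pose)           # reconstruction objective
```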
- ImPosIng: Implicit Pose Encoding for Efficient Camera Pose Estimation [2.6808541153140077]
Implicit Pose Encoding (ImPosing) embeds images and camera poses into a common latent representation with two separate neural networks.
By evaluating candidates through the latent space in a hierarchical manner, the camera position and orientation are refined rather than directly regressed.
arXiv Detail & Related papers (2022-05-05T13:33:25Z)
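A simplified sketch of the shared latent space idea in the ImPosing entry above, assuming PyTorch: one network embeds an image descriptor, another embeds candidate poses, and candidates are ranked by latent similarity. The paper's hierarchical coarse-to-fine candidate search is reduced here to a single scoring pass, and all dimensions are placeholders.

```python
# A simplified sketch (assumed PyTorch): score pose candidates by their
# similarity to the image in a shared latent space; ImPosing's
# hierarchical refinement is collapsed to one pass here.
import torch
import torch.nn as nn

image_encoder = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 128))
pose_encoder = nn.Sequential(nn.Linear(7, 256), nn.ReLU(), nn.Linear(256, 128))

img_feat = torch.randn(1, 512)           # stand-in for a CNN image descriptor
candidates = torch.randn(100, 7)         # candidate camera poses
z_img = image_encoder(img_feat)          # (1, 128)
z_pose = pose_encoder(candidates)        # (100, 128)
scores = torch.cosine_similarity(z_img, z_pose)   # similarity in latent space
best = candidates[scores.argmax()]       # keep best candidate; refine around it
```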
- Fusing Convolutional Neural Network and Geometric Constraint for Image-based Indoor Localization [4.071875179293035]
This paper proposes a new image-based localization framework that explicitly localizes the camera/robot by fusing a convolutional neural network with geometric constraints.
The camera is localized using a single or few observed images and training images with 6-degree-of-freedom pose labels.
Experiments on simulation and real data sets demonstrate the efficiency of our proposed method.
arXiv Detail & Related papers (2022-01-05T02:04:41Z)
- DeepI2P: Image-to-Point Cloud Registration via Deep Classification [71.3121124994105]
DeepI2P is a novel approach for cross-modality registration between an image and a point cloud.
Our method estimates the relative rigid transformation between the coordinate frames of the camera and Lidar.
We circumvent the difficulty by converting the registration problem into a classification and inverse camera projection optimization problem.
arXiv Detail & Related papers (2021-04-08T04:27:32Z)
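The classification reformulation in the DeepI2P entry above amounts to deciding, for each 3-D point, whether it projects inside the camera frustum under a candidate pose. The geometric version of that label is easy to write down; a NumPy sketch follows, with placeholder intrinsics. DeepI2P learns this labelling from cross-modal features and then optimizes the pose against it; this sketch only shows the label itself.

```python
# A geometric sketch (NumPy) of the frustum-classification idea: label
# each point as inside/outside the camera frustum under a given pose.
import numpy as np

def in_frustum(points, R, t, K, width, height):
    # Transform points into the camera frame and project with intrinsics K.
    cam = points @ R.T + t
    z = cam[:, 2]
    uv = (cam @ K.T)[:, :2] / np.clip(z[:, None], 1e-6, None)
    return (z > 0) & (uv[:, 0] >= 0) & (uv[:, 0] < width) \
                   & (uv[:, 1] >= 0) & (uv[:, 1] < height)

K = np.array([[525.0, 0, 320.0], [0, 525.0, 240.0], [0, 0, 1]])  # placeholder
points = np.random.randn(1000, 3) * 5
labels = in_frustum(points, np.eye(3), np.zeros(3), K, 640, 480)
```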
- Back to the Feature: Learning Robust Camera Localization from Pixels to Pose [114.89389528198738]
We introduce PixLoc, a scene-agnostic neural network that estimates an accurate 6-DoF pose from an image and a 3D model.
The system can localize in large environments given coarse pose priors but also improve the accuracy of sparse feature matching.
arXiv Detail & Related papers (2021-03-16T17:40:12Z)