Visual Camera Re-Localization Using Graph Neural Networks and Relative
Pose Supervision
- URL: http://arxiv.org/abs/2104.02538v1
- Date: Tue, 6 Apr 2021 14:29:03 GMT
- Title: Visual Camera Re-Localization Using Graph Neural Networks and Relative
Pose Supervision
- Authors: Mehmet Ozgur Turkoglu, Eric Brachmann, Konrad Schindler, Gabriel
Brostow, Aron Monszpart
- Abstract summary: Visual re-localization means using a single image as input to estimate the camera's location and orientation relative to a pre-recorded environment.
Our proposed method makes few special assumptions, and is fairly lightweight in training and testing.
We validate the effectiveness of our approach on both standard indoor (7-Scenes) and outdoor (Cambridge Landmarks) camera re-localization benchmarks.
- Score: 31.947525258453584
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual re-localization means using a single image as input to estimate the
camera's location and orientation relative to a pre-recorded environment. The
highest-scoring methods are "structure based," and need the query camera's
intrinsics as an input to the model, with careful geometric optimization. When
intrinsics are absent, methods vie for accuracy by making various other
assumptions. This yields fairly good localization scores, but the models are
"narrow" in some way, eg., requiring costly test-time computations, or depth
sensors, or multiple query frames. In contrast, our proposed method makes few
special assumptions, and is fairly lightweight in training and testing.
Our pose regression network learns only from relative poses of training
scenes. For inference, it builds a graph connecting the query image to training
counterparts and uses a graph neural network (GNN) with image representations
on nodes and image-pair representations on edges. By efficiently passing
messages between them, both representation types are refined to produce a
consistent camera pose estimate. We validate the effectiveness of our approach
on both standard indoor (7-Scenes) and outdoor (Cambridge Landmarks) camera
re-localization benchmarks. Our relative pose regression method matches the
accuracy of absolute pose regression networks, while retaining the
relative-pose models' test-time speed and ability to generalize to non-training
scenes.
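As a rough illustration of the node/edge message-passing scheme described in the abstract, here is a minimal sketch assuming PyTorch. The layer widths, the single round of message passing, the toy graph of one query connected to four retrieved training images, and the translation-plus-quaternion pose parameterization are placeholder choices for illustration, not the paper's exact architecture.

```python
# A minimal sketch (assumed PyTorch) of message passing between image
# nodes and image-pair edges; dimensions and depth are illustrative,
# not the paper's exact design.
import torch
import torch.nn as nn

class PoseGNNLayer(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        # Edge update: combine the two endpoint node features with the edge feature.
        self.edge_mlp = nn.Sequential(nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        # Node update: combine a node feature with its aggregated incident edges.
        self.node_mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, nodes, edges, edge_index):
        # nodes: (N, dim) image representations
        # edges: (E, dim) image-pair representations
        # edge_index: (E, 2) endpoints (src, dst) of each edge
        src, dst = edge_index[:, 0], edge_index[:, 1]
        edges = edges + self.edge_mlp(torch.cat([nodes[src], nodes[dst], edges], dim=-1))
        # Aggregate messages from incident edges back onto destination nodes.
        agg = torch.zeros_like(nodes).index_add_(0, dst, edges)
        nodes = nodes + self.node_mlp(torch.cat([nodes, agg], dim=-1))
        return nodes, edges

class RelativePoseHead(nn.Module):
    """Regress a relative pose (3-D translation + unit quaternion) per edge."""
    def __init__(self, dim=256):
        super().__init__()
        self.fc = nn.Linear(dim, 7)

    def forward(self, edges):
        out = self.fc(edges)
        t, q = out[:, :3], out[:, 3:]
        return t, q / q.norm(dim=-1, keepdim=True)  # normalize the quaternion

# Toy usage: one query node (index 0) connected to 4 retrieved training images.
nodes = torch.randn(5, 256)
edge_index = torch.tensor([[0, i] for i in range(1, 5)])
edges = torch.randn(4, 256)
layer, head = PoseGNNLayer(), RelativePoseHead()
nodes, edges = layer(nodes, edges, edge_index)
t, q = head(edges)          # one relative pose per query-neighbor edge
print(t.shape, q.shape)     # torch.Size([4, 3]) torch.Size([4, 4])
```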
Related papers
- No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images [100.80376573969045]
NoPoSplat is a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from sparse, unposed multi-view images.
Our model achieves real-time 3D Gaussian reconstruction during inference.
This work makes significant advances in pose-free generalizable 3D reconstruction and demonstrates its applicability to real-world scenarios.
arXiv Detail & Related papers (2024-10-31T17:58:22Z)
- SRPose: Two-view Relative Pose Estimation with Sparse Keypoints [51.49105161103385]
SRPose is a sparse keypoint-based framework for two-view relative pose estimation in camera-to-world and object-to-camera scenarios.
It achieves competitive or superior performance compared to state-of-the-art methods in terms of accuracy and speed.
It is robust to different image sizes and camera intrinsics, and can be deployed with low computing resources.
arXiv Detail & Related papers (2024-07-11T05:46:35Z)
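For context on the task SRPose addresses, here is a classical sparse-keypoint baseline for two-view relative pose, not SRPose's learned framework: match keypoints, fit the essential matrix with RANSAC, and recover rotation and (scale-free) translation. The intrinsics matrix below is a placeholder; real values must come from calibration.

```python
# A classical two-view relative-pose baseline (not SRPose itself):
# ORB keypoints + essential matrix + pose recovery via OpenCV.
import cv2
import numpy as np

def relative_pose(img1, img2, K):
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    # Decompose E into rotation and unit-norm translation (scale is unobservable).
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t

# Placeholder intrinsics; e.g.:
# img1 = cv2.imread("a.png", cv2.IMREAD_GRAYSCALE)
# img2 = cv2.imread("b.png", cv2.IMREAD_GRAYSCALE)
K = np.array([[525.0, 0, 320.0], [0, 525.0, 240.0], [0, 0, 1]])
# R, t = relative_pose(img1, img2, K)
```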
- PoseMatcher: One-shot 6D Object Pose Estimation by Deep Feature Matching [51.142988196855484]
We propose PoseMatcher, an accurate, model-free, one-shot object pose estimator.
We create a new training pipeline for object-to-image matching based on a three-view system.
To enable PoseMatcher to attend to distinct input modalities, an image and a point cloud, we introduce IO-Layer.
arXiv Detail & Related papers (2023-04-03T21:14:59Z)
- Learning to Localize in Unseen Scenes with Relative Pose Regressors [5.672132510411465]
Relative pose regressors (RPRs) localize a camera by estimating its relative translation and rotation to a pose-labelled reference.
In practice, however, the performance of RPRs is significantly degraded in unseen scenes.
We implement aggregation with concatenation, projection, and attention operations (Transformers) and learn to regress the relative pose parameters from the resulting latent codes.
Compared to state-of-the-art RPRs, our model is shown to localize significantly better in unseen environments, across both indoor and outdoor benchmarks, while maintaining competitive performance in seen scenes.
arXiv Detail & Related papers (2023-03-05T17:12:50Z)
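A minimal sketch of the aggregation-and-regression idea in the relative pose regressor entry above, assuming PyTorch: query and reference feature tokens are concatenated and projected, refined with attention, and pooled into a latent code from which a head regresses the relative pose. Widths, a single attention layer, and mean pooling are illustrative assumptions, not the paper's exact model.

```python
# A minimal sketch (assumed PyTorch) of attention-based aggregation of
# query/reference features into a latent code mapped to a relative pose.
import torch
import torch.nn as nn

class AttentionRPR(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.proj = nn.Linear(2 * dim, dim)            # concatenation + projection
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, 7)                  # 3-D translation + quaternion

    def forward(self, q_feats, r_feats):
        # q_feats, r_feats: (B, S, dim) spatial feature tokens from the
        # query image and the pose-labelled reference image.
        paired = self.proj(torch.cat([q_feats, r_feats], dim=-1))
        latent, _ = self.attn(paired, paired, paired)  # self-attention over tokens
        out = self.head(latent.mean(dim=1))            # pool tokens, regress pose
        t, q = out[:, : 3], out[:, 3:]
        return t, q / q.norm(dim=-1, keepdim=True)

model = AttentionRPR()
t, q = model(torch.randn(2, 49, 256), torch.randn(2, 49, 256))
print(t.shape, q.shape)  # torch.Size([2, 3]) torch.Size([2, 4])
```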
- A Probabilistic Framework for Visual Localization in Ambiguous Scenes [64.13544430239267]
We propose a probabilistic framework that for a given image predicts the arbitrarily shaped posterior distribution of its camera pose.
We do this via a novel formulation of camera pose regression using variational inference, which allows sampling from the predicted distribution.
Our method outperforms existing methods on localization in ambiguous scenes.
arXiv Detail & Related papers (2023-01-05T14:46:54Z)
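A much simpler stand-in for the probabilistic pose entry above: predict a diagonal Gaussian over a 6-D pose vector and sample it with the reparameterization trick. The paper's variational formulation supports arbitrarily shaped posteriors; this sketch (assumed PyTorch, placeholder dimensions) only illustrates sampling poses instead of predicting a single point estimate.

```python
# A simplified stand-in (assumed PyTorch) for distributional pose
# regression: a diagonal Gaussian over pose, sampled differentiably.
import torch
import torch.nn as nn

class GaussianPoseHead(nn.Module):
    def __init__(self, feat_dim=512, pose_dim=6):
        super().__init__()
        self.mu = nn.Linear(feat_dim, pose_dim)
        self.log_var = nn.Linear(feat_dim, pose_dim)

    def forward(self, features, n_samples=10):
        mu, log_var = self.mu(features), self.log_var(features)
        std = torch.exp(0.5 * log_var)
        # Reparameterization trick: differentiable sampling from N(mu, std^2).
        eps = torch.randn(n_samples, *mu.shape)
        return mu + eps * std              # (n_samples, B, pose_dim)

head = GaussianPoseHead()
samples = head(torch.randn(4, 512))
print(samples.shape)  # torch.Size([10, 4, 6]) -- candidate poses per image
```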
- Camera Pose Auto-Encoders for Improving Pose Regression [6.700873164609009]
We introduce Camera Pose Auto-Encoders (PAEs) to encode camera poses using APRs as their teachers.
We show that the resulting latent pose representations can closely reproduce APR performance and demonstrate their effectiveness for related tasks.
We also show that training images can be reconstructed from the learned pose encoding, paving the way for integrating visual information from the training set at a low memory cost.
arXiv Detail & Related papers (2022-07-12T13:47:36Z)
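A bare-bones sketch of the pose auto-encoder entry above, assuming PyTorch: an MLP encodes a 7-D camera pose (translation plus quaternion) into a latent code and decodes it back. The teacher-student distillation against an APR's representation, which is central to the paper, is omitted here and only noted in comments.

```python
# A minimal sketch (assumed PyTorch) of a pose auto-encoder. In the
# paper the encoder's latents are additionally trained to mimic a
# teacher APR's representation; that loss is omitted here.
import torch
import torch.nn as nn

class PoseAutoEncoder(nn.Module):
    def __init__(self, latent_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(7, 128), nn.ReLU(), nn.Linear(128, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, 7))

    def forward(self, pose):
        z = self.encoder(pose)                       # latent pose representation
        return self.decoder(z), z

model = PoseAutoEncoder()
pose = torch.randn(8, 7)                             # (tx, ty, tz, qw, qx, qy, qz)
recon, z = model(pose)
loss = nn.functional.mse_loss(recon, pose)           # reconstruction objective
```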
- ImPosIng: Implicit Pose Encoding for Efficient Camera Pose Estimation [2.6808541153140077]
Implicit Pose Encoding (ImPosing) embeds images and camera poses into a common latent representation with two separate neural networks.
By evaluating candidates through the latent space in a hierarchical manner, the camera position and orientation are refined rather than directly regressed.
arXiv Detail & Related papers (2022-05-05T13:33:25Z)
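A simplified sketch of the shared latent space idea in the ImPosing entry above, assuming PyTorch: one network embeds an image descriptor, another embeds candidate poses, and candidates are ranked by latent similarity. The paper's hierarchical coarse-to-fine candidate search is reduced here to a single scoring pass, and all dimensions are placeholders.

```python
# A simplified sketch (assumed PyTorch): score pose candidates by their
# similarity to the image in a shared latent space; ImPosing's
# hierarchical refinement is collapsed to one pass here.
import torch
import torch.nn as nn

image_encoder = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 128))
pose_encoder = nn.Sequential(nn.Linear(7, 256), nn.ReLU(), nn.Linear(256, 128))

img_feat = torch.randn(1, 512)           # stand-in for a CNN image descriptor
candidates = torch.randn(100, 7)         # candidate camera poses
z_img = image_encoder(img_feat)          # (1, 128)
z_pose = pose_encoder(candidates)        # (100, 128)
scores = torch.cosine_similarity(z_img, z_pose)   # similarity in latent space
best = candidates[scores.argmax()]       # keep best candidate; refine around it
```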
- Fusing Convolutional Neural Network and Geometric Constraint for Image-based Indoor Localization [4.071875179293035]
This paper proposes a new image-based localization framework that explicitly localizes the camera/robot by fusing a convolutional neural network with geometric constraints.
The camera is localized using a single or few observed images and training images with 6-degree-of-freedom pose labels.
Experiments on simulation and real data sets demonstrate the efficiency of our proposed method.
arXiv Detail & Related papers (2022-01-05T02:04:41Z)
- DeepI2P: Image-to-Point Cloud Registration via Deep Classification [71.3121124994105]
DeepI2P is a novel approach for cross-modality registration between an image and a point cloud.
Our method estimates the relative rigid transformation between the coordinate frames of the camera and Lidar.
We circumvent the difficulty by converting the registration problem into a classification and inverse camera projection optimization problem.
arXiv Detail & Related papers (2021-04-08T04:27:32Z)
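The classification reformulation in the DeepI2P entry above amounts to deciding, for each 3-D point, whether it projects inside the camera frustum under a candidate pose. The geometric version of that label is easy to write down; a NumPy sketch follows, with placeholder intrinsics. DeepI2P learns this labelling from cross-modal features and then optimizes the pose against it; this sketch only shows the label itself.

```python
# A geometric sketch (NumPy) of the frustum-classification idea: label
# each point as inside/outside the camera frustum under a given pose.
import numpy as np

def in_frustum(points, R, t, K, width, height):
    # Transform points into the camera frame and project with intrinsics K.
    cam = points @ R.T + t
    z = cam[:, 2]
    uv = (cam @ K.T)[:, :2] / np.clip(z[:, None], 1e-6, None)
    return (z > 0) & (uv[:, 0] >= 0) & (uv[:, 0] < width) \
                   & (uv[:, 1] >= 0) & (uv[:, 1] < height)

K = np.array([[525.0, 0, 320.0], [0, 525.0, 240.0], [0, 0, 1]])  # placeholder
points = np.random.randn(1000, 3) * 5
labels = in_frustum(points, np.eye(3), np.zeros(3), K, 640, 480)
```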
- Back to the Feature: Learning Robust Camera Localization from Pixels to Pose [114.89389528198738]
We introduce PixLoc, a scene-agnostic neural network that estimates an accurate 6-DoF pose from an image and a 3D model.
The system can localize in large environments given coarse pose priors but also improve the accuracy of sparse feature matching.
arXiv Detail & Related papers (2021-03-16T17:40:12Z)