Accelerated Coordinate Encoding: Learning to Relocalize in Minutes using
RGB and Poses
- URL: http://arxiv.org/abs/2305.14059v1
- Date: Tue, 23 May 2023 13:38:01 GMT
- Title: Accelerated Coordinate Encoding: Learning to Relocalize in Minutes using
RGB and Poses
- Authors: Eric Brachmann, Tommaso Cavallari, Victor Adrian Prisacariu
- Abstract summary: We show how a learning-based relocalization system can achieve the same accuracy with less than 5 minutes of training.
Our approach is up to 300x faster in mapping than state-of-the-art scene coordinate regression.
- Score: 19.362802419289526
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning-based visual relocalizers exhibit leading pose accuracy, but require
hours or days of training. Since training needs to happen on each new scene
again, long training times make learning-based relocalization impractical for
most applications, despite its promise of high accuracy. In this paper we show
how such a system can actually achieve the same accuracy in less than 5
minutes. We start from the obvious: a relocalization network can be split into a
scene-agnostic feature backbone, and a scene-specific prediction head. Less
obvious: using an MLP prediction head allows us to optimize across thousands of
viewpoints simultaneously in each single training iteration. This leads to
stable and extremely fast convergence. Furthermore, we substitute effective but
slow end-to-end training using a robust pose solver with a curriculum over a
reprojection loss. Our approach does not require privileged knowledge, such as
depth maps or a 3D model, for speedy training. Overall, our approach is up to
300x faster in mapping than state-of-the-art scene coordinate regression, while
keeping accuracy on par.
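
The reprojection objective described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the pinhole intrinsics tuple `K = (fx, fy, cx, cy)`, the world-to-camera pose convention `(R, t)`, the hard clamp at `tau`, and the linear `threshold_schedule` are all hypothetical choices made for the example. Because every sample carries its own pose, one batch can mix predictions from many different views, which is the property the abstract highlights.

```python
import math


def project(point, R, t, K):
    """Project a world-space 3D point into a pinhole camera.

    R (3x3 nested list) and t (length 3) map world to camera coordinates;
    K = (fx, fy, cx, cy). All conventions here are assumptions of this sketch.
    """
    cam = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    depth = max(cam[2], 1e-6)  # guard against points at or behind the camera
    fx, fy, cx, cy = K
    return (fx * cam[0] / depth + cx, fy * cam[1] / depth + cy)


def clamped_reprojection_loss(samples, K, tau):
    """Mean pixel reprojection error, clamped at tau.

    Each sample is (predicted_point, R, t, observed_pixel), so samples from
    any number of views can share a single batch.
    """
    total = 0.0
    for point, R, t, pixel in samples:
        u, v = project(point, R, t, K)
        total += min(math.hypot(u - pixel[0], v - pixel[1]), tau)
    return total / len(samples)


def threshold_schedule(progress, tau_start=50.0, tau_end=1.0):
    """Hypothetical curriculum: shrink the clamp linearly as training progresses.

    progress is the fraction of training completed, clipped to [0, 1].
    """
    progress = min(max(progress, 0.0), 1.0)
    return tau_start + (tau_end - tau_start) * progress


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    K = (100.0, 100.0, 50.0, 50.0)
    batch = [
        ((0.0, 0.0, 2.0), identity, [0.0, 0.0, 0.0], (53.0, 54.0)),
        ((1.0, 0.0, 2.0), identity, [0.0, 0.0, 0.0], (130.0, 90.0)),
    ]
    for progress in (0.0, 0.5, 1.0):
        tau = threshold_schedule(progress)
        print(progress, tau, clamped_reprojection_loss(batch, K, tau))
```

A loose clamp early on lets poorly initialized predictions contribute a bounded error, while a tight clamp late in training concentrates the objective on predictions that are already nearly consistent with their observed pixels.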
Related papers
- Map-Relative Pose Regression for Visual Re-Localization [20.89982939633994]
We present a new approach to pose regression, map-relative pose regression (marepo).
We condition the pose regressor on a scene-specific map representation such that its pose predictions are relative to the scene map.
Our approach outperforms previous pose regression methods by far on two public datasets, indoor and outdoor.
arXiv Detail & Related papers (2024-04-15T15:53:23Z)
- SACReg: Scene-Agnostic Coordinate Regression for Visual Localization [16.866303169903237]
We propose a generalized SCR model that is trained once and then deployed in new test scenes, regardless of their scale, without any finetuning.
Instead of encoding the scene coordinates into the network weights, our model takes as input a database image with some sparse 2D pixel to 3D coordinate annotations.
We show that the database representation of images and their 2D-3D annotations can be highly compressed with negligible loss of localization performance.
arXiv Detail & Related papers (2023-07-21T16:56:36Z)
- Effective End-to-End Vision Language Pretraining with Semantic Visual Loss [58.642954383282216]
Current vision language pretraining models are dominated by methods using region visual features extracted from object detectors.
We introduce three types of visual losses that enable much faster convergence and better finetuning accuracy.
Compared with region feature models, our end-to-end models could achieve similar or better performance on downstream tasks and run more than 10 times faster during inference.
arXiv Detail & Related papers (2023-01-18T00:22:49Z)
- Visual Localization via Few-Shot Scene Region Classification [84.34083435501094]
Visual (re)localization addresses the problem of estimating the 6-DoF camera pose of a query image captured in a known scene.
Recent advances in structure-based localization solve this problem by memorizing the mapping from image pixels to scene coordinates.
We propose a scene region classification approach to achieve fast and effective scene memorization with few-shot images.
arXiv Detail & Related papers (2022-08-14T22:39:02Z)
- Optimization Planning for 3D ConvNets [123.43419144051703]
It is not trivial to optimally learn 3D Convolutional Neural Networks (3D ConvNets) due to their high complexity and the many options in the training scheme.
We decompose the path into a series of training "states" and specify the hyper-parameters, e.g., learning rate and the length of input clips, in each state.
We perform dynamic programming over all the candidate states to plan the optimal permutation of states, i.e., optimization path.
arXiv Detail & Related papers (2022-01-11T16:13:31Z)
- Soft Expectation and Deep Maximization for Image Feature Detection [68.8204255655161]
We propose SEDM, an iterative semi-supervised learning process that flips the question and first looks for repeatable 3D points, then trains a detector to localize them in image space.
Our results show that this new model trained using SEDM is able to better localize the underlying 3D points in a scene.
arXiv Detail & Related papers (2021-04-21T00:35:32Z)
- Visual Camera Re-Localization Using Graph Neural Networks and Relative Pose Supervision [31.947525258453584]
Visual re-localization means using a single image as input to estimate the camera's location and orientation relative to a pre-recorded environment.
Our proposed method makes few special assumptions, and is fairly lightweight in training and testing.
We validate the effectiveness of our approach on both standard indoor (7-Scenes) and outdoor (Cambridge Landmarks) camera re-localization benchmarks.
arXiv Detail & Related papers (2021-04-06T14:29:03Z)
- Back to the Feature: Learning Robust Camera Localization from Pixels to Pose [114.89389528198738]
We introduce PixLoc, a scene-agnostic neural network that estimates an accurate 6-DoF pose from an image and a 3D model.
The system can localize in large environments given coarse pose priors but also improve the accuracy of sparse feature matching.
arXiv Detail & Related papers (2021-03-16T17:40:12Z)
- Predicting Training Time Without Training [120.92623395389255]
We tackle the problem of predicting the number of optimization steps that a pre-trained deep network needs to converge to a given value of the loss function.
We leverage the fact that the training dynamics of a deep network during fine-tuning are well approximated by those of a linearized model.
We are able to predict the time it takes to fine-tune a model to a given loss without having to perform any training.
arXiv Detail & Related papers (2020-08-28T04:29:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.