SACReg: Scene-Agnostic Coordinate Regression for Visual Localization
- URL: http://arxiv.org/abs/2307.11702v3
- Date: Thu, 30 Nov 2023 11:22:53 GMT
- Title: SACReg: Scene-Agnostic Coordinate Regression for Visual Localization
- Authors: Jerome Revaud, Yohann Cabon, Romain Brégier, JongMin Lee and Philippe Weinzaepfel
- Abstract summary: We propose a generalized SCR model trained once to be deployed in new test scenes, regardless of their scale, without any finetuning.
Instead of encoding the scene coordinates into the network weights, our model takes as input a database image with some sparse 2D pixel to 3D coordinate annotations.
We show that the database representation of images and their 2D-3D annotations can be highly compressed with negligible loss of localization performance.
- Score: 16.866303169903237
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scene coordinates regression (SCR), i.e., predicting 3D coordinates for every
pixel of a given image, has recently shown promising potential. However,
existing methods remain limited to small scenes memorized during training, and
thus hardly scale to realistic datasets and scenarios. In this paper, we
propose a generalized SCR model trained once to be deployed in new test scenes,
regardless of their scale, without any finetuning. Instead of encoding the
scene coordinates into the network weights, our model takes as input a database
image with some sparse 2D pixel to 3D coordinate annotations, extracted from
e.g. off-the-shelf Structure-from-Motion or RGB-D data, and a query image, for
which it predicts a dense 3D coordinate map and its confidence using
cross-attention. At test time, we rely on existing off-the-shelf image
retrieval systems and fuse the predictions from a shortlist of relevant
database images w.r.t. the query. Afterwards, the camera pose is obtained using
standard Perspective-n-Point (PnP). Starting from self-supervised CroCo
pretrained weights, we train our model on diverse datasets to ensure
generalizability across various scenarios, and significantly outperform other
scene regression approaches, including scene-specific models, on multiple
visual localization benchmarks. Finally, we show that the database
representation of images and their 2D-3D annotations can be highly compressed
with negligible loss of localization performance.
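
The test-time steps described in the abstract (fuse per-database-image predictions, then solve for the pose from 2D-3D correspondences) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`fuse_predictions`, `select_correspondences`, `pnp_dlt`) are hypothetical, the "keep the most confident prediction per pixel" fusion rule is an assumption the abstract does not confirm, and a plain Direct Linear Transform stands in for the "standard PnP" solver, which in practice would typically be a robust, RANSAC-based one.

```python
import numpy as np


def fuse_predictions(coords, conf):
    """Fuse K dense predictions into one map (assumed rule: max confidence).

    coords: (K, H, W, 3) predicted 3D scene coordinates, one map per database image.
    conf:   (K, H, W) per-pixel confidence for each map.
    Returns the fused (H, W, 3) coordinates and (H, W) confidences.
    """
    best = np.argmax(conf, axis=0)  # (H, W) index of the most confident map
    fused_xyz = np.take_along_axis(coords, best[None, :, :, None], axis=0)[0]
    fused_conf = np.take_along_axis(conf, best[None], axis=0)[0]
    return fused_xyz, fused_conf


def select_correspondences(fused_xyz, fused_conf, n):
    """Keep the n most confident pixels as 2D-3D correspondences."""
    order = np.argsort(fused_conf.ravel())[::-1][:n]
    v, u = np.unravel_index(order, fused_conf.shape)
    pts2d = np.stack([u, v], axis=1).astype(float)  # (n, 2) pixel (x, y)
    pts3d = fused_xyz[v, u]                         # (n, 3) scene coordinates
    return pts2d, pts3d


def pnp_dlt(pts2d, pts3d, K):
    """Estimate a world-to-camera pose (R, t) with a plain DLT.

    Simplified stand-in for a standard PnP solver: no outlier rejection,
    needs at least 6 non-coplanar points, and K is the 3x3 intrinsics matrix.
    """
    ones = np.ones((len(pts2d), 1))
    # Normalized image coordinates (remove the effect of the intrinsics).
    xn = (np.linalg.inv(K) @ np.hstack([pts2d, ones]).T).T
    X = np.hstack([pts3d, ones])  # homogeneous 3D points, (n, 4)
    rows = []
    for (x, y, _), Xw in zip(xn, X):
        rows.append(np.concatenate([Xw, np.zeros(4), -x * Xw]))
        rows.append(np.concatenate([np.zeros(4), Xw, -y * Xw]))
    # The projection matrix is the null vector of the stacked constraints.
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    P = Vt[-1].reshape(3, 4)
    # P equals s * [R | t] for an unknown scale s; recover R, then s, then t.
    U, S, Vt2 = np.linalg.svd(P[:, :3])
    R = U @ Vt2
    scale = S.mean()
    if np.linalg.det(R) < 0:  # s was negative: flip both
        R, scale = -R, -scale
    t = P[:, 3] / scale
    return R, t
```

Given stacked predictions `coords` and `conf` for a shortlist of retrieved database images, one would call `fuse_predictions`, pass the result to `select_correspondences`, and then to `pnp_dlt` together with the query camera intrinsics. With real, noisy predictions, a robust solver such as OpenCV's `cv2.solvePnPRansac` would be a more realistic choice than the bare DLT used here.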
Related papers
- GLACE: Global Local Accelerated Coordinate Encoding [66.87005863868181]
Scene coordinate regression methods are effective in small-scale scenes but face significant challenges in large-scale scenes.
We propose GLACE, which integrates pre-trained global and local encodings and enables SCR to scale to large scenes with only a single small-sized network.
Our method achieves state-of-the-art results on large-scale scenes with a low-map-size model.
arXiv Detail & Related papers (2024-06-06T17:59:50Z)
- Improved Scene Landmark Detection for Camera Localization [11.56648898250606]
A method based on scene landmark detection (SLD) was recently proposed to address these limitations.
It involves training a convolutional neural network (CNN) to detect a few predetermined, salient, scene-specific 3D points or landmarks.
We show that the accuracy gap was due to insufficient model capacity and noisy labels during training.
arXiv Detail & Related papers (2024-01-31T18:59:12Z)
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z)
- HSCNet++: Hierarchical Scene Coordinate Classification and Regression for Visual Localization with Transformer [23.920690073252636]
We present a new hierarchical scene coordinate network to predict pixel scene coordinates in a coarse-to-fine manner from a single RGB image.
The proposed method, which is an extension of HSCNet, allows us to train compact models which scale robustly to large environments.
arXiv Detail & Related papers (2023-05-05T15:00:14Z)
- SGAligner: 3D Scene Alignment with Scene Graphs [84.01002998166145]
Building 3D scene graphs has emerged as a topic in scene representation for several embodied AI applications.
We focus on the fundamental problem of aligning pairs of 3D scene graphs whose overlap can range from zero to partial.
We propose SGAligner, the first method for aligning pairs of 3D scene graphs that is robust to in-the-wild scenarios.
arXiv Detail & Related papers (2023-04-28T14:39:22Z)
- Visual Localization using Imperfect 3D Models from the Internet [54.731309449883284]
This paper studies how imperfections in 3D models affect localization accuracy.
We show that 3D models from the Internet show promise as an easy-to-obtain scene representation.
arXiv Detail & Related papers (2023-04-12T16:15:05Z)
- Fast and Lightweight Scene Regressor for Camera Relocalization [1.6708069984516967]
Estimating the camera pose directly with respect to pre-built 3D models can be prohibitively expensive for several applications.
This study proposes a simple scene regression method that requires only a multi-layer perceptron network for mapping scene coordinates.
The proposed approach uses sparse descriptors to regress the scene coordinates, instead of a dense RGB image.
arXiv Detail & Related papers (2022-12-04T14:41:20Z)
- Soft Expectation and Deep Maximization for Image Feature Detection [68.8204255655161]
We propose SEDM, an iterative semi-supervised learning process that flips the question and first looks for repeatable 3D points, then trains a detector to localize them in image space.
Our results show that this new model trained using SEDM is able to better localize the underlying 3D points in a scene.
arXiv Detail & Related papers (2021-04-21T00:35:32Z)
- Learning Camera Localization via Dense Scene Matching [45.0957383562443]
Camera localization aims to estimate 6 DoF camera poses from RGB images.
Recent learning-based approaches encode structures into a specific convolutional neural network (CNN).
We present a new method for camera localization using dense matching (DSM).
arXiv Detail & Related papers (2021-03-31T03:47:42Z)
- Back to the Feature: Learning Robust Camera Localization from Pixels to Pose [114.89389528198738]
We introduce PixLoc, a scene-agnostic neural network that estimates an accurate 6-DoF pose from an image and a 3D model.
The system can localize in large environments given coarse pose priors but also improve the accuracy of sparse feature matching.
arXiv Detail & Related papers (2021-03-16T17:40:12Z)