Related papers: SACReg: Scene-Agnostic Coordinate Regression for Visual Localization

URL: http://arxiv.org/abs/2307.11702v3
Date: Thu, 30 Nov 2023 11:22:53 GMT
Title: SACReg: Scene-Agnostic Coordinate Regression for Visual Localization
Authors: Jerome Revaud, Yohann Cabon, Romain Brégier, JongMin Lee and Philippe Weinzaepfel
Abstract summary: We propose a generalized SCR model that is trained once and deployed in new test scenes, regardless of their scale, without any finetuning. Instead of encoding the scene coordinates into the network weights, our model takes as input a database image with some sparse 2D pixel to 3D coordinate annotations. We show that the database representation of images and their 2D-3D annotations can be highly compressed with negligible loss of localization performance.
Score: 16.866303169903237
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Scene coordinate regression (SCR), i.e., predicting 3D coordinates for every pixel of a given image, has recently shown promising potential. However, existing methods remain limited to small scenes memorized during training, and thus hardly scale to realistic datasets and scenarios. In this paper, we propose a generalized SCR model trained once to be deployed in new test scenes, regardless of their scale, without any finetuning. Instead of encoding the scene coordinates into the network weights, our model takes as input a database image with some sparse 2D pixel to 3D coordinate annotations, extracted from e.g. off-the-shelf Structure-from-Motion or RGB-D data, and a query image, for which it predicts a dense 3D coordinate map and its confidence, based on cross-attention. At test time, we rely on existing off-the-shelf image retrieval systems and fuse the predictions from a shortlist of relevant database images w.r.t. the query. Afterwards, the camera pose is obtained using standard Perspective-n-Point (PnP). Starting from self-supervised CroCo pretrained weights, we train our model on diverse datasets to ensure generalizability across various scenarios, and significantly outperform other scene regression approaches, including scene-specific models, on multiple visual localization benchmarks. Finally, we show that the database representation of images and their 2D-3D annotations can be highly compressed with negligible loss of localization performance.
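The abstract states that predictions from a shortlist of retrieved database images are fused before PnP, but it does not spell out the fusion rule. The sketch below assumes a hypothetical per-pixel "keep the most confident prediction" rule and a row-major flattened pixel layout; `Prediction`, `fuse_predictions`, `to_correspondences` and `min_conf` are illustrative names, not the authors' code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point3D = Tuple[float, float, float]


@dataclass
class Prediction:
    """Hypothetical dense output for one query, predicted w.r.t. one database image."""
    coords: List[Point3D]    # one 3D scene coordinate per query pixel, row-major
    confidence: List[float]  # one confidence score per query pixel


def fuse_predictions(
    preds: List[Prediction], min_conf: float = 0.0
) -> List[Optional[Tuple[Point3D, float]]]:
    """Per pixel, keep the 3D coordinate with the highest confidence across the shortlist.

    Assumed rule (not taken from the paper): pixels whose best confidence does not
    exceed `min_conf` are marked None and dropped from the PnP input.
    """
    if not preds:
        return []
    n = len(preds[0].coords)
    for p in preds:
        if len(p.coords) != n or len(p.confidence) != n:
            raise ValueError("all predictions must cover the same query pixels")
    fused: List[Optional[Tuple[Point3D, float]]] = []
    for i in range(n):
        best = max(preds, key=lambda p: p.confidence[i])
        conf = best.confidence[i]
        fused.append((best.coords[i], conf) if conf > min_conf else None)
    return fused


def to_correspondences(
    fused: List[Optional[Tuple[Point3D, float]]], width: int
) -> List[Tuple[Tuple[int, int], Point3D]]:
    """Turn fused per-pixel coordinates into ((u, v), 3D point) pairs for PnP."""
    pairs: List[Tuple[Tuple[int, int], Point3D]] = []
    for idx, item in enumerate(fused):
        if item is not None:
            u, v = idx % width, idx // width  # row-major pixel index -> (column, row)
            pairs.append(((u, v), item[0]))
    return pairs
```

The resulting 2D-3D pairs could then be converted to arrays and handed to an off-the-shelf PnP-RANSAC solver such as OpenCV's `cv2.solvePnPRansac`. Picking the single most confident prediction, rather than averaging across database images, is one plausible choice: it avoids blending coordinates from views that disagree, at the cost of ignoring agreement between them.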

Related papers

Scene Coordinate Reconstruction Priors [29.668748429324154]
Scene coordinate regression (SCR) models have proven to be powerful implicit scene representations for 3D vision. We present a probabilistic reinterpretation of training SCR models, which allows us to infuse high-level reconstruction priors. Our priors help learning better scene representations, resulting in more coherent scene point clouds, higher registration rates and better camera poses.
arXiv Detail & Related papers (2025-10-14T11:13:31Z)
Continuous 3D Perception Model with Persistent State [111.83854602049222]
We present a unified framework capable of solving a broad range of 3D tasks. Our approach features a stateful recurrent model that continuously updates its state representation with each new observation. We evaluate our method on various 3D/4D tasks and demonstrate competitive or state-of-the-art performance in each.
arXiv Detail & Related papers (2025-01-21T18:59:23Z)
No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images [100.80376573969045]
NoPoSplat is a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from multi-view images. Our model achieves real-time 3D Gaussian reconstruction during inference. This work makes significant advances in pose-free generalizable 3D reconstruction and demonstrates its applicability to real-world scenarios.
arXiv Detail & Related papers (2024-10-31T17:58:22Z)
GSplatLoc: Grounding Keypoint Descriptors into 3D Gaussian Splatting for Improved Visual Localization [1.4466437171584356]
3D Gaussian Splatting (3DGS) allows for the compact encoding of both 3D geometry and scene appearance with its spatial features. We propose distilling dense keypoint descriptors into 3DGS to improve the model's spatial understanding. Our approach surpasses state-of-the-art Neural Render Pose (NRP) methods, including NeRFMatch and PNeRFLoc.
arXiv Detail & Related papers (2024-09-24T23:18:32Z)
GLACE: Global Local Accelerated Coordinate Encoding [66.87005863868181]
Scene coordinate regression methods are effective in small-scale scenes but face significant challenges in large-scale scenes. We propose GLACE, which integrates pre-trained global and local encodings and enables SCR to scale to large scenes with only a single small-sized network. Our method achieves state-of-the-art results on large-scale scenes with a low-map-size model.
arXiv Detail & Related papers (2024-06-06T17:59:50Z)
HSCNet++: Hierarchical Scene Coordinate Classification and Regression for Visual Localization with Transformer [23.920690073252636]
We present a new hierarchical scene coordinate network to predict pixel scene coordinates in a coarse-to-fine manner from a single RGB image. The proposed method, which is an extension of HSCNet, allows us to train compact models which scale robustly to large environments.
arXiv Detail & Related papers (2023-05-05T15:00:14Z)
SGAligner : 3D Scene Alignment with Scene Graphs [84.01002998166145]
Building 3D scene graphs has emerged as a topic in scene representation for several embodied AI applications. We focus on the fundamental problem of aligning pairs of 3D scene graphs whose overlap can range from zero to partial. We propose SGAligner, the first method for aligning pairs of 3D scene graphs that is robust to in-the-wild scenarios.
arXiv Detail & Related papers (2023-04-28T14:39:22Z)
Visual Localization using Imperfect 3D Models from the Internet [54.731309449883284]
This paper studies how imperfections in 3D models affect localization accuracy. We show that 3D models from the Internet show promise as an easy-to-obtain scene representation.
arXiv Detail & Related papers (2023-04-12T16:15:05Z)
Soft Expectation and Deep Maximization for Image Feature Detection [68.8204255655161]
We propose SEDM, an iterative semi-supervised learning process that flips the question and first looks for repeatable 3D points, then trains a detector to localize them in image space. Our results show that this new model trained using SEDM is able to better localize the underlying 3D points in a scene.
arXiv Detail & Related papers (2021-04-21T00:35:32Z)
Learning Camera Localization via Dense Scene Matching [45.0957383562443]
Camera localization aims to estimate 6-DoF camera poses from RGB images. Recent learning-based approaches encode scene structures into a specific convolutional neural network (CNN). We present a new method for camera localization using dense scene matching (DSM).
arXiv Detail & Related papers (2021-03-31T03:47:42Z)
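A 6-DoF camera pose (R, t), together with intrinsics K, maps each 3D scene point to a pixel; PnP solvers, and the RANSAC loops usually wrapped around them, score a pose by the reprojection error over 2D-3D correspondences. The minimal pure-Python sketch below shows only that scoring step under a standard pinhole model with no lens distortion; it does not estimate the pose itself, and `project`, `reprojection_error`, `count_inliers` and the 2-pixel default threshold are illustrative assumptions.

```python
import math
from typing import Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]
Vector3 = Sequence[float]
Pixel = Sequence[float]


def project(K: Matrix3, R: Matrix3, t: Vector3, X: Vector3) -> Tuple[float, float]:
    """Pinhole projection of world point X under pose (R, t) and intrinsics K."""
    # World -> camera coordinates: Xc = R @ X + t
    Xc = [sum(R[r][c] * X[c] for c in range(3)) + t[r] for r in range(3)]
    if Xc[2] <= 0:
        raise ValueError("point lies behind the camera")
    # Camera -> homogeneous pixel coordinates: x = K @ Xc, then divide by depth
    x = [sum(K[r][c] * Xc[c] for c in range(3)) for r in range(3)]
    return x[0] / x[2], x[1] / x[2]


def reprojection_error(K: Matrix3, R: Matrix3, t: Vector3, X: Vector3, uv: Pixel) -> float:
    """Euclidean pixel distance between the projection of X and the observed pixel uv."""
    u, v = project(K, R, t, X)
    return math.hypot(u - uv[0], v - uv[1])


def count_inliers(
    K: Matrix3,
    R: Matrix3,
    t: Vector3,
    correspondences: Sequence[Tuple[Pixel, Vector3]],
    threshold: float = 2.0,
) -> int:
    """Count (pixel, 3D point) pairs whose reprojection error is below `threshold` pixels."""
    inliers = 0
    for uv, X in correspondences:
        try:
            if reprojection_error(K, R, t, X, uv) < threshold:
                inliers += 1
        except ValueError:
            # A point behind the camera cannot support this pose hypothesis.
            continue
    return inliers
```

In a RANSAC-style loop, many candidate poses would be computed from minimal subsets of correspondences and the one with the most inliers kept, typically followed by a refinement that minimizes the reprojection error over those inliers.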
Back to the Feature: Learning Robust Camera Localization from Pixels to Pose [114.89389528198738]
We introduce PixLoc, a scene-agnostic neural network that estimates an accurate 6-DoF pose from an image and a 3D model. The system can localize in large environments given coarse pose priors, and can also improve the accuracy of sparse feature matching.
arXiv Detail & Related papers (2021-03-16T17:40:12Z)

This list is automatically generated from the titles and abstracts of the papers on this site.