Map-free Visual Relocalization: Metric Pose Relative to a Single Image
- URL: http://arxiv.org/abs/2210.05494v1
- Date: Tue, 11 Oct 2022 14:49:49 GMT
- Title: Map-free Visual Relocalization: Metric Pose Relative to a Single Image
- Authors: Eduardo Arnold, Jamie Wynn, Sara Vicente, Guillermo Garcia-Hernando,
Áron Monszpart, Victor Adrian Prisacariu, Daniyar Turmukhambetov, Eric
Brachmann
- Abstract summary: We propose Map-free Relocalization, using only one photo of a scene to enable instant, metric scaled relocalization.
Existing datasets are not suitable to benchmark map-free relocalization, due to their focus on large scenes or their limited variability.
We have constructed a new dataset of 655 small places of interest, such as sculptures, murals and fountains, collected worldwide.
- Score: 21.28513803531557
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Can we relocalize in a scene represented by a single reference image?
Standard visual relocalization requires hundreds of images and scale
calibration to build a scene-specific 3D map. In contrast, we propose Map-free
Relocalization, i.e., using only one photo of a scene to enable instant, metric
scaled relocalization. Existing datasets are not suitable to benchmark map-free
relocalization, due to their focus on large scenes or their limited
variability. Thus, we have constructed a new dataset of 655 small places of
interest, such as sculptures, murals and fountains, collected worldwide. Each
place comes with a reference image to serve as a relocalization anchor, and
dozens of query images with known, metric camera poses. The dataset features
changing conditions, stark viewpoint changes, high variability across places,
and queries with low to no visual overlap with the reference image. We identify
two viable families of existing methods to provide baseline results: relative
pose regression, and feature matching combined with single-image depth
prediction. While these methods show reasonable performance on some favorable
scenes in our dataset, map-free relocalization proves to be a challenge that
requires new, innovative solutions.
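To make the second baseline family concrete, below is a minimal sketch of a feature-matching-plus-depth pipeline: 2D-2D matches between the reference and query images are lifted to metric 3D using a monocular depth estimate for the reference image, and the metric relative pose is then recovered with PnP and RANSAC. This is an illustrative reconstruction, not the paper's implementation; it assumes SIFT features and OpenCV, a metric depth map `ref_depth` aligned with the reference image (from any single-image depth network), and known intrinsics `K_ref` and `K_query`.

```python
# Illustrative sketch only: a generic feature-matching + single-image-depth
# baseline for map-free relocalization, not the paper's exact implementation.
# Assumes `ref_depth` is a metric depth map at the resolution of `ref_img`,
# and K_ref / K_query are 3x3 pinhole intrinsics.
import cv2
import numpy as np

def metric_pose_from_reference(ref_img, query_img, ref_depth, K_ref, K_query):
    # 1) 2D-2D feature matching between the reference and the query image.
    sift = cv2.SIFT_create()
    kp_ref, desc_ref = sift.detectAndCompute(ref_img, None)
    kp_q, desc_q = sift.detectAndCompute(query_img, None)
    matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(desc_ref, desc_q, k=2)
    # Lowe ratio test to keep only distinctive matches.
    good = [p[0] for p in matches if len(p) == 2 and p[0].distance < 0.8 * p[1].distance]

    # 2) Lift matched reference keypoints to metric 3D using the predicted depth.
    fx, fy, cx, cy = K_ref[0, 0], K_ref[1, 1], K_ref[0, 2], K_ref[1, 2]
    obj_pts, img_pts = [], []
    for m in good:
        u, v = kp_ref[m.queryIdx].pt
        z = float(ref_depth[int(round(v)), int(round(u))])
        if z <= 0:
            continue  # skip invalid depth values
        obj_pts.append([(u - cx) * z / fx, (v - cy) * z / fy, z])
        img_pts.append(kp_q[m.trainIdx].pt)
    if len(obj_pts) < 4:
        return None  # not enough correspondences for PnP

    # 3) Metric relative pose (reference -> query) from 2D-3D matches via PnP + RANSAC.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(obj_pts, dtype=np.float64),
        np.asarray(img_pts, dtype=np.float64),
        K_query, None, reprojectionError=3.0)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)
    return R, tvec  # pose of reference-frame 3D points expressed in the query camera
```

The other baseline family named in the abstract, relative pose regression, instead trains a network to output the metric relative pose directly from the image pair, trading explicit correspondences and depth for learned priors.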
Related papers
- Breaking the Frame: Visual Place Recognition by Overlap Prediction [53.17564423756082]
We propose a novel visual place recognition approach based on overlap prediction, called VOP.
VOP predicts co-visible image sections by obtaining patch-level embeddings with a Vision Transformer backbone.
Our approach uses a voting mechanism to assess overlap scores for potential database images.
arXiv Detail & Related papers (2024-06-23T20:00:20Z)
- LoCUS: Learning Multiscale 3D-consistent Features from Posed Images [18.648772607057175]
We train a versatile neural representation without supervision.
We find that it is possible to balance retrieval and reusability by constructing a retrieval set carefully.
We show results creating sparse, multi-scale, semantic spatial maps.
arXiv Detail & Related papers (2023-10-02T11:11:23Z)
- SACReg: Scene-Agnostic Coordinate Regression for Visual Localization [16.866303169903237]
We propose a generalized SCR model that is trained once and then deployed in new test scenes, regardless of their scale, without any finetuning.
Instead of encoding the scene coordinates into the network weights, our model takes as input a database image with some sparse 2D pixel to 3D coordinate annotations.
We show that the database representation of images and their 2D-3D annotations can be highly compressed with negligible loss of localization performance.
arXiv Detail & Related papers (2023-07-21T16:56:36Z)
- Lazy Visual Localization via Motion Averaging [89.8709956317671]
We show that it is possible to achieve high localization accuracy without reconstructing the scene from the database.
Experiments show that our visual localization proposal, LazyLoc, achieves comparable performance against state-of-the-art structure-based methods.
arXiv Detail & Related papers (2023-07-19T13:40:45Z)
- MeshLoc: Mesh-Based Visual Localization [54.731309449883284]
We explore a more flexible alternative based on dense 3D meshes that does not require feature matching between database images to build the scene representation.
Surprisingly competitive results can be obtained when extracting features on renderings of these meshes, without any neural rendering stage.
Our results show that dense 3D model-based representations are a promising alternative to existing representations and point to interesting and challenging directions for future research.
arXiv Detail & Related papers (2022-07-21T21:21:10Z)
- Investigating the Role of Image Retrieval for Visual Localization -- An exhaustive benchmark [46.166955777187816]
This paper focuses on understanding the role of image retrieval for multiple visual localization paradigms.
We introduce a novel benchmark setup and compare state-of-the-art retrieval representations on multiple datasets.
Using these tools and in-depth analysis, we show that retrieval performance on classical landmark retrieval or place recognition tasks correlates with localization performance for some, but not all, paradigms.
arXiv Detail & Related papers (2022-05-31T12:59:01Z)
- VS-Net: Voting with Segmentation for Visual Localization [72.8165619061249]
We propose a novel visual localization framework that establishes 2D-to-3D correspondences between the query image and the 3D map with a series of learnable scene-specific landmarks.
Our proposed VS-Net is extensively tested on multiple public benchmarks and can outperform state-of-the-art visual localization methods.
arXiv Detail & Related papers (2021-05-23T08:44:11Z)
- Cross-Descriptor Visual Localization and Mapping [81.16435356103133]
Visual localization and mapping is the key technology underlying the majority of Mixed Reality and robotics systems.
We present three novel scenarios for localization and mapping which require the continuous update of feature representations.
Our data-driven approach is agnostic to the feature descriptor type, has low computational requirements, and scales linearly with the number of description algorithms.
arXiv Detail & Related papers (2020-12-02T18:19:51Z)
- VIGOR: Cross-View Image Geo-localization beyond One-to-one Retrieval [19.239311087570318]
Cross-view image geo-localization aims to determine the locations of street-view query images by matching with GPS-tagged reference images from aerial view.
Recent works have achieved surprisingly high retrieval accuracy on city-scale datasets.
We propose a new large-scale benchmark -- VIGOR -- for cross-View Image Geo-localization beyond One-to-one Retrieval.
arXiv Detail & Related papers (2020-11-24T15:50:54Z)
- Geometrically Mappable Image Features [85.81073893916414]
Vision-based localization of an agent in a map is an important problem in robotics and computer vision.
We propose a method that learns image features targeted for image-retrieval-based localization.
arXiv Detail & Related papers (2020-03-21T15:36:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.