Investigating the Role of Image Retrieval for Visual Localization -- An
exhaustive benchmark
- URL: http://arxiv.org/abs/2205.15761v1
- Date: Tue, 31 May 2022 12:59:01 GMT
- Title: Investigating the Role of Image Retrieval for Visual Localization -- An
exhaustive benchmark
- Authors: Martin Humenberger and Yohann Cabon and Noé Pion and Philippe
  Weinzaepfel and Donghwan Lee and Nicolas Guérin and Torsten Sattler and
  Gabriela Csurka
- Abstract summary: This paper focuses on understanding the role of image retrieval for multiple visual localization paradigms.
We introduce a novel benchmark setup and compare state-of-the-art retrieval representations on multiple datasets.
Using these tools and in-depth analysis, we show that retrieval performance on classical landmark retrieval or place recognition tasks correlates with localization performance only for some, but not all, paradigms.
- Score: 46.166955777187816
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual localization, i.e., camera pose estimation in a known scene, is a core
component of technologies such as autonomous driving and augmented reality.
State-of-the-art localization approaches often rely on image retrieval
techniques for one of two purposes: (1) provide an approximate pose estimate or
(2) determine which parts of the scene are potentially visible in a given query
image. It is common practice to use state-of-the-art image retrieval algorithms
for both of them. These algorithms are typically trained to retrieve the same
landmark under a wide range of viewpoint changes, an objective that often differs
from the requirements of visual localization. In order to investigate the
consequences for visual localization, this paper focuses on understanding the
role of image retrieval for multiple visual localization paradigms. First, we
introduce a novel benchmark setup and compare state-of-the-art retrieval
representations on multiple datasets using localization performance as metric.
Second, we investigate several definitions of "ground truth" for image
retrieval. Using these definitions as upper bounds for the visual localization
paradigms, we show that there is still significant room for improvement. Third,
using these tools and in-depth analysis, we show that retrieval performance on
classical landmark retrieval or place recognition tasks correlates with
localization performance only for some, but not all, paradigms. Finally, we analyze the
effects of blur and dynamic scenes in the images. We conclude that there is a
need for retrieval approaches specifically designed for localization paradigms.
Our benchmark and evaluation protocols are available at
https://github.com/naver/kapture-localization.
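As a minimal sketch of the two retrieval roles described in the abstract, the snippet below assumes hypothetical, precomputed global image descriptors (e.g., from an off-the-shelf retrieval model) and known database poses; it is not the paper's benchmark code, which lives in the kapture-localization repository linked above.

```python
import numpy as np

def retrieve_top_k(query_desc, db_descs, k=5):
    """Rank database images by cosine similarity of L2-normalized global descriptors."""
    q = query_desc / np.linalg.norm(query_desc)
    db = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    scores = db @ q
    return np.argsort(-scores)[:k]

def approximate_pose(retrieved_ids, db_poses):
    """Purpose (1): approximate the query pose, here simply by reusing the
    top-ranked database pose (interpolating the top-k poses is another option)."""
    return db_poses[retrieved_ids[0]]

def candidate_images_for_matching(retrieved_ids, db_image_paths):
    """Purpose (2): restrict local feature matching / structure-based
    localization to the scene parts visible in the retrieved images."""
    return [db_image_paths[i] for i in retrieved_ids]
```

The paper's point is that a retrieval model optimized for landmark retrieval may rank images well for purpose (2) yet poorly for purpose (1), or vice versa, which is why localization performance itself is used as the evaluation metric.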
Related papers
- Teaching VLMs to Localize Specific Objects from In-context Examples [56.797110842152]
Vision-Language Models (VLMs) have shown remarkable capabilities across diverse visual tasks.
However, current VLMs lack a fundamental cognitive ability: learning to localize objects in a scene by taking the context into account.
This work is the first to explore and benchmark personalized few-shot localization for VLMs.
arXiv Detail & Related papers (2024-11-20T13:34:22Z) - Revisit Anything: Visual Place Recognition via Image Segment Retrieval [8.544326445217369]
Existing visual place recognition pipelines encode the "whole" image and search for matches.
We address this by encoding and searching for "image segments" instead of the whole images.
We show that retrieving these partial representations leads to significantly higher recognition recall than typical whole-image-based retrieval.
arXiv Detail & Related papers (2024-09-26T16:49:58Z) - Breaking the Frame: Visual Place Recognition by Overlap Prediction [53.17564423756082]
We propose a novel visual place recognition approach based on overlap prediction, called VOP.
VOP processes co-visible image sections by obtaining patch-level embeddings using a Vision Transformer backbone.
Our approach uses a voting mechanism to assess overlap scores for potential database images.
arXiv Detail & Related papers (2024-06-23T20:00:20Z) - End-to-end learning of keypoint detection and matching for relative pose
estimation [1.8352113484137624]
We propose a new method for estimating the relative pose between two images.
We jointly learn keypoint detection, description extraction, matching and robust pose estimation.
We demonstrate our method for the task of visual localization of a query image within a database of images with known pose.
arXiv Detail & Related papers (2021-04-02T15:16:17Z) - Cross-Descriptor Visual Localization and Mapping [81.16435356103133]
Visual localization and mapping is the key technology underlying the majority of Mixed Reality and robotics systems.
We present three novel scenarios for localization and mapping which require the continuous update of feature representations.
Our data-driven approach is agnostic to the feature descriptor type, has low computational requirements, and scales linearly with the number of description algorithms.
arXiv Detail & Related papers (2020-12-02T18:19:51Z) - Benchmarking Image Retrieval for Visual Localization [41.38065116577011]
Visual localization is a core component of technologies such as autonomous driving and augmented reality.
It is common practice to use state-of-the-art image retrieval algorithms for these tasks.
This paper focuses on understanding the role of image retrieval for multiple visual localization tasks.
arXiv Detail & Related papers (2020-11-24T07:59:52Z) - Geometrically Mappable Image Features [85.81073893916414]
Vision-based localization of an agent in a map is an important problem in robotics and computer vision.
We propose a method that learns image features targeted for image-retrieval-based localization.
arXiv Detail & Related papers (2020-03-21T15:36:38Z)