Investigating the Role of Image Retrieval for Visual Localization -- An
exhaustive benchmark
- URL: http://arxiv.org/abs/2205.15761v1
- Date: Tue, 31 May 2022 12:59:01 GMT
- Title: Investigating the Role of Image Retrieval for Visual Localization -- An
exhaustive benchmark
- Authors: Martin Humenberger and Yohann Cabon and Noé Pion and Philippe
Weinzaepfel and Donghwan Lee and Nicolas Guérin and Torsten Sattler and
Gabriela Csurka
- Abstract summary: This paper focuses on understanding the role of image retrieval for multiple visual localization paradigms.
We introduce a novel benchmark setup and compare state-of-the-art retrieval representations on multiple datasets.
Using these tools and in-depth analysis, we show that retrieval performance on classical landmark retrieval or place recognition tasks correlates with localization performance only for some, but not all, paradigms.
- Score: 46.166955777187816
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual localization, i.e., camera pose estimation in a known scene, is a core
component of technologies such as autonomous driving and augmented reality.
State-of-the-art localization approaches often rely on image retrieval
techniques for one of two purposes: (1) provide an approximate pose estimate or
(2) determine which parts of the scene are potentially visible in a given query
image. It is common practice to use state-of-the-art image retrieval algorithms
for both of them. These algorithms are often trained for the goal of retrieving
the same landmark under a large range of viewpoint changes which often differs
from the requirements of visual localization. In order to investigate the
consequences for visual localization, this paper focuses on understanding the
role of image retrieval for multiple visual localization paradigms. First, we
introduce a novel benchmark setup and compare state-of-the-art retrieval
representations on multiple datasets using localization performance as metric.
Second, we investigate several definitions of "ground truth" for image
retrieval. Using these definitions as upper bounds for the visual localization
paradigms, we show that there is still significant room for improvement. Third,
using these tools and in-depth analysis, we show that retrieval performance on
classical landmark retrieval or place recognition tasks correlates with
localization performance only for some, but not all, paradigms. Finally, we analyze the
effects of blur and dynamic scenes in the images. We conclude that there is a
need for retrieval approaches specifically designed for localization paradigms.
Our benchmark and evaluation protocols are available at
https://github.com/naver/kapture-localization.
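The two uses of retrieval named in the abstract can be sketched in a few lines: rank database images by global-descriptor similarity to the query (use 1 yields an approximate pose prior; use 2 yields the candidate scene parts to match against). This is a minimal illustrative sketch, not the paper's benchmark code; the function names, the cosine-similarity ranking, and the position-averaging heuristic for the pose prior are all assumptions introduced here.

```python
import numpy as np

def retrieve_top_k(query_desc, db_descs, k=3):
    """Rank database images by cosine similarity of their global descriptors.

    Returns the indices of the k most similar database images; these define
    which parts of the scene are considered visible in the query (use 2).
    """
    q = query_desc / np.linalg.norm(query_desc)
    db = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    sims = db @ q                      # cosine similarity to every database image
    return np.argsort(-sims)[:k]       # indices sorted by decreasing similarity

def approximate_pose(indices, db_positions):
    """Coarse pose prior (use 1), assumed here to be the mean of the
    camera positions of the top-k retrieved database images."""
    return db_positions[indices].mean(axis=0)

# Toy data: 5 database images with 4-D global descriptors and known 3-D positions.
rng = np.random.default_rng(0)
db_descs = rng.normal(size=(5, 4))
db_positions = rng.normal(size=(5, 3))
query = db_descs[2] + 0.01 * rng.normal(size=4)  # query taken near database image 2

top = retrieve_top_k(query, db_descs, k=3)
pose_prior = approximate_pose(top, db_positions)
```

In a real pipeline the descriptors would come from a learned representation (e.g. the retrieval models compared in the benchmark) and the top-k images would feed a structure-based or pose-regression localizer rather than a simple average.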
Related papers
- Revisit Anything: Visual Place Recognition via Image Segment Retrieval [8.544326445217369]
Existing visual place recognition pipelines encode the "whole" image and search for matches.
We address this by encoding and searching for "image segments" instead of the whole images.
We show that retrieving these partial representations leads to significantly higher recognition recall than the typical whole image based retrieval.
arXiv Detail & Related papers (2024-09-26T16:49:58Z)
- Breaking the Frame: Visual Place Recognition by Overlap Prediction [53.17564423756082]
We propose a novel visual place recognition approach based on overlap prediction, called VOP.
VOP identifies co-visible image sections by obtaining patch-level embeddings from a Vision Transformer backbone.
Our approach uses a voting mechanism to assess overlap scores for potential database images.
arXiv Detail & Related papers (2024-06-23T20:00:20Z)
- End-to-end learning of keypoint detection and matching for relative pose estimation [1.8352113484137624]
We propose a new method for estimating the relative pose between two images.
We jointly learn keypoint detection, description extraction, matching and robust pose estimation.
We demonstrate our method for the task of visual localization of a query image within a database of images with known pose.
arXiv Detail & Related papers (2021-04-02T15:16:17Z)
- Telling the What while Pointing the Where: Fine-grained Mouse Trace and Language Supervision for Improved Image Retrieval [60.24860627782486]
Fine-grained image retrieval often requires the user to also express where in the image the content they are looking for is located.
In this paper, we describe an image retrieval setup where the user simultaneously describes an image using both spoken natural language (the "what") and mouse traces over an empty canvas (the "where").
Our model is capable of taking this spatial guidance into account, and provides more accurate retrieval results compared to text-only equivalent systems.
arXiv Detail & Related papers (2021-02-09T17:54:34Z)
- Cross-Descriptor Visual Localization and Mapping [81.16435356103133]
Visual localization and mapping is the key technology underlying the majority of Mixed Reality and robotics systems.
We present three novel scenarios for localization and mapping which require the continuous update of feature representations.
Our data-driven approach is agnostic to the feature descriptor type, has low computational requirements, and scales linearly with the number of description algorithms.
arXiv Detail & Related papers (2020-12-02T18:19:51Z)
- Benchmarking Image Retrieval for Visual Localization [41.38065116577011]
Visual localization is a core component of technologies such as autonomous driving and augmented reality.
It is common practice to use state-of-the-art image retrieval algorithms for these tasks.
This paper focuses on understanding the role of image retrieval for multiple visual localization tasks.
arXiv Detail & Related papers (2020-11-24T07:59:52Z)
- Geometrically Mappable Image Features [85.81073893916414]
Vision-based localization of an agent in a map is an important problem in robotics and computer vision.
We propose a method that learns image features targeted for image-retrieval-based localization.
arXiv Detail & Related papers (2020-03-21T15:36:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.