Are Local Features All You Need for Cross-Domain Visual Place
Recognition?
- URL: http://arxiv.org/abs/2304.05887v1
- Date: Wed, 12 Apr 2023 14:46:57 GMT
- Title: Are Local Features All You Need for Cross-Domain Visual Place
Recognition?
- Authors: Giovanni Barbarani, Mohamad Mostafa, Hajali Bayramov, Gabriele
Trivigno, Gabriele Berton, Carlo Masone, Barbara Caputo
- Abstract summary: Visual Place Recognition aims to predict the coordinates of an image based solely on visual clues.
Despite recent advances, recognizing the same place when the query comes from a significantly different distribution is still a major hurdle for state-of-the-art retrieval methods.
In this work we explore whether re-ranking methods based on spatial verification can tackle these challenges.
- Score: 13.519413608607781
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual Place Recognition is a task that aims to predict the coordinates of an
image (called query) based solely on visual clues. Most commonly, a retrieval
approach is adopted, where the query is matched to the most similar images from
a large database of geotagged photos, using learned global descriptors. Despite
recent advances, recognizing the same place when the query comes from a
significantly different distribution is still a major hurdle for state of the
art retrieval methods. Examples are heavy illumination changes (e.g. night-time
images) or substantial occlusions (e.g. transient objects). In this work we
explore whether re-ranking methods based on spatial verification can tackle
these challenges, following the intuition that local descriptors are inherently
more robust than global features to domain shifts. To this end, we provide a
new, comprehensive benchmark on current state of the art models. We also
introduce two new demanding datasets with night and occluded queries, to be
matched against a city-wide database. Code and datasets are available at
https://github.com/gbarbarani/re-ranking-for-VPR.
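As a rough illustration of the spatial-verification idea the paper investigates, the sketch below re-ranks retrieval candidates by counting mutual nearest-neighbour matches between local descriptors of the query and of each candidate. This is a minimal stand-in, not the paper's actual pipeline; all names (`rerank_by_local_matches`, etc.) are illustrative, and descriptors are plain lists of floats rather than learned features.

```python
# Hypothetical sketch: after a global-descriptor retrieval stage returns
# candidate database images, re-score each candidate by the number of
# mutual nearest-neighbour matches between its local descriptors and the
# query's, and sort candidates by that count.

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def nearest(desc, pool):
    # index of the descriptor in `pool` closest to `desc`
    return min(range(len(pool)), key=lambda i: euclidean(desc, pool[i]))

def mutual_matches(query_descs, db_descs):
    # a pair (i, j) counts only if each descriptor is the other's NN
    count = 0
    for i, q in enumerate(query_descs):
        j = nearest(q, db_descs)
        if nearest(db_descs[j], query_descs) == i:
            count += 1
    return count

def rerank_by_local_matches(query_descs, candidates):
    # candidates: list of (image_id, local_descriptor_list);
    # higher match count ranks first, replacing the global ordering
    scored = [(mutual_matches(query_descs, d), img) for img, d in candidates]
    scored.sort(key=lambda t: -t[0])
    return [img for _, img in scored]
```

The intuition from the abstract is that such local-feature match counts degrade less under domain shift (night, occlusion) than a single global descriptor does.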
Related papers
- Swarm Intelligence in Geo-Localization: A Multi-Agent Large Vision-Language Model Collaborative Framework [51.26566634946208]
We introduce smileGeo, a novel visual geo-localization framework.
By inter-agent communication, smileGeo integrates the inherent knowledge of these agents with additional retrieved information.
Results show that our approach significantly outperforms current state-of-the-art methods.
arXiv Detail & Related papers (2024-08-21T03:31:30Z)
- Deep Homography Estimation for Visual Place Recognition [49.235432979736395]
We propose a transformer-based deep homography estimation (DHE) network.
It takes the dense feature map extracted by a backbone network as input and fits homography for fast and learnable geometric verification.
Experiments on benchmark datasets show that our method can outperform several state-of-the-art methods.
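The DHE network fits a full homography with a learned transformer; as a far simpler illustration of the same geometric-verification idea, the sketch below scores tentative keypoint correspondences against a translation-only model, voting for the dominant 2D offset and counting the correspondences that agree with it. This is a toy stand-in for learned homography fitting, with illustrative names only.

```python
from collections import Counter

def translation_inliers(matches, tol=1.0):
    # matches: list of ((qx, qy), (dx, dy)) tentative keypoint pairs.
    # Estimate the dominant integer offset by voting, then count the
    # correspondences whose offset lies within `tol` of it (the inliers).
    offsets = [(round(dx - qx), round(dy - qy)) for (qx, qy), (dx, dy) in matches]
    if not offsets:
        return 0
    best, _ = Counter(offsets).most_common(1)[0]
    return sum(
        1
        for (qx, qy), (dx, dy) in matches
        if abs((dx - qx) - best[0]) <= tol and abs((dy - qy) - best[1]) <= tol
    )
```

The inlier count plays the same role as the verification score in homography-based re-ranking: candidates whose matches are geometrically consistent are promoted.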
arXiv Detail & Related papers (2024-02-25T13:22:17Z)
- GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization [61.10806364001535]
Worldwide Geo-localization aims to pinpoint the precise location of images taken anywhere on Earth.
Existing approaches divide the globe into discrete geographic cells, transforming the problem into a classification task.
We propose GeoCLIP, a novel CLIP-inspired Image-to-GPS retrieval approach that enforces alignment between the image and its corresponding GPS locations.
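An image-to-GPS retrieval step of the kind GeoCLIP describes can be sketched as a cosine-similarity search over a gallery of GPS embeddings: the predicted location is the one whose embedding best aligns with the image embedding. The embeddings below are dummy vectors and the function names are illustrative, not GeoCLIP's API.

```python
import math

def cosine(a, b):
    # cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve_gps(image_emb, gps_gallery):
    # gps_gallery: list of ((lat, lon), embedding) pairs; return the
    # coordinates whose embedding is most similar to the image embedding
    return max(gps_gallery, key=lambda item: cosine(image_emb, item[1]))[0]
```

In the actual method both encoders are trained contrastively so that an image and its true GPS location land close in this shared space.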
arXiv Detail & Related papers (2023-09-27T20:54:56Z)
- EigenPlaces: Training Viewpoint Robust Models for Visual Place Recognition [22.98403243270106]
We propose a new method, called EigenPlaces, to train neural networks on images from different points of view.
The underlying idea is to cluster the training data so as to explicitly present the model with different views of the same points of interest.
We present experiments on the most comprehensive set of datasets in the literature, finding that EigenPlaces outperforms the previous state of the art on the majority of datasets.
arXiv Detail & Related papers (2023-08-21T16:27:31Z)
- Yes, we CANN: Constrained Approximate Nearest Neighbors for local feature-based visual localization [2.915868985330569]
Constrained Approximate Nearest Neighbors (CANN) is a joint solution of k-nearest-neighbors across both the geometry and appearance space using only local features.
Our method significantly outperforms both state-of-the-art global feature-based retrieval and approaches using local feature aggregation schemes.
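CANN's core idea, a joint nearest-neighbour search over both appearance and geometry, can be caricatured as a k-NN over a combined distance. The sketch below is a crude stand-in (a weighted sum of appearance and geometric distance), not CANN's actual constrained formulation; `joint_knn` and its weight `w_geo` are invented for illustration.

```python
def joint_knn(query, database, k=2, w_geo=0.5):
    # query and database items: (appearance_vector, (x, y) location).
    # Combined distance = appearance distance + w_geo * geometric distance.
    qa, qg = query

    def dist(item):
        da, dg = item
        app = sum((x - y) ** 2 for x, y in zip(qa, da)) ** 0.5
        geo = ((qg[0] - dg[0]) ** 2 + (qg[1] - dg[1]) ** 2) ** 0.5
        return app + w_geo * geo

    return sorted(database, key=dist)[:k]
```

The point of searching both spaces at once, rather than retrieving by appearance and filtering by geometry afterwards, is that geometrically implausible matches never crowd out plausible ones in the candidate list.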
arXiv Detail & Related papers (2023-06-15T10:12:10Z)
- $R^{2}$Former: Unified $R$etrieval and $R$eranking Transformer for Place Recognition [92.56937383283397]
We propose a unified place recognition framework that handles both retrieval and reranking.
The proposed reranking module takes feature correlation, attention value, and xy coordinates into account.
$R^{2}$Former significantly outperforms state-of-the-art methods on major VPR datasets.
arXiv Detail & Related papers (2023-04-06T23:19:32Z)
- Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes [53.53712888703834]
We introduce an end-to-end transformer-based architecture that exploits the relationship between different geographic levels.
We achieve state of the art street level accuracy on 4 standard geo-localization datasets.
arXiv Detail & Related papers (2023-03-07T21:47:58Z)
- Cross-Descriptor Visual Localization and Mapping [81.16435356103133]
Visual localization and mapping is the key technology underlying the majority of Mixed Reality and robotics systems.
We present three novel scenarios for localization and mapping which require the continuous update of feature representations.
Our data-driven approach is agnostic to the feature descriptor type, has low computational requirements, and scales linearly with the number of description algorithms.
arXiv Detail & Related papers (2020-12-02T18:19:51Z)
- VIGOR: Cross-View Image Geo-localization beyond One-to-one Retrieval [19.239311087570318]
Cross-view image geo-localization aims to determine the locations of street-view query images by matching with GPS-tagged reference images from aerial view.
Recent works have achieved surprisingly high retrieval accuracy on city-scale datasets.
We propose a new large-scale benchmark -- VIGOR -- for Cross-View Image Geo-localization beyond One-to-one Retrieval.
arXiv Detail & Related papers (2020-11-24T15:50:54Z)
- Benchmarking Image Retrieval for Visual Localization [41.38065116577011]
Visual localization is a core component of technologies such as autonomous driving and augmented reality.
It is common practice to use state-of-the-art image retrieval algorithms for these tasks.
This paper focuses on understanding the role of image retrieval for multiple visual localization tasks.
arXiv Detail & Related papers (2020-11-24T07:59:52Z)
- City-Scale Visual Place Recognition with Deep Local Features Based on Multi-Scale Ordered VLAD Pooling [5.274399407597545]
We present a fully-automated system for place recognition at a city-scale based on content-based image retrieval.
First, we present a comprehensive analysis of visual place recognition and sketch out the unique challenges of the task.
Next, we propose a simple pooling approach on top of convolutional neural network activations to embed spatial information into the image representation vector.
arXiv Detail & Related papers (2020-09-19T15:21:59Z)
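The VLAD aggregation underlying the last entry can be sketched in a few lines: each local descriptor is assigned to its nearest cluster centroid, the residual (descriptor minus centroid) is accumulated per centroid, and the per-centroid sums are concatenated into one fixed-length vector. This is plain, unnormalized VLAD on toy lists, omitting the paper's multi-scale ordering and the usual intra/L2 normalization.

```python
def vlad(descriptors, centroids):
    # Accumulate residuals of each descriptor against its nearest
    # centroid, then flatten the per-centroid sums into one vector
    # of length len(centroids) * descriptor_dim.
    dim = len(centroids[0])
    acc = [[0.0] * dim for _ in centroids]
    for d in descriptors:
        k = min(range(len(centroids)),
                key=lambda i: sum((x - y) ** 2 for x, y in zip(d, centroids[i])))
        for j in range(dim):
            acc[k][j] += d[j] - centroids[k][j]
    return [v for row in acc for v in row]
```

The resulting vector encodes not just which visual words occur, but how the local descriptors deviate from them, which is what makes VLAD-style descriptors competitive for city-scale retrieval.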
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.