$R^{2}$Former: Unified $R$etrieval and $R$eranking Transformer for Place
Recognition
- URL: http://arxiv.org/abs/2304.03410v1
- Date: Thu, 6 Apr 2023 23:19:32 GMT
- Title: $R^{2}$Former: Unified $R$etrieval and $R$eranking Transformer for Place
Recognition
- Authors: Sijie Zhu, Linjie Yang, Chen Chen, Mubarak Shah, Xiaohui Shen, Heng
Wang
- Abstract summary: We propose a unified place recognition framework that handles both retrieval and reranking.
The proposed reranking module takes feature correlation, attention value, and xy coordinates into account.
$R^{2}$Former significantly outperforms state-of-the-art methods on major VPR datasets.
- Score: 92.56937383283397
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual Place Recognition (VPR) estimates the location of query images by
matching them with images in a reference database. Conventional methods
generally adopt aggregated CNN features for global retrieval and RANSAC-based
geometric verification for reranking. However, RANSAC employs only geometric
information and ignores other cues that could be useful for reranking, e.g.,
local feature correlations and attention values. In this
paper, we propose a unified place recognition framework that handles both
retrieval and reranking with a novel transformer model, named $R^{2}$Former.
The proposed reranking module takes feature correlation, attention value, and
xy coordinates into account, and learns to determine whether the image pair is
from the same location. The whole pipeline is end-to-end trainable and the
reranking module alone can also be adopted on other CNN or transformer
backbones as a generic component. Remarkably, $R^{2}$Former significantly
outperforms state-of-the-art methods on major VPR datasets with much less
inference time and memory consumption. It also achieves the state-of-the-art on
the hold-out MSLS challenge set and could serve as a simple yet strong solution
for real-world large-scale applications. Experiments also show that vision
transformer tokens are comparable to, and sometimes better than, CNN local
features for local matching. The code is released at
https://github.com/Jeff-Zilence/R2Former.
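The reranking inputs described in the abstract (feature correlation, attention values, and xy coordinates of local tokens) can be sketched as below. This is an illustrative reconstruction, not the authors' code: the function name, the top-k pair selection, and the 7-dim per-pair layout are assumptions based only on the abstract.

```python
import numpy as np

# Hypothetical shapes: N local tokens per image, D-dim descriptors.
rng = np.random.default_rng(0)
N, D = 64, 128

def rerank_features(desc_q, desc_r, attn_q, attn_r, xy_q, xy_r, k=32):
    """Build per-pair inputs from correlation, attention, and xy coords."""
    # Feature correlation between all query/reference token pairs.
    corr = desc_q @ desc_r.T                       # (N, N)
    # Keep the k most correlated pairs (mutual matching is one variant).
    flat = np.argsort(corr, axis=None)[::-1][:k]
    qi, ri = np.unravel_index(flat, corr.shape)
    # Each pair: [correlation, query attn, ref attn, query xy, ref xy].
    feats = np.column_stack([corr[qi, ri], attn_q[qi], attn_r[ri],
                             xy_q[qi], xy_r[ri]])
    return feats                                   # (k, 7)

desc_q = rng.standard_normal((N, D)); desc_r = rng.standard_normal((N, D))
attn_q = rng.random(N); attn_r = rng.random(N)
xy_q = rng.random((N, 2)); xy_r = rng.random((N, 2))
pairs = rerank_features(desc_q, desc_r, attn_q, attn_r, xy_q, xy_r)
print(pairs.shape)  # (32, 7)
```

In the paper these per-pair vectors would feed a transformer that classifies whether the image pair comes from the same location; the sketch stops at input construction.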
Related papers
- Deep Homography Estimation for Visual Place Recognition [49.235432979736395]
We propose a transformer-based deep homography estimation (DHE) network.
It takes the dense feature map extracted by a backbone network as input and fits homography for fast and learnable geometric verification.
Experiments on benchmark datasets show that our method can outperform several state-of-the-art methods.
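DHE learns the homography fit end-to-end; as a non-learned point of reference, the classical direct linear transform (DLT) fits a homography from point correspondences, which is the kind of geometric verification such methods replace. A minimal sketch:

```python
import numpy as np

def fit_homography(src, dst):
    """Fit H with the direct linear transform (DLT): dst ~ H @ src."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on vec(H).
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The null vector of A (smallest singular vector) is H up to scale.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

# Sanity check: points related by a known homography are recovered.
H_true = np.array([[1.2, 0.1, 5.0], [0.0, 0.9, -3.0], [1e-3, 0.0, 1.0]])
src = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]], dtype=float)
h = np.c_[src, np.ones(len(src))] @ H_true.T
dst = h[:, :2] / h[:, 2:]
H = fit_homography(src, dst)
print(np.allclose(H, H_true, atol=1e-6))  # True
```

Unlike this closed-form fit, the DHE network regresses the homography from dense feature maps, making the verification step differentiable and fast.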
arXiv Detail & Related papers (2024-02-25T13:22:17Z)
- Optimal Transport Aggregation for Visual Place Recognition [9.192660643226372]
We introduce SALAD, which reformulates NetVLAD's soft-assignment of local features to clusters as an optimal transport problem.
In SALAD, we consider both feature-to-cluster and cluster-to-feature relations and we also introduce a 'dustbin' cluster, designed to selectively discard features deemed non-informative.
Our single-stage method not only surpasses single-stage baselines on public VPR datasets, but also outperforms two-stage methods that add re-ranking at significantly higher cost.
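The optimal-transport view of soft assignment can be illustrated with a minimal entropic (Sinkhorn) solver over a score matrix with an appended dustbin column. This is a hedged sketch, not SALAD's actual formulation: the zero dustbin score, uniform marginals, and hyperparameters are assumptions for illustration.

```python
import numpy as np

def sinkhorn_with_dustbin(scores, n_iters=50, eps=0.1):
    """Entropic OT between N features (rows) and K clusters + dustbin (cols)."""
    N, K = scores.shape
    # Append a dustbin column that can absorb non-informative features.
    S = np.concatenate([scores, np.zeros((N, 1))], axis=1)  # (N, K+1)
    log_T = S / eps
    # Alternate row/column normalisation in log space: rows carry unit mass,
    # columns share the total mass N uniformly.
    for _ in range(n_iters):
        log_T -= np.logaddexp.reduce(log_T, axis=1, keepdims=True)
        log_T -= (np.logaddexp.reduce(log_T, axis=0, keepdims=True)
                  - np.log(N / (K + 1)))
    return np.exp(log_T)

rng = np.random.default_rng(0)
T = sinkhorn_with_dustbin(rng.standard_normal((16, 8)))
print(T.shape, T.sum())  # (16, 9), total mass ~16
```

The resulting transport plan replaces NetVLAD's independent per-feature softmax, so assignment mass is balanced across clusters while the dustbin discards weak features.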
arXiv Detail & Related papers (2023-11-27T15:46:19Z)
- AANet: Aggregation and Alignment Network with Semi-hard Positive Sample
Mining for Hierarchical Place Recognition [48.043749855085025]
Visual place recognition (VPR) is one of the research hotspots in robotics, which uses visual information to locate robots.
We present a unified network capable of extracting global features for retrieving candidates via an aggregation module.
We also propose a Semi-hard Positive Sample Mining (ShPSM) strategy to select appropriate hard positive images for training more robust VPR networks.
arXiv Detail & Related papers (2023-10-08T14:46:11Z)
- Are Local Features All You Need for Cross-Domain Visual Place
Recognition? [13.519413608607781]
Visual Place Recognition aims to predict the coordinates of an image based solely on visual clues.
Despite recent advances, recognizing the same place when the query comes from a significantly different distribution is still a major hurdle for state-of-the-art retrieval methods.
In this work we explore whether re-ranking methods based on spatial verification can tackle these challenges.
arXiv Detail & Related papers (2023-04-12T14:46:57Z)
- TransGeo: Transformer Is All You Need for Cross-view Image
Geo-localization [81.70547404891099]
CNN-based methods for cross-view image geo-localization fail to model global correlation.
We propose a pure transformer-based approach (TransGeo) to address these limitations.
TransGeo achieves state-of-the-art results on both urban and rural datasets.
arXiv Detail & Related papers (2022-03-31T21:19:41Z)
- HAT: Hierarchical Aggregation Transformers for Person Re-identification [87.02828084991062]
We take advantage of both CNNs and Transformers for image-based person Re-ID with high performance.
This work is the first to combine the advantages of CNNs and Transformers for image-based person Re-ID.
arXiv Detail & Related papers (2021-07-13T09:34:54Z)
- Region Similarity Representation Learning [94.88055458257081]
Region Similarity Representation Learning (ReSim) is a new approach to self-supervised representation learning for localization-based tasks.
ReSim learns both regional representations for localization as well as semantic image-level representations.
We show how ReSim learns representations which significantly improve the localization and classification performance compared to a competitive MoCo-v2 baseline.
arXiv Detail & Related papers (2021-03-24T00:42:37Z)
- Instance-level Image Retrieval using Reranking Transformers [18.304597755595697]
Instance-level image retrieval is the task of searching in a large database for images that match an object in a query image.
We propose Reranking Transformers (RRTs) as a general model to incorporate both local and global features to rerank the matching images.
RRTs are lightweight and can be easily parallelized so that reranking a set of top matching results can be performed in a single forward-pass.
arXiv Detail & Related papers (2021-03-22T23:58:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.