Related papers: To Match or Not to Match: Revisiting Image Matching for Reliable Visual Place Recognition

To Match or Not to Match: Revisiting Image Matching for Reliable Visual Place Recognition

URL: http://arxiv.org/abs/2504.06116v2
Date: Tue, 22 Apr 2025 07:44:36 GMT
Title: To Match or Not to Match: Revisiting Image Matching for Reliable Visual Place Recognition
Authors: Davide Sferrazza, Gabriele Berton, Gabriele Trivigno, Carlo Masone,
Abstract summary: We show that modern retrieval systems often reach a point where re-ranking can degrade results, as current VPR datasets are largely saturated.<n>We propose using image matching as a verification step to assess retrieval confidence, demonstrating that inlier counts can reliably predict when re-ranking is beneficial.
Score: 4.008780119020479
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Visual Place Recognition (VPR) is a critical task in computer vision, traditionally enhanced by re-ranking retrieval results with image matching. However, recent advancements in VPR methods have significantly improved performance, challenging the necessity of re-ranking. In this work, we show that modern retrieval systems often reach a point where re-ranking can degrade results, as current VPR datasets are largely saturated. We propose using image matching as a verification step to assess retrieval confidence, demonstrating that inlier counts can reliably predict when re-ranking is beneficial. Our findings shift the paradigm of retrieval pipelines, offering insights for more robust and adaptive VPR systems. The code is available at https://github.com/FarInHeight/To-Match-or-Not-to-Match.

Related papers

RAVID: Retrieval-Augmented Visual Detection: A Knowledge-Driven Approach for AI-Generated Image Identification [14.448350657613368]
RAVID is the first framework for AI-generated image detection that leverages visual retrieval-augmented generation (RAG)<n>Our approach utilizes a fine-tuned CLIP image encoder, RAVID CLIP, enhanced with category-related prompts to improve representation learning.<n> RAVID achieves an average accuracy of 80.27% under degradation conditions, compared to 63.44% for the state-of-the-art model C2P-CLIP.
arXiv Detail & Related papers (2025-08-05T23:10:56Z)
Improving Visual Place Recognition with Sequence-Matching Receptiveness Prediction [19.577433371468533]
We present a new supervised learning approach that learns to predict the per-frame sequence matching receptiveness (SMR) of VPR techniques.<n>Our approach significantly improves VPR performance across a large range of state-of-the-art and classical VPR techniques.
arXiv Detail & Related papers (2025-03-10T02:01:24Z)
SelaVPR++: Towards Seamless Adaptation of Foundation Models for Efficient Place Recognition [69.58329995485158]
Recent studies show that the visual place recognition (VPR) method using pre-trained visual foundation models can achieve promising performance.<n>We propose a novel method to realize seamless adaptation of foundation models to VPR.<n>In pursuit of higher efficiency and better performance, we propose an extension of the SelaVPR, called SelaVPR++.
arXiv Detail & Related papers (2025-02-23T15:01:09Z)
Pair-VPR: Place-Aware Pre-training and Contrastive Pair Classification for Visual Place Recognition with Vision Transformers [6.890658812702241]
We propose a novel joint training method for Visual Place Recognition (VPR) The pair classifier can predict whether a given pair of images are from the same place or not. By re-using the Mask Image Modelling encoder and decoder weights in the second stage of training, Pair-VPR can achieve state-of-the-art VPR performance.
arXiv Detail & Related papers (2024-10-09T07:09:46Z)
Breaking the Frame: Visual Place Recognition by Overlap Prediction [53.17564423756082]
We propose a novel visual place recognition approach based on overlap prediction, called VOP.<n>VOP proceeds co-visible image sections by obtaining patch-level embeddings using a Vision Transformer backbone.<n>Our approach uses a voting mechanism to assess overlap scores for potential database images.
arXiv Detail & Related papers (2024-06-23T20:00:20Z)
CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition [73.51329037954866]
We propose a robust global representation method with cross-image correlation awareness for visual place recognition. Our method uses the attention mechanism to correlate multiple images within a batch. Our method outperforms state-of-the-art methods by a large margin with significantly less training time.
arXiv Detail & Related papers (2024-02-29T15:05:11Z)
Deep Homography Estimation for Visual Place Recognition [49.235432979736395]
We propose a transformer-based deep homography estimation (DHE) network. It takes the dense feature map extracted by a backbone network as input and fits homography for fast and learnable geometric verification. Experiments on benchmark datasets show that our method can outperform several state-of-the-art methods.
arXiv Detail & Related papers (2024-02-25T13:22:17Z)
Distillation Improves Visual Place Recognition for Low Quality Images [13.440872071847627]
Real-time visual localization often utilizes online computing, for which query images or videos are transmitted to remote servers for visual place recognition (VPR)<n>limited network bandwidth necessitates image-quality reduction and thus the degradation of global image descriptors, reducing VPR accuracy.<n>We address this issue at the descriptor extraction level with a knowledge-distillation methodology that learns feature representations from high-quality images to extract more discriminative descriptors from low-quality images.
arXiv Detail & Related papers (2023-10-10T18:03:29Z)
Graph Convolution Based Efficient Re-Ranking for Visual Retrieval [29.804582207550478]
We present an efficient re-ranking method which refines initial retrieval results by updating features. Specifically, we reformulate re-ranking based on Graph Convolution Networks (GCN) and propose a novel Graph Convolution based Re-ranking (GCR) for visual retrieval tasks via feature propagation. In particular, the plain GCR is extended for cross-camera retrieval and an improved feature propagation formulation is presented to leverage affinity relationships across different cameras.
arXiv Detail & Related papers (2023-06-15T00:28:08Z)
$R^{2}$Former: Unified $R$etrieval and $R$eranking Transformer for Place Recognition [92.56937383283397]
We propose a unified place recognition framework that handles both retrieval and reranking. The proposed reranking module takes feature correlation, attention value, and xy coordinates into account. $R2$Former significantly outperforms state-of-the-art methods on major VPR datasets.
arXiv Detail & Related papers (2023-04-06T23:19:32Z)
Contextual Similarity Aggregation with Self-attention for Visual Re-ranking [96.55393026011811]
We propose a visual re-ranking method by contextual similarity aggregation with self-attention. We conduct comprehensive experiments on four benchmark datasets to demonstrate the generality and effectiveness of our proposed visual re-ranking method.
arXiv Detail & Related papers (2021-10-26T06:20:31Z)
Graph Sampling Based Deep Metric Learning for Generalizable Person Re-Identification [114.56752624945142]
We argue that the most popular random sampling method, the well-known PK sampler, is not informative and efficient for deep metric learning. We propose an efficient mini batch sampling method called Graph Sampling (GS) for large-scale metric learning.
arXiv Detail & Related papers (2021-04-04T06:44:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.