Close, But Not There: Boosting Geographic Distance Sensitivity in Visual Place Recognition
- URL: http://arxiv.org/abs/2407.02422v1
- Date: Tue, 2 Jul 2024 16:49:01 GMT
- Title: Close, But Not There: Boosting Geographic Distance Sensitivity in Visual Place Recognition
- Authors: Sergio Izquierdo, Javier Civera
- Abstract summary: We formulate how limitations in the Geographic Distance Sensitivity of current VPR embeddings result in a high probability of incorrectly sorting the top-k retrievals.
We propose a novel mining strategy, CliqueMining, that selects positive and negative examples by sampling cliques from a graph of visually similar images.
Our approach boosts the sensitivity of VPR embeddings at small distance ranges, significantly improving the state of the art on relevant benchmarks.
- Score: 9.192660643226372
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual Place Recognition (VPR) plays a critical role in many localization and mapping pipelines. It consists of retrieving the closest sample to a query image, in a certain embedding space, from a database of geotagged references. The image embedding is learned to effectively describe a place despite variations in visual appearance, viewpoint, and geometric changes. In this work, we formulate how limitations in the Geographic Distance Sensitivity of current VPR embeddings result in a high probability of incorrectly sorting the top-k retrievals, negatively impacting the recall. In order to address this issue in single-stage VPR, we propose a novel mining strategy, CliqueMining, that selects positive and negative examples by sampling cliques from a graph of visually similar images. Our approach boosts the sensitivity of VPR embeddings at small distance ranges, significantly improving the state of the art on relevant benchmarks. In particular, we raise recall@1 from 75% to 82% in MSLS Challenge, and from 76% to 90% in Nordland. Models and code are available at https://github.com/serizba/cliquemining.
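The single-stage retrieval and the recall@1 metric described in the abstract can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: the function names and the 25 m distance threshold are assumptions for the sketch.

```python
import numpy as np

def topk_retrieval(query_emb, db_embs, k=5):
    """Return indices of the k database embeddings closest to the query.

    Embeddings are L2-normalized so cosine similarity reduces to a dot
    product; higher score means more visually similar.
    """
    q = query_emb / np.linalg.norm(query_emb)
    db = db_embs / np.linalg.norm(db_embs, axis=1, keepdims=True)
    scores = db @ q
    return np.argsort(-scores)[:k]

def recall_at_1(retrievals, query_coords, db_coords, threshold_m=25.0):
    """Fraction of queries whose top-1 retrieval lies within threshold_m metres."""
    hits = 0
    for ranks, q_xy in zip(retrievals, query_coords):
        if np.linalg.norm(db_coords[ranks[0]] - q_xy) <= threshold_m:
            hits += 1
    return hits / len(retrievals)
```

The paper's point is that when several database images are visually similar, the top-k ordering produced by `argsort` over embedding scores often disagrees with geographic distance; CliqueMining trains the embedding so that small geographic distances are better reflected in the scores.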
Related papers
- Breaking the Frame: Visual Place Recognition by Overlap Prediction [53.17564423756082]
We propose a novel visual place recognition approach based on overlap prediction, called VOP.
VOP identifies co-visible image sections by obtaining patch-level embeddings using a Vision Transformer backbone.
Our approach uses a voting mechanism to assess overlap scores for potential database images.
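The voting idea in the VOP summary can be illustrated with a small sketch: each query patch votes for a database image if its best-matching patch is similar enough. The function name and the 0.8 similarity threshold are hypothetical choices for this illustration, not details from the paper.

```python
import numpy as np

def overlap_votes(query_patches, db_patches, sim_thresh=0.8):
    """Count query patches whose best-matching database patch exceeds sim_thresh.

    Both inputs are (N, D) arrays of L2-normalized patch embeddings; the
    vote count serves as a crude overlap score between the two images.
    """
    sims = query_patches @ db_patches.T   # pairwise cosine similarities
    best = sims.max(axis=1)               # best database match per query patch
    return int((best >= sim_thresh).sum())
```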
arXiv Detail & Related papers (2024-06-23T20:00:20Z)
- Visual place recognition for aerial imagery: A survey [3.122408196953972]
Visual localization from aerial imagery is an essential problem for many Robotics and Computer Vision tasks.
Global Navigation Satellite Systems (GNSS) are the standard solution for the aerial localization problem.
Visual geolocalization is emerging as a viable alternative.
arXiv Detail & Related papers (2024-06-02T22:40:05Z)
- NYC-Indoor-VPR: A Long-Term Indoor Visual Place Recognition Dataset with Semi-Automatic Annotation [7.037667953803237]
This paper introduces the NYC-Indoor-VPR dataset, a unique and rich collection of over 36,000 images compiled from 13 distinct crowded scenes in New York City.
To establish the ground truth for VPR, we propose a semi-automatic annotation approach that computes the positional information of each image.
Our method specifically takes pairs of videos as input and yields matched pairs of images along with their estimated relative locations.
arXiv Detail & Related papers (2024-03-31T00:20:53Z)
- CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition [73.51329037954866]
We propose a robust global representation method with cross-image correlation awareness for visual place recognition.
Our method uses the attention mechanism to correlate multiple images within a batch.
Our method outperforms state-of-the-art methods by a large margin with significantly less training time.
arXiv Detail & Related papers (2024-02-29T15:05:11Z)
- Deep Homography Estimation for Visual Place Recognition [49.235432979736395]
We propose a transformer-based deep homography estimation (DHE) network.
It takes the dense feature map extracted by a backbone network as input and fits homography for fast and learnable geometric verification.
Experiments on benchmark datasets show that our method can outperform several state-of-the-art methods.
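Geometric verification with a homography, as used in the DHE summary above, can be sketched with the classic direct linear transform (DLT) plus inlier counting. This is a textbook baseline, not the paper's learned network; the function names and the 3-pixel tolerance are illustrative assumptions.

```python
import numpy as np

def fit_homography(src, dst):
    """Fit a 3x3 homography mapping src -> dst (Nx2 arrays, N >= 4) via DLT."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    return Vt[-1].reshape(3, 3)   # null-space vector, reshaped to 3x3

def inlier_count(H, src, dst, tol=3.0):
    """Number of correspondences whose reprojection error is below tol pixels."""
    pts = np.hstack([src, np.ones((len(src), 1))]) @ H.T
    proj = pts[:, :2] / pts[:, 2:3]   # dehomogenize
    return int((np.linalg.norm(proj - dst, axis=1) < tol).sum())
```

In two-stage VPR, the inlier count of candidate matches is typically used to rerank the retrieval shortlist; DHE replaces the handcrafted fit with a learnable one.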
arXiv Detail & Related papers (2024-02-25T13:22:17Z)
- CorrMatch: Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation [73.89509052503222]
This paper presents a simple but performant semi-supervised semantic segmentation approach, called CorrMatch.
We observe that the correlation maps not only enable clustering pixels of the same category easily but also contain good shape information.
We propose to conduct pixel propagation by modeling the pairwise similarities of pixels, spreading high-confidence predictions to recover more reliable pseudo-labels.
Then, we perform region propagation to enhance the pseudo labels with accurate class-agnostic masks extracted from the correlation maps.
arXiv Detail & Related papers (2023-06-07T10:02:29Z)
- CoPR: Towards Accurate Visual Localization With Continuous Place-descriptor Regression [2.7393821783237184]
Visual Place Recognition (VPR) estimates the camera location of a query image by retrieving the most similar reference image from a map of geo-tagged reference images.
References for VPR are only available at sparse poses in a map, which enforces an upper bound on the maximum achievable localization accuracy.
We propose Continuous Place-descriptor Regression (CoPR) to densify the map and improve localization accuracy.
arXiv Detail & Related papers (2023-04-14T23:17:44Z)
- Are Local Features All You Need for Cross-Domain Visual Place Recognition? [13.519413608607781]
Visual Place Recognition aims to predict the coordinates of an image based solely on visual cues.
Despite recent advances, recognizing the same place when the query comes from a significantly different distribution is still a major hurdle for state-of-the-art retrieval methods.
In this work we explore whether re-ranking methods based on spatial verification can tackle these challenges.
arXiv Detail & Related papers (2023-04-12T14:46:57Z)
- $R^{2}$Former: Unified $R$etrieval and $R$eranking Transformer for Place Recognition [92.56937383283397]
We propose a unified place recognition framework that handles both retrieval and reranking.
The proposed reranking module takes feature correlation, attention value, and xy coordinates into account.
$R^{2}$Former significantly outperforms state-of-the-art methods on major VPR datasets.
arXiv Detail & Related papers (2023-04-06T23:19:32Z)
- BEVPlace: Learning LiDAR-based Place Recognition using Bird's Eye View Images [20.30997801125592]
We explore the potential of a different representation in place recognition, i.e. bird's eye view (BEV) images.
A simple VGGNet trained on BEV images achieves comparable performance with the state-of-the-art place recognition methods in scenes of slight viewpoint changes.
We develop a method to estimate the position of the query cloud, extending the usage of place recognition.
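The BEV representation in the BEVPlace summary amounts to rasterizing a LiDAR point cloud onto a ground-plane grid. A minimal sketch follows; the grid size, extent, and binary-occupancy encoding are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np

def cloud_to_bev(points, grid_size=64, extent=40.0):
    """Rasterize an (N, 3) LiDAR cloud into a grid_size x grid_size bird's
    eye view occupancy image covering [-extent, extent] metres in x and y."""
    bins = np.linspace(-extent, extent, grid_size + 1)
    img, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=(bins, bins))
    return (img > 0).astype(np.float32)   # binary occupancy per cell
```

The resulting image can then be fed to an ordinary CNN (a VGGNet in the summary above) as if it were a camera image.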
arXiv Detail & Related papers (2023-02-28T05:37:45Z)
- With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations [87.72779294717267]
Using the nearest neighbor as the positive in contrastive losses significantly improves performance on ImageNet classification.
We demonstrate empirically that our method is less reliant on complex data augmentations.
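The nearest-neighbor positive idea above can be sketched as an InfoNCE loss where each embedding is swapped for its nearest neighbor in a support set before being contrasted. This is a simplified NumPy illustration of the NNCLR idea; the function name and temperature value are assumptions.

```python
import numpy as np

def nn_contrastive_loss(z1, z2, support, temperature=0.1):
    """InfoNCE loss where each z1 row is replaced by its nearest neighbor
    from the support set before being contrasted against z2.

    All inputs are (B, D) or (S, D) arrays of L2-normalized embeddings.
    """
    nn_idx = np.argmax(z1 @ support.T, axis=1)   # nearest support element
    nn = support[nn_idx]
    logits = (nn @ z2.T) / temperature           # (B, B) similarity logits
    labels = np.arange(len(z1))
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[labels, labels].mean()     # cross-entropy on the diagonal
```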
arXiv Detail & Related papers (2021-04-29T17:56:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.