Data-efficient Large Scale Place Recognition with Graded Similarity Supervision
- URL: http://arxiv.org/abs/2303.11739v2
- Date: Sat, 25 Mar 2023 18:54:41 GMT
- Title: Data-efficient Large Scale Place Recognition with Graded Similarity Supervision
- Authors: Maria Leyva-Vallina, Nicola Strisciuglio, Nicolai Petkov
- Abstract summary: Visual place recognition (VPR) is a fundamental task of computer vision for visual localization.
Existing methods are trained using image pairs that either depict the same place or not.
We deploy an automatic re-annotation strategy to re-label VPR datasets.
We propose a new Generalized Contrastive Loss (GCL) that uses graded similarity labels for training contrastive networks.
- Score: 10.117451511942267
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual place recognition (VPR) is a fundamental task of computer vision for
visual localization. Existing methods are trained using image pairs that either
depict the same place or not. Such a binary indication does not consider
continuous relations of similarity between images of the same place taken from
different positions, determined by the continuous nature of camera pose. The
binary similarity induces a noisy supervision signal into the training of VPR
methods, which stall in local minima and require expensive hard mining
algorithms to guarantee convergence. Motivated by the fact that two images of
the same place only partially share visual cues due to camera pose differences,
we deploy an automatic re-annotation strategy to re-label VPR datasets. We
compute graded similarity labels for image pairs based on available
localization metadata. Furthermore, we propose a new Generalized Contrastive
Loss (GCL) that uses graded similarity labels for training contrastive
networks. We demonstrate that the new labels and GCL make it possible to
dispense with hard-pair mining and to train image descriptors that perform
better in VPR by nearest neighbor search, obtaining results superior or
comparable to those of methods that require expensive hard-pair mining and
re-ranking techniques. Code and models are available at:
https://github.com/marialeyvallina/generalized_contrastive_loss
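As a rough illustration of how a graded similarity label can enter a contrastive objective, the sketch below assumes one plausible form of such a loss: the label psi in [0, 1] weights a term that pulls descriptors together and a margin term that pushes them apart. The function name, the margin value, and the exact formula are assumptions made for illustration, not taken from the abstract; the linked repository holds the authors' actual definition.

```python
import numpy as np


def generalized_contrastive_loss(desc_a, desc_b, similarity, margin=0.5):
    """Contrastive loss weighted by a graded similarity label (assumed form).

    Per pair: psi * d^2 / 2 + (1 - psi) * max(margin - d, 0)^2 / 2,
    where d is the Euclidean distance between the two descriptors and
    psi is the graded similarity in [0, 1]. With psi restricted to {0, 1}
    this reduces to the classic binary contrastive loss.
    """
    desc_a = np.asarray(desc_a, dtype=float)
    desc_b = np.asarray(desc_b, dtype=float)
    psi = np.asarray(similarity, dtype=float)

    # Euclidean distance between each pair of descriptors.
    d = np.linalg.norm(desc_a - desc_b, axis=-1)

    # Similar pairs are penalized for being far apart.
    pull = psi * 0.5 * d ** 2
    # Dissimilar pairs are penalized for being closer than the margin.
    push = (1.0 - psi) * 0.5 * np.maximum(margin - d, 0.0) ** 2

    return float(np.mean(pull + push))


if __name__ == "__main__":
    # Toy descriptors and hypothetical graded labels, for illustration only.
    a = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
    b = np.array([[1.0, 0.0], [1.0, 0.0], [0.8, 0.6]])
    psi = np.array([1.0, 0.0, 0.4])
    print(generalized_contrastive_loss(a, b, psi))
```

With a graded label, a partially overlapping pair contributes to both terms in proportion to psi, instead of being forced into a purely positive or purely negative role.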
Related papers
- Breaking the Frame: Visual Place Recognition by Overlap Prediction [53.17564423756082]
We propose a novel visual place recognition approach based on overlap prediction, called VOP.
VOP processes co-visible image sections by obtaining patch-level embeddings using a Vision Transformer backbone.
Our approach uses a voting mechanism to assess overlap scores for potential database images.
arXiv Detail & Related papers (2024-06-23T20:00:20Z)
- CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition [73.51329037954866]
We propose a robust global representation method with cross-image correlation awareness for visual place recognition.
Our method uses the attention mechanism to correlate multiple images within a batch.
Our method outperforms state-of-the-art methods by a large margin with significantly less training time.
arXiv Detail & Related papers (2024-02-29T15:05:11Z)
- Deep Homography Estimation for Visual Place Recognition [49.235432979736395]
We propose a transformer-based deep homography estimation (DHE) network.
It takes the dense feature map extracted by a backbone network as input and fits homography for fast and learnable geometric verification.
Experiments on benchmark datasets show that our method can outperform several state-of-the-art methods.
arXiv Detail & Related papers (2024-02-25T13:22:17Z)
- Regressing Transformers for Data-efficient Visual Place Recognition [10.156432076272475]
This work introduces a fresh perspective by framing place recognition as a regression problem.
By optimizing image descriptors to align directly with graded similarity labels, this approach enhances ranking capabilities without expensive re-ranking.
arXiv Detail & Related papers (2024-01-29T17:04:32Z)
- Fine-grained Recognition with Learnable Semantic Data Augmentation [68.48892326854494]
Fine-grained image recognition is a longstanding computer vision challenge.
We propose diversifying the training data at the feature-level to alleviate the discriminative region loss problem.
Our method significantly improves the generalization performance on several popular classification networks.
arXiv Detail & Related papers (2023-09-01T11:15:50Z)
- CorrMatch: Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation [73.89509052503222]
This paper presents a simple but performant semi-supervised semantic segmentation approach, called CorrMatch.
We observe that the correlation maps not only enable clustering pixels of the same category easily but also contain good shape information.
We propose to conduct pixel propagation by modeling the pairwise similarities of pixels to spread the high-confidence pixels and dig out more.
Then, we perform region propagation to enhance the pseudo labels with accurate class-agnostic masks extracted from the correlation maps.
arXiv Detail & Related papers (2023-06-07T10:02:29Z)
- Global-Local Self-Distillation for Visual Representation Learning [41.24728444810133]
Richer and more meaningful gradient updates are key to allowing self-supervised methods to learn better and more efficiently.
In a typical self-distillation framework, the representations of two augmented images are enforced to be coherent at the global level.
We propose to leverage the spatial information in the input images to obtain geometric matchings.
arXiv Detail & Related papers (2022-07-29T13:50:09Z)
- SimPLE: Similar Pseudo Label Exploitation for Semi-Supervised Classification [24.386165255835063]
A common situation in classification tasks is that a large amount of data is available for training, but only a small portion has class labels.
The goal of semi-supervised training, in this context, is to improve classification accuracy by leveraging information from a large amount of unlabeled data.
We propose a novel unsupervised objective that focuses on the less studied relationship between the high confidence unlabeled data that are similar to each other.
Our proposed SimPLE algorithm shows significant performance gains over previous algorithms on CIFAR-100 and Mini-ImageNet, and is on par with the state-of-the-art methods
arXiv Detail & Related papers (2021-03-30T23:48:06Z)
- G-SimCLR: Self-Supervised Contrastive Learning with Guided Projection via Pseudo Labelling [0.8164433158925593]
In computer vision, it is evident that deep neural networks perform better in a supervised setting with a large amount of labeled data.
In this work, we propose that, with the normalized temperature-scaled cross-entropy (NT-Xent) loss function, it is beneficial to not have images of the same category in the same batch.
We use the latent space representation of a denoising autoencoder trained on the unlabeled dataset and cluster them with k-means to obtain pseudo labels.
arXiv Detail & Related papers (2020-09-25T02:25:37Z)
- High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification [84.43394420267794]
We propose a novel framework by learning high-order relation and topology information for discriminative features and robust alignment.
Our framework significantly outperforms the state of the art by 6.5% mAP on the Occluded-Duke dataset.
arXiv Detail & Related papers (2020-03-18T12:18:35Z)