Regressing Transformers for Data-efficient Visual Place Recognition
- URL: http://arxiv.org/abs/2401.16304v1
- Date: Mon, 29 Jan 2024 17:04:32 GMT
- Title: Regressing Transformers for Data-efficient Visual Place Recognition
- Authors: Mar\'ia Leyva-Vallina, Nicola Strisciuglio and Nicolai Petkov
- Abstract summary: This work introduces a fresh perspective by framing place recognition as a regression problem.
By optimizing image descriptors to align directly with graded similarity labels, this approach enhances ranking capabilities without expensive re-ranking.
- Score: 10.156432076272475
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual place recognition is a critical task in computer vision, especially
for localization and navigation systems. Existing methods often rely on
contrastive learning: image descriptors are trained to have small distance for
similar images and larger distance for dissimilar ones in a latent space.
However, this approach struggles to ensure accurate distance-based image
similarity representation, particularly when training with binary pairwise
labels, and complex re-ranking strategies are required. This work introduces a
fresh perspective by framing place recognition as a regression problem, using
camera field-of-view overlap as similarity ground truth for learning. By
optimizing image descriptors to align directly with graded similarity labels,
this approach enhances ranking capabilities without expensive re-ranking,
offering data-efficient training and strong generalization across several
benchmark datasets.
Related papers
- Breaking the Frame: Visual Place Recognition by Overlap Prediction [53.17564423756082]
We propose a novel visual place recognition approach based on overlap prediction, called VOP.
VOP proceeds co-visible image sections by obtaining patch-level embeddings using a Vision Transformer backbone.
Our approach uses a voting mechanism to assess overlap scores for potential database images.
arXiv Detail & Related papers (2024-06-23T20:00:20Z) - CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition [73.51329037954866]
We propose a robust global representation method with cross-image correlation awareness for visual place recognition.
Our method uses the attention mechanism to correlate multiple images within a batch.
Our method outperforms state-of-the-art methods by a large margin with significantly less training time.
arXiv Detail & Related papers (2024-02-29T15:05:11Z) - Patch-Wise Self-Supervised Visual Representation Learning: A Fine-Grained Approach [4.9204263448542465]
This study introduces an innovative, fine-grained dimension by integrating patch-level discrimination into self-supervised visual representation learning.
We employ a distinctive photometric patch-level augmentation, where each patch is individually augmented, independent from other patches within the same view.
We present a simple yet effective patch-matching algorithm to find the corresponding patches across the augmented views.
arXiv Detail & Related papers (2023-10-28T09:35:30Z) - CSP: Self-Supervised Contrastive Spatial Pre-Training for
Geospatial-Visual Representations [90.50864830038202]
We present Contrastive Spatial Pre-Training (CSP), a self-supervised learning framework for geo-tagged images.
We use a dual-encoder to separately encode the images and their corresponding geo-locations, and use contrastive objectives to learn effective location representations from images.
CSP significantly boosts the model performance with 10-34% relative improvement with various labeled training data sampling ratios.
arXiv Detail & Related papers (2023-05-01T23:11:18Z) - Data-efficient Large Scale Place Recognition with Graded Similarity
Supervision [10.117451511942267]
Visual place recognition (VPR) is a fundamental task of computer vision for visual localization.
Existing methods are trained using image pairs that either depict the same place or not.
We deploy an automatic re-annotation strategy to re-label VPR datasets.
We propose a new Generalized Contrastive Loss (GCL) that uses graded similarity labels for training contrastive networks.
arXiv Detail & Related papers (2023-03-21T10:56:57Z) - Multi-modal unsupervised brain image registration using edge maps [7.49320945341034]
We propose a simple yet effective unsupervised deep learning-based em multi-modal image registration approach.
The intuition behind this is that image locations with a strong gradient are assumed to denote a transition of tissues.
We evaluate our approach in the context of registering multi-modal (T1w to T2w) magnetic resonance (MR) brain images of different subjects using three different loss functions.
arXiv Detail & Related papers (2022-02-09T15:50:14Z) - Region-level Active Learning for Cluttered Scenes [60.93811392293329]
We introduce a new strategy that subsumes previous Image-level and Object-level approaches into a generalized, Region-level approach.
We show that this approach significantly decreases labeling effort and improves rare object search on realistic data with inherent class-imbalance and cluttered scenes.
arXiv Detail & Related papers (2021-08-20T14:02:38Z) - Semantic similarity metrics for learned image registration [10.355938901584565]
We propose a semantic similarity metric for image registration.
Our approach learns dataset-specific features that drive the optimization of a learning-based registration model.
We train both an unsupervised approach using an auto-encoder, and a semi-supervised approach using supplemental segmentation data to extract semantic features for image registration.
arXiv Detail & Related papers (2021-04-20T15:23:58Z) - Unsupervised Metric Relocalization Using Transform Consistency Loss [66.19479868638925]
Training networks to perform metric relocalization traditionally requires accurate image correspondences.
We propose a self-supervised solution, which exploits a key insight: localizing a query image within a map should yield the same absolute pose, regardless of the reference image used for registration.
We evaluate our framework on synthetic and real-world data, showing our approach outperforms other supervised methods when a limited amount of ground-truth information is available.
arXiv Detail & Related papers (2020-11-01T19:24:27Z) - Geometrically Mappable Image Features [85.81073893916414]
Vision-based localization of an agent in a map is an important problem in robotics and computer vision.
We propose a method that learns image features targeted for image-retrieval-based localization.
arXiv Detail & Related papers (2020-03-21T15:36:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.