EigenPlaces: Training Viewpoint Robust Models for Visual Place
Recognition
- URL: http://arxiv.org/abs/2308.10832v1
- Date: Mon, 21 Aug 2023 16:27:31 GMT
- Title: EigenPlaces: Training Viewpoint Robust Models for Visual Place
Recognition
- Authors: Gabriele Berton, Gabriele Trivigno, Barbara Caputo, Carlo Masone
- Abstract summary: We propose a new method, called EigenPlaces, to train our neural network on images from different points of view.
The underlying idea is to cluster the training data so as to explicitly present the model with different views of the same points of interest.
We present experiments on the most comprehensive set of datasets in the literature, finding that EigenPlaces is able to outperform the previous state of the art on the majority of datasets.
- Score: 22.98403243270106
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual Place Recognition is a task that aims to predict the place of an image
(called query) based solely on its visual features. This is typically done
through image retrieval, where the query is matched to the most similar images
from a large database of geotagged photos, using learned global descriptors. A
major challenge in this task is recognizing places seen from different
viewpoints. To overcome this limitation, we propose a new method, called
EigenPlaces, to train our neural network on images from different points of
view, which embeds viewpoint robustness into the learned global descriptors.
The underlying idea is to cluster the training data so as to explicitly present
the model with different views of the same points of interest. The selection of
these points of interest is done without the need for extra supervision. We then
present experiments on the most comprehensive set of datasets in the literature,
finding that EigenPlaces is able to outperform the previous state of the art on the
majority of datasets, while requiring 60% less GPU memory for training and
using 50% smaller descriptors. The code and trained models for EigenPlaces are
available at https://github.com/gmberton/EigenPlaces, while results with any
other baseline can be computed with the codebase at
https://github.com/gmberton/auto_VPR.
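As context for the abstract above, the retrieval step it describes reduces to a nearest-neighbor search over L2-normalized global descriptors. The following is a minimal sketch of that inference pipeline under stated assumptions: the torch.hub entry point mirrors the loading snippet in the EigenPlaces repository (verify its exact arguments against the README), and random tensors stand in for real query and database images.

```python
# Minimal sketch of VPR inference as global-descriptor retrieval; this is an
# illustration, not the authors' training code. The torch.hub call mirrors the
# loading snippet in the EigenPlaces repository -- treat its exact name and
# arguments as assumptions and check them against the README.
import torch
import torch.nn.functional as F

model = torch.hub.load("gmberton/eigenplaces", "get_trained_model",
                       backbone="ResNet50", fc_output_dim=2048)
model.eval()

@torch.no_grad()
def describe(images):
    """(B, 3, H, W) float tensor -> L2-normalized global descriptors."""
    return F.normalize(model(images), dim=-1)

# Database of geotagged photos: one descriptor (and one GPS tag) per image.
db_images = torch.rand(8, 3, 224, 224)   # stand-ins for real database photos
db_descriptors = describe(db_images)     # (8, 2048)

# A query is localized by the GPS tag of its most similar database image.
query = torch.rand(1, 3, 224, 224)
similarity = describe(query) @ db_descriptors.T  # cosine similarity of unit vectors
predicted_place = similarity.argmax(dim=-1)      # index into the geotagged database
```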
Related papers
- Are Local Features All You Need for Cross-Domain Visual Place
Recognition? [13.519413608607781]
Visual Place Recognition aims to predict the coordinates of an image based solely on visual cues.
Despite recent advances, recognizing the same place when the query comes from a significantly different distribution is still a major hurdle for state-of-the-art retrieval methods.
In this work we explore whether re-ranking methods based on spatial verification can tackle these challenges.
arXiv Detail & Related papers (2023-04-12T14:46:57Z)
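The spatial-verification re-ranking explored in the entry above can be sketched generically: retrieve top-k candidates with global descriptors, then re-order them by the number of geometrically consistent local matches with the query. The sketch below uses off-the-shelf ORB features and RANSAC from OpenCV as illustrative stand-ins, not the specific local features studied in the paper.

```python
# Hedged sketch of spatial-verification re-ranking using generic ORB features
# and RANSAC from OpenCV -- illustrative stand-ins, not the paper's method.
import cv2
import numpy as np

orb = cv2.ORB_create(nfeatures=2000)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def inlier_count(query_img, candidate_img):
    """Count RANSAC-consistent local matches between two grayscale images."""
    kq, dq = orb.detectAndCompute(query_img, None)
    kc, dc = orb.detectAndCompute(candidate_img, None)
    if dq is None or dc is None:
        return 0
    matches = matcher.match(dq, dc)
    if len(matches) < 4:  # a homography needs at least 4 correspondences
        return 0
    src = np.float32([kq[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kc[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    _, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return 0 if mask is None else int(mask.sum())

def rerank(query_img, top_k_candidates):
    """Re-order retrieved images by geometric consistency with the query."""
    return sorted(top_k_candidates,
                  key=lambda c: inlier_count(query_img, c), reverse=True)
```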
- Sparse Spatial Transformers for Few-Shot Learning [6.271261279657655]
Learning from limited data is challenging because data scarcity leads to poor generalization of the trained model.
We propose a novel transformer-based neural network architecture called sparse spatial transformers.
Our method finds task-relevant features and suppresses task-irrelevant features.
arXiv Detail & Related papers (2021-09-27T10:36:32Z)
- Learning to Generate Scene Graph from Natural Language Supervision [52.18175340725455]
We propose one of the first methods that learn from image-sentence pairs to extract a graphical representation of localized objects and their relationships within an image, known as a scene graph.
We leverage an off-the-shelf object detector to identify and localize object instances, match labels of detected regions to concepts parsed from captions, and thus create "pseudo" labels for learning scene graphs.
arXiv Detail & Related papers (2021-09-06T03:38:52Z)
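The pseudo-labeling step described in the entry above is, at its core, an alignment between detector outputs and caption parses. Below is a deliberately simplified toy version that uses exact string matching between detector labels and caption-derived triplets; the paper's actual matching procedure is more sophisticated, and all names and values here are hypothetical.

```python
# Toy illustration of creating "pseudo" scene-graph labels by aligning detector
# labels with concepts parsed from a caption. Simplified to exact string
# matching; the paper's matching procedure is more sophisticated.
from dataclasses import dataclass

@dataclass
class Region:
    label: str    # class name from an off-the-shelf detector
    box: tuple    # (x1, y1, x2, y2)

# Detector output for one image (hypothetical values).
regions = [Region("man", (10, 20, 80, 200)), Region("horse", (60, 40, 220, 210))]

# Relationship triplets parsed from the caption "a man riding a horse".
caption_triplets = [("man", "riding", "horse")]

def pseudo_labels(regions, triplets):
    """Ground caption triplets in detected boxes to form (subject box,
    predicate, object box) training labels."""
    by_label = {r.label: r for r in regions}
    out = []
    for subj, pred, obj in triplets:
        if subj in by_label and obj in by_label:
            out.append((by_label[subj].box, pred, by_label[obj].box))
    return out

print(pseudo_labels(regions, caption_triplets))
```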
- Rectifying the Shortcut Learning of Background: Shared Object Concentration for Few-Shot Image Recognition [101.59989523028264]
Few-Shot image classification aims to utilize pretrained knowledge learned from a large-scale dataset to tackle a series of downstream classification tasks.
We propose COSOC, a novel Few-Shot Learning framework, to automatically figure out foreground objects at both the pretraining and evaluation stages.
arXiv Detail & Related papers (2021-07-16T07:46:41Z)
- Data Augmentation for Object Detection via Differentiable Neural Rendering [71.00447761415388]
It is challenging to train a robust object detector when annotated data is scarce.
Existing approaches to tackle this problem include semi-supervised learning that interpolates labeled data from unlabeled data.
We introduce an offline data augmentation method for object detection, which semantically interpolates the training data with novel views.
arXiv Detail & Related papers (2021-03-04T06:31:06Z)
- VIGOR: Cross-View Image Geo-localization beyond One-to-one Retrieval [19.239311087570318]
Cross-view image geo-localization aims to determine the locations of street-view query images by matching with GPS-tagged reference images from aerial view.
Recent works have achieved surprisingly high retrieval accuracy on city-scale datasets.
We propose a new large-scale benchmark -- VIGOR -- for Cross-View Image Geo-localization beyond One-to-one Retrieval.
arXiv Detail & Related papers (2020-11-24T15:50:54Z)
- Inter-Image Communication for Weakly Supervised Localization [77.2171924626778]
Weakly supervised localization aims at finding target object regions using only image-level supervision.
We propose to leverage pixel-level similarities across different objects for learning more accurate object locations.
Our method achieves a Top-1 localization error rate of 45.17% on the ILSVRC validation set.
arXiv Detail & Related papers (2020-08-12T04:14:11Z)
- Distilling Localization for Self-Supervised Representation Learning [82.79808902674282]
Contrastive learning has revolutionized unsupervised representation learning.
Current contrastive models are ineffective at localizing the foreground object.
We propose a data-driven approach for learning invariance to backgrounds.
arXiv Detail & Related papers (2020-04-14T16:29:42Z)
- Self-Supervised Viewpoint Learning From Image Collections [116.56304441362994]
We propose a novel learning framework which incorporates an analysis-by-synthesis paradigm to reconstruct images in a viewpoint-aware manner.
We show that our approach performs competitively to fully-supervised approaches for several object categories like human faces, cars, buses, and trains.
arXiv Detail & Related papers (2020-04-03T22:01:41Z)
- Unifying Deep Local and Global Features for Image Search [9.614694312155798]
We unify global and local image features into a single deep model, enabling accurate retrieval with efficient feature extraction.
Our model achieves state-of-the-art image retrieval on the Revisited Oxford and Paris datasets, and state-of-the-art single-model instance-level recognition on the Google Landmarks dataset v2.
arXiv Detail & Related papers (2020-01-14T19:59:51Z)
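The unification described in this last entry can be pictured as one backbone with two heads: a pooled global descriptor for retrieval plus dense local features for verification. The sketch below is a schematic in that spirit, with illustrative layer sizes rather than the paper's exact configuration (the paper uses generalized-mean pooling; plain average pooling is used here for brevity).

```python
# Schematic sketch of a single model producing both a global descriptor and
# dense local features, in the spirit of unified local/global retrieval
# models. Layer sizes are illustrative, not the paper's configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class GlobalLocalNet(nn.Module):
    def __init__(self, global_dim=2048, local_dim=128):
        super().__init__()
        backbone = torchvision.models.resnet50(weights=None)
        self.stem = nn.Sequential(*list(backbone.children())[:-3])  # conv1..layer3
        self.deep = list(backbone.children())[-3]                   # layer4
        self.local_proj = nn.Conv2d(1024, local_dim, kernel_size=1) # local head
        self.global_fc = nn.Linear(2048, global_dim)                # global head

    def forward(self, x):
        shallow = self.stem(x)                  # (B, 1024, H/16, W/16)
        deep = self.deep(shallow)               # (B, 2048, H/32, W/32)
        local_feats = self.local_proj(shallow)  # dense local descriptors
        # Average pooling here; the paper uses generalized-mean (GeM) pooling.
        pooled = deep.mean(dim=(2, 3))
        global_desc = F.normalize(self.global_fc(pooled), dim=-1)
        return global_desc, local_feats

model = GlobalLocalNet()
g, l = model(torch.rand(1, 3, 224, 224))
print(g.shape, l.shape)  # torch.Size([1, 2048]) torch.Size([1, 128, 14, 14])
```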