Learning Semantics for Visual Place Recognition through Multi-Scale
Attention
- URL: http://arxiv.org/abs/2201.09701v2
- Date: Tue, 25 Jan 2022 11:12:33 GMT
- Title: Learning Semantics for Visual Place Recognition through Multi-Scale
Attention
- Authors: Valerio Paolicelli, Antonio Tavera, Carlo Masone, Gabriele Berton,
Barbara Caputo
- Abstract summary: We present the first VPR algorithm that learns robust global embeddings from both visual appearance and semantic content of the data.
Experiments on various scenarios validate this new approach and demonstrate its performance against state-of-the-art methods.
- Score: 14.738954189759156
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper we address the task of visual place recognition (VPR), where
the goal is to retrieve the correct GPS coordinates of a given query image by
matching it against a large geotagged gallery. While recent works have shown that
building descriptors incorporating semantic and appearance information is
beneficial, current state-of-the-art methods opt for a top-down definition of the
significant semantic content. Here we present the first VPR algorithm that
learns robust global embeddings from both visual appearance and semantic
content of the data, with the segmentation process being dynamically guided by
the recognition of places through a multi-scale attention module. Experiments
on various scenarios validate this new approach and demonstrate its performance
against state-of-the-art methods. Finally, we propose the first synthetic-world
dataset suited for both place recognition and segmentation tasks.
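The abstract describes pooling appearance features into a global embedding under spatially varying attention computed at multiple scales. The paper's exact module is not given here, so the following is only a minimal numpy sketch of the general idea, not the authors' implementation: a feature map is region-pooled at several grid resolutions, hypothetical per-scale attention weights score each region, and the attended region descriptors are concatenated and L2-normalized for retrieval.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def multi_scale_attention_descriptor(feat, scales=(1, 2, 4), seed=0):
    """Pool a C x H x W feature map into one global descriptor using
    spatial attention computed at several grid scales (illustrative only)."""
    C, H, W = feat.shape
    rng = np.random.default_rng(seed)
    descs = []
    for s in scales:
        # average-pool the map into an s x s grid of regions -> C x s x s
        pooled = feat.reshape(C, s, H // s, s, W // s).mean(axis=(2, 4))
        # hypothetical learned 1x1 projection producing one logit per region
        w = rng.standard_normal(C)
        logits = np.tensordot(w, pooled, axes=1).reshape(-1)  # s*s logits
        att = softmax(logits)                                 # attention weights
        # attention-weighted sum of region descriptors -> C
        descs.append((pooled.reshape(C, -1) * att).sum(axis=1))
    d = np.concatenate(descs)            # concatenate across scales
    return d / np.linalg.norm(d)         # L2-normalize for nearest-neighbor retrieval

feat = np.random.default_rng(1).standard_normal((8, 16, 16))
desc = multi_scale_attention_descriptor(feat)  # length 8 * 3 = 24
```

In the paper the attention is trained jointly with the place-recognition loss so that it also guides the segmentation branch; here the projection weights are random placeholders standing in for learned parameters.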
Related papers
- Exploiting Object-based and Segmentation-based Semantic Features for Deep Learning-based Indoor Scene Classification [0.5572976467442564]
The work described in this paper uses both semantic information, obtained from object detection, and semantic segmentation techniques.
A novel approach is proposed that uses a semantic segmentation mask to provide a Hu-moments-based shape characterization of the segmentation categories, designated Hu-Moments Features (SHMFs).
A three-main-branch network, designated by GOS$2$F$2$App, that exploits deep-learning-based global features, object-based features, and semantic segmentation-based features is also proposed.
arXiv Detail & Related papers (2024-04-11T13:37:51Z) - Mapping High-level Semantic Regions in Indoor Environments without
Object Recognition [50.624970503498226]
The present work proposes a method for semantic region mapping via embodied navigation in indoor environments.
To enable region identification, the method uses a vision-to-language model to provide scene information for mapping.
By projecting egocentric scene understanding into the global frame, the proposed method generates a semantic map as a distribution over possible region labels at each location.
arXiv Detail & Related papers (2024-03-11T18:09:50Z) - Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised
Semantic Segmentation [79.05949524349005]
We propose AuxSegNet+, a weakly supervised auxiliary learning framework to explore the rich information from saliency maps.
We also propose a cross-task affinity learning mechanism to learn pixel-level affinities from the saliency and segmentation feature maps.
arXiv Detail & Related papers (2024-03-02T10:03:21Z) - Open-world Semantic Segmentation via Contrasting and Clustering
Vision-Language Embedding [95.78002228538841]
We propose a new open-world semantic segmentation pipeline that makes the first attempt to learn to segment semantic objects of various open-world categories without any efforts on dense annotations.
Our method can directly segment objects of arbitrary categories, outperforming, on three benchmark datasets, zero-shot segmentation methods that require data labeling.
arXiv Detail & Related papers (2022-07-18T09:20:04Z) - TransFGU: A Top-down Approach to Fine-Grained Unsupervised Semantic
Segmentation [44.75300205362518]
Unsupervised semantic segmentation aims to obtain high-level semantic representation on low-level visual features without manual annotations.
We propose the first top-down unsupervised semantic segmentation framework for fine-grained segmentation in extremely complicated scenarios.
Our results show that our top-down unsupervised segmentation is robust to both object-centric and scene-centric datasets.
arXiv Detail & Related papers (2021-12-02T18:59:03Z) - Semantic Reinforced Attention Learning for Visual Place Recognition [15.84086970453363]
Large-scale visual place recognition (VPR) is inherently challenging because not all visual cues in the image are beneficial to the task.
We propose a novel Semantic Reinforced Attention Learning Network (SRALNet), in which the inferred attention can benefit from both semantic priors and data-driven fine-tuning.
Experiments demonstrate that our method outperforms state-of-the-art techniques on city-scale VPR benchmark datasets.
arXiv Detail & Related papers (2021-08-19T02:14:36Z) - Distribution Alignment: A Unified Framework for Long-tail Visual
Recognition [52.36728157779307]
We propose a unified distribution alignment strategy for long-tail visual recognition.
We then introduce a generalized re-weight method in the two-stage learning to balance the class prior.
Our approach achieves the state-of-the-art results across all four recognition tasks with a simple and unified framework.
arXiv Detail & Related papers (2021-03-30T14:09:53Z) - Rethinking of the Image Salient Object Detection: Object-level Semantic
Saliency Re-ranking First, Pixel-wise Saliency Refinement Latter [62.26677215668959]
We propose a lightweight, weakly supervised deep network to coarsely locate semantically salient regions.
We then fuse multiple off-the-shelf deep models on these semantically salient regions as the pixel-wise saliency refinement.
Our method is simple yet effective, and is the first attempt to treat salient object detection mainly as an object-level semantic re-ranking problem.
arXiv Detail & Related papers (2020-08-10T07:12:43Z) - Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation [128.03739769844736]
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences.
In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference.
Our algorithm sets new state-of-the-arts on all these settings, demonstrating well its efficacy and generalizability.
arXiv Detail & Related papers (2020-07-03T21:53:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.