A Transformer-Based Adaptive Semantic Aggregation Method for UAV Visual
Geo-Localization
- URL: http://arxiv.org/abs/2401.01574v1
- Date: Wed, 3 Jan 2024 06:58:52 GMT
- Title: A Transformer-Based Adaptive Semantic Aggregation Method for UAV Visual
Geo-Localization
- Authors: Shishen Li, Cuiwei Liu, Huaijun Qiu and Zhaokui Li
- Abstract summary: This paper addresses the task of Unmanned Aerial Vehicles (UAV) visual geo-localization.
Part matching is crucial for UAV visual geo-localization since part-level representations can capture image details and help to understand the semantic information of scenes.
We introduce a transformer-based adaptive semantic aggregation method that regards parts as the most representative semantics in an image.
- Score: 2.1462492411694756
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses the task of Unmanned Aerial Vehicles (UAV) visual
geo-localization, which aims to match images of the same geographic target
taken by different platforms, i.e., UAVs and satellites. In general, the key to
achieving accurate UAV-satellite image matching lies in extracting visual
features that are robust against viewpoint changes, scale variations, and
rotations. Current works have shown that part matching is crucial for UAV
visual geo-localization since part-level representations can capture image
details and help to understand the semantic information of scenes. However, the
importance of preserving semantic characteristics in part-level representations
is not well discussed. In this paper, we introduce a transformer-based adaptive
semantic aggregation method that regards parts as the most representative
semantics in an image. Correlations of image patches to different parts are
learned in terms of the transformer's feature map. Then our method decomposes
part-level features into an adaptive sum of all patch features. By doing this,
the learned parts are encouraged to focus on patches with typical semantics.
Extensive experiments on the University-1652 dataset have shown the superiority
of our method over the current works.
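The aggregation step described above (part-level features as an adaptive, correlation-weighted sum of all patch features) can be sketched as follows. This is a minimal illustration of the general idea, not the authors' implementation: the function name `aggregate_parts`, the use of learnable part queries, and the scaled-dot-product correlation are assumptions for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def aggregate_parts(patch_feats, part_queries):
    """Sketch of adaptive semantic aggregation (assumed reading of the abstract).

    patch_feats:  (N, D) patch features from a transformer feature map
    part_queries: (K, D) one learnable query per part
    Each part feature is an adaptive sum of all patch features, weighted by
    the learned patch-to-part correlations, so parts concentrate on patches
    with typical semantics.
    """
    d = patch_feats.shape[1]
    # correlation of every patch to every part, scaled as in attention
    logits = part_queries @ patch_feats.T / np.sqrt(d)   # (K, N)
    # normalize over patches: each part's weights sum to 1
    weights = softmax(logits, axis=1)                    # (K, N)
    # part-level features as weighted sums of patch features
    part_feats = weights @ patch_feats                   # (K, D)
    return part_feats, weights
```

In this reading, the aggregation is a cross-attention-style pooling: the softmax over patches lets each part adaptively emphasize the patches most correlated with its semantics.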
Related papers
- SDPL: Shifting-Dense Partition Learning for UAV-View Geo-Localization [27.131867916908156]
Cross-view geo-localization aims to match images of the same target from different platforms.
We introduce a part-based representation learning scheme, shifting-dense partition learning (SDPL).
We show that SDPL is robust to position shifting, and performs competitively on two prevailing benchmarks.
arXiv Detail & Related papers (2024-03-07T03:07:54Z)
- Progressive Semantic-Visual Mutual Adaption for Generalized Zero-Shot Learning [74.48337375174297]
Generalized Zero-Shot Learning (GZSL) identifies unseen categories by knowledge transferred from the seen domain.
We deploy the dual semantic-visual transformer module (DSVTM) to progressively model the correspondences between prototypes and visual features.
DSVTM devises an instance-motivated semantic encoder that learns instance-centric prototypes to adapt to different images, enabling the recast of the unmatched semantic-visual pair into the matched one.
arXiv Detail & Related papers (2023-03-27T15:21:43Z)
- Location-Aware Self-Supervised Transformers [74.76585889813207]
We propose to pretrain networks for semantic segmentation by predicting the relative location of image parts.
We control the difficulty of the task by masking a subset of the reference patch features that are visible to the query.
Our experiments show that this location-aware pretraining leads to representations that transfer competitively to several challenging semantic segmentation benchmarks.
arXiv Detail & Related papers (2022-12-05T16:24:29Z)
- BEV-Locator: An End-to-end Visual Semantic Localization Network Using Multi-View Images [13.258689143949912]
We propose an end-to-end visual semantic localization neural network using multi-view camera images.
The BEV-Locator is capable of estimating vehicle poses under versatile scenarios.
Experiments report satisfactory accuracy, with mean absolute errors of 0.052 m, 0.135 m, and 0.251° in lateral translation, longitudinal translation, and heading angle, respectively.
arXiv Detail & Related papers (2022-11-27T20:24:56Z)
- Vision Transformers: From Semantic Segmentation to Dense Prediction [139.15562023284187]
We explore the global context learning potentials of vision transformers (ViTs) for dense visual prediction.
Our motivation is that through learning global context at full receptive field layer by layer, ViTs may capture stronger long-range dependency information.
We formulate a family of Hierarchical Local-Global (HLG) Transformers, characterized by local attention within windows and global-attention across windows in a pyramidal architecture.
arXiv Detail & Related papers (2022-07-19T15:49:35Z)
- Situational Perception Guided Image Matting [16.1897179939677]
We propose a Situational Perception Guided Image Matting (SPG-IM) method that mitigates subjective bias of matting annotations.
SPG-IM can better associate inter-object and object-to-environment saliency, and compensate for the subjective nature of image matting.
arXiv Detail & Related papers (2022-04-20T07:35:51Z)
- A Transformer-Based Feature Segmentation and Region Alignment Method For UAV-View Geo-Localization [0.5257115841810257]
Cross-view geo-localization is a task of matching the same geographic image from different views.
Existing methods are mainly aimed at digging for more comprehensive fine-grained information.
We introduce a simple and efficient transformer-based structure called Feature Segmentation and Region Alignment (FSRA) to enhance the model's ability to understand contextual information.
arXiv Detail & Related papers (2022-01-23T08:01:42Z)
- Region Similarity Representation Learning [94.88055458257081]
Region Similarity Representation Learning (ReSim) is a new approach to self-supervised representation learning for localization-based tasks.
ReSim learns both regional representations for localization as well as semantic image-level representations.
We show how ReSim learns representations which significantly improve the localization and classification performance compared to a competitive MoCo-v2 baseline.
arXiv Detail & Related papers (2021-03-24T00:42:37Z)
- Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation [128.03739769844736]
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences.
In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference.
Our algorithm sets new state-of-the-art results in all these settings, demonstrating its efficacy and generalizability.
arXiv Detail & Related papers (2020-07-03T21:53:46Z)
- Improving Few-shot Learning by Spatially-aware Matching and CrossTransformer [116.46533207849619]
We study the impact of scale and location mismatch in the few-shot learning scenario.
We propose a novel Spatially-aware Matching scheme to effectively perform matching across multiple scales and locations.
arXiv Detail & Related papers (2020-01-06T14:10:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.