A Transformer-Based Feature Segmentation and Region Alignment Method For UAV-View Geo-Localization
- URL: http://arxiv.org/abs/2201.09206v1
- Date: Sun, 23 Jan 2022 08:01:42 GMT
- Title: A Transformer-Based Feature Segmentation and Region Alignment Method For UAV-View Geo-Localization
- Authors: Ming Dai and Jianhong Hu and Jiedong Zhuang and Enhui Zheng
- Abstract summary: Cross-view geo-localization is the task of matching images of the same geographic target taken from different views.
Existing methods mainly focus on mining more comprehensive fine-grained information.
We introduce a simple and efficient transformer-based structure called Feature Segmentation and Region Alignment (FSRA) to enhance the model's ability to understand contextual information.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cross-view geo-localization is the task of matching images of the same
geographic target taken from different views, e.g., unmanned aerial vehicle (UAV)
and satellite. The most difficult challenges are position shift and the
uncertainty of distance and scale. Existing methods mainly focus on mining more
comprehensive fine-grained information; however, they underestimate the
importance of extracting robust feature representations and the impact of
feature alignment. CNN-based methods have achieved great success in
cross-view geo-localization, but they still have limitations: for example, they
can only extract information from a local neighborhood, and some scale-reduction
operations cause fine-grained information to be lost. In this paper, we
introduce a simple and efficient transformer-based structure called Feature
Segmentation and Region Alignment (FSRA) to enhance the model's ability to
understand contextual information as well as the distribution of instances.
Without using additional supervisory information, FSRA divides regions based on
the heat distribution of the transformer's feature map and then aligns multiple
specific regions across the different views one to one. Finally, FSRA integrates
the regions into a set of feature representations. Unlike approaches that
partition regions manually, FSRA divides them automatically based on the heat
distribution of the feature map, so specific instances can still be divided and
aligned when the image contains significant shifts and scale changes. In
addition, a multiple sampling strategy is proposed to address the imbalance
between the number of satellite images and the number of images from other
sources. Experiments show that the proposed method achieves superior
performance and state-of-the-art results on both the drone-view target
localization and drone navigation tasks. Code will be released at
https://github.com/Dmmm1997/FSRA
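The heat-based segmentation and one-to-one region alignment described in the abstract can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the released FSRA code: the "heat" of a patch token is assumed to be its mean activation over channels, regions are assumed to be equal-sized groups of heat-sorted tokens, and the names `heat_partition` and `region_descriptor` are hypothetical.

```python
import numpy as np


def heat_partition(tokens: np.ndarray, n_regions: int) -> np.ndarray:
    """Split patch tokens into heat-ordered regions and average-pool each one.

    tokens: (N, C) patch-token features from a transformer (class token removed).
    Returns an (n_regions, C) array whose row 0 is the hottest region.
    """
    n_tokens = tokens.shape[0]
    if not 1 <= n_regions <= n_tokens:
        raise ValueError("n_regions must be between 1 and the number of tokens")
    # Assumption: a token's heat is its mean activation across channels.
    heat = tokens.mean(axis=1)
    # Sort token indices from hottest to coldest. Spatial position is ignored,
    # so the grouping does not depend on where the target sits in the image.
    order = np.argsort(-heat, kind="stable")
    # Assumption: regions are (nearly) equal-sized groups of sorted tokens.
    groups = np.array_split(order, n_regions)
    return np.stack([tokens[g].mean(axis=0) for g in groups])


def region_descriptor(tokens: np.ndarray, n_regions: int = 3,
                      eps: float = 1e-12) -> np.ndarray:
    """Concatenate L2-normalised region features into one descriptor.

    Because regions are ordered by heat in every view, region i of a UAV image
    lines up with region i of a satellite image (one-to-one alignment).
    """
    regions = heat_partition(tokens, n_regions)
    regions = regions / (np.linalg.norm(regions, axis=1, keepdims=True) + eps)
    # Scaling by 1/sqrt(n) makes the dot product of two descriptors equal to
    # the mean cosine similarity over the aligned region pairs.
    return regions.reshape(-1) / np.sqrt(n_regions)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    uav = rng.normal(size=(16, 8))        # 16 patch tokens, 8 channels
    shifted = uav[rng.permutation(16)]    # same content, tokens moved around
    d1, d2 = region_descriptor(uav), region_descriptor(shifted)
    print(float(d1 @ d2))                 # close to 1.0: shifts do not change regions
```

Sorting by heat rather than cutting fixed spatial rings or squares is what lets the same instance land in the same region after a position shift; the equal-size split is only the simplest rule consistent with that idea.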
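One plausible reading of the multiple sampling strategy is sketched below, assuming (as the stated disparity suggests) that satellite images are the scarcer source: each location's satellite image is reused k times per epoch, each time paired with a freshly drawn drone view. The data layout, the function name `multiple_sampling_pairs`, and drawing drone views with replacement are assumptions for illustration, not details taken from the paper.

```python
import random
from typing import Dict, List, Tuple


def multiple_sampling_pairs(
    satellite: Dict[str, List[str]],
    drone: Dict[str, List[str]],
    k: int,
    seed: int = 0,
) -> List[Tuple[str, str, str]]:
    """Build shuffled (satellite_id, drone_id, location) triples for one epoch.

    satellite: location -> satellite image ids (assumed to be the scarce side)
    drone:     location -> drone image ids
    k:         number of times each location's satellite side is sampled per epoch
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    triples = []
    for loc, sat_ids in satellite.items():
        drone_ids = drone[loc]
        for _ in range(k):
            # Reuse a satellite image but pair it with a newly drawn drone view,
            # so one epoch contains k samples per location instead of one.
            triples.append((rng.choice(sat_ids), rng.choice(drone_ids), loc))
    rng.shuffle(triples)
    return triples


if __name__ == "__main__":
    sat = {"loc_a": ["sat_a"], "loc_b": ["sat_b"]}
    drn = {"loc_a": ["a1", "a2", "a3"], "loc_b": ["b1", "b2"]}
    for triple in multiple_sampling_pairs(sat, drn, k=3):
        print(triple)
```

With k = 1 each location contributes a single pair per epoch; larger k gives the satellite images more appearances per epoch at the cost of longer epochs.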
Related papers
- SDPL: Shifting-Dense Partition Learning for UAV-View Geo-Localization
Cross-view geo-localization aims to match images of the same target from different platforms.
We introduce a part-based representation learning method called shifting-dense partition learning (SDPL).
We show that SDPL is robust to position shifting and performs competitively on two prevailing benchmarks.
(2024-03-07T03:07:54Z)
- Deep Homography Estimation for Visual Place Recognition
We propose a transformer-based deep homography estimation (DHE) network.
It takes the dense feature map extracted by a backbone network as input and fits a homography for fast, learnable geometric verification.
Experiments on benchmark datasets show that our method can outperform several state-of-the-art methods.
(2024-02-25T13:22:17Z)
- A Transformer-Based Adaptive Semantic Aggregation Method for UAV Visual Geo-Localization
This paper addresses the task of unmanned aerial vehicle (UAV) visual geo-localization.
Part matching is crucial for UAV visual geo-localization since part-level representations can capture image details and help to understand the semantic information of scenes.
We introduce a transformer-based adaptive semantic aggregation method that regards parts as the most representative semantics in an image.
(2024-01-03T06:58:52Z)
- GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization
Worldwide geo-localization aims to pinpoint the precise location of images taken anywhere on Earth.
Existing approaches divide the globe into discrete geographic cells, transforming the problem into a classification task.
We propose GeoCLIP, a novel CLIP-inspired Image-to-GPS retrieval approach that enforces alignment between the image and its corresponding GPS locations.
(2023-09-27T20:54:56Z)
- SLAN: Self-Locator Aided Network for Cross-Modal Understanding
We propose Self-Locator Aided Network (SLAN) for cross-modal understanding tasks.
SLAN consists of a region filter and a region adaptor to localize regions of interest conditioned on different texts.
It achieves fairly competitive results on five cross-modal understanding tasks.
(2022-11-28T11:42:23Z)
- Where in the World is this Image? Transformer-based Geo-localization in the Wild
Predicting the geographic location (geo-localization) from a single ground-level RGB image taken anywhere in the world is a very challenging problem.
We propose TransLocator, a unified dual-branch transformer network that attends to tiny details over the entire image.
We evaluate TransLocator on four benchmark datasets (Im2GPS, Im2GPS3k, YFCC4k, and YFCC26k) and obtain continent-level accuracy improvements of 5.5%, 14.1%, 4.9%, and 9.9%, respectively.
(2022-04-29T03:27:23Z)
- Region Similarity Representation Learning
Region Similarity Representation Learning (ReSim) is a new approach to self-supervised representation learning for localization-based tasks.
ReSim learns both regional representations for localization as well as semantic image-level representations.
We show how ReSim learns representations which significantly improve the localization and classification performance compared to a competitive MoCo-v2 baseline.
(2021-03-24T00:42:37Z)
- Each Part Matters: Local Patterns Facilitate Cross-view Geo-localization
Cross-view geo-localization aims to spot images of the same geographic target from different platforms.
Existing methods usually concentrate on mining fine-grained features of the geographic target in the image center.
We introduce a simple and effective deep neural network, called Local Pattern Network (LPN), to take advantage of contextual information.
(2020-08-26T16:06:11Z)