Simple, Effective and General: A New Backbone for Cross-view Image Geo-localization
- URL: http://arxiv.org/abs/2302.01572v1
- Date: Fri, 3 Feb 2023 06:50:51 GMT
- Title: Simple, Effective and General: A New Backbone for Cross-view Image Geo-localization
- Authors: Yingying Zhu, Hongji Yang, Yuxin Lu and Qiang Huang
- Abstract summary: We propose a new backbone network, named Simple Attention-based Image Geo-localization network (SAIG).
The proposed SAIG effectively represents long-range interactions among patches as well as cross-view correspondence with multi-head self-attention layers.
Our SAIG achieves state-of-the-art results on cross-view geo-localization, while being far simpler than previous works.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we address an important but less explored problem:
designing a simple yet effective backbone specifically for the cross-view
geo-localization task. Existing
methods for cross-view geo-localization tasks are frequently characterized by
1) complicated methodologies, 2) GPU-consuming computations, and 3) a stringent
assumption that aerial and ground images are centrally or orientation aligned.
To address the above three challenges for cross-view image matching, we propose
a new backbone network, named Simple Attention-based Image Geo-localization
network (SAIG). The proposed SAIG effectively represents long-range
interactions among patches as well as cross-view correspondence with multi-head
self-attention layers. The "narrow-deep" architecture of our SAIG improves
feature richness without degrading performance, while its shallow and
effective convolutional stem preserves locality, eliminating the loss of
information at patch boundaries that a patchify stem incurs. Our SAIG achieves
state-of-the-art results on
cross-view geo-localization, while being far simpler than previous works.
Furthermore, with only 15.9% of the model parameters and half the output
dimension of the state-of-the-art model, SAIG adapts well across multiple
cross-view datasets without employing any elaborate feature aggregation modules
or feature alignment algorithms. In addition, our SAIG
attains competitive scores on image retrieval benchmarks, further demonstrating
its generalizability. As a backbone network, our SAIG is both easy to follow
and computationally lightweight, which matters in practical scenarios.
Moreover, we propose a simple Spatial-Mixed feature aggregation moDule (SMD)
that can mix and project spatial information into a low-dimensional space to
generate feature descriptors... (The code is available at
https://github.com/yanghongji2007/SAIG)
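The abstract describes SMD only at a high level: it mixes spatial information across patch tokens and projects it into a low-dimensional space to produce a descriptor. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation (their code is in the linked repository). Everything in it is hypothetical: the class name `SpatialMixAggregator`, the helpers `matmul`, `l2_normalize` and `cosine`, a single linear token-mixing layer with a residual connection, a k x N projection of the spatial axis, random weights standing in for learned ones, and one head shared by ground and aerial inputs for brevity. It is written in pure Python so it runs without extra dependencies.

```python
import math
import random


def matmul(a, b):
    """Return the matrix product a @ b; matrices are lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (their dot product)."""
    return sum(x * y for x, y in zip(a, b))


class SpatialMixAggregator:
    """Hypothetical SMD-like head (an assumption, not the authors' code).

    Input: N x C patch tokens from a backbone.
    1. Token mixing: each position becomes a weighted sum of all positions,
       added back to the input as a residual.
    2. Spatial projection: the N positions are compressed to k slots.
    3. The k x C result is flattened and L2-normalized into a descriptor.
    """

    def __init__(self, num_tokens, k, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / num_tokens
        # Random weights stand in for learned parameters (assumption).
        self.mix = [[rng.uniform(-scale, scale) for _ in range(num_tokens)]
                    for _ in range(num_tokens)]          # N x N
        self.proj = [[rng.uniform(-scale, scale) for _ in range(num_tokens)]
                     for _ in range(k)]                  # k x N

    def __call__(self, tokens):
        mixed = matmul(self.mix, tokens)                 # N x C
        mixed = [[t + m for t, m in zip(t_row, m_row)]   # residual connection
                 for t_row, m_row in zip(tokens, mixed)]
        reduced = matmul(self.proj, mixed)               # k x C
        flat = [v for row in reduced for v in row]       # length k * C
        return l2_normalize(flat)


if __name__ == "__main__":
    rng = random.Random(42)
    n, c, k = 16, 8, 4

    def fake_tokens():
        # Random features standing in for backbone outputs.
        return [[rng.gauss(0.0, 1.0) for _ in range(c)] for _ in range(n)]

    head = SpatialMixAggregator(num_tokens=n, k=k, seed=1)
    query = head(fake_tokens())                          # ground-view descriptor
    gallery = [head(fake_tokens()) for _ in range(5)]    # aerial descriptors
    ranking = sorted(range(len(gallery)),
                     key=lambda i: cosine(query, gallery[i]), reverse=True)
    print(len(query), ranking)  # descriptor length k * c = 32, then gallery order
```

Projecting the spatial axis (N to k) instead of pooling it away keeps some coarse layout information, and the descriptor size k * C is fixed by the hyperparameters rather than by the input. Ranking reference images by the cosine similarity of unit-length descriptors, as in the usage block, is the standard retrieval step for cross-view matching.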
Related papers
- SpaGBOL: Spatial-Graph-Based Orientated Localisation (2024-09-23)
  Cross-View Geo-Localisation within urban regions is challenging in part due to the lack of geo-spatial structuring within current datasets and techniques.
  We propose utilising graph representations to model sequences of local observations and the connectivity of the target location.
- Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation (2023-12-19)
  Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing.
  Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
  We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
- Sample4Geo: Hard Negative Sampling For Cross-View Geo-Localisation (2023-03-21)
  We present a simplified but effective architecture based on contrastive learning with symmetric InfoNCE loss.
  Our framework consists of a narrow training pipeline that eliminates the need for aggregation modules.
  Our work shows excellent performance on common cross-view datasets like CVUSA, CVACT, University-1652 and VIGOR.
- Cross-view Geo-localization via Learning Disentangled Geometric Layout Correspondence (2022-12-08)
  Cross-view geo-localization aims to estimate the location of a query ground image by matching it against a database of geo-tagged aerial reference images.
  Recent works achieve outstanding progress on cross-view geo-localization benchmarks.
  However, existing methods still suffer from poor performance on the cross-area benchmarks.
- HSGNet: Object Re-identification with Hierarchical Similarity Graph Network (2022-11-10)
  An object re-identification method consists of a backbone network, feature aggregation, and a loss function.
  We design a hierarchical similarity graph module (HSGM) to reduce the conflict between the backbone and re-identification networks.
  We develop a novel hierarchical similarity graph network (HSGNet) by embedding the HSGM in the backbone network.
- Learning to Aggregate Multi-Scale Context for Instance Segmentation in Remote Sensing Images (2021-11-22)
  A novel context aggregation network (CATNet) is proposed to improve the feature extraction process.
  The proposed model exploits three lightweight plug-and-play modules, namely dense feature pyramid network (DenseFPN), spatial context pyramid (SCP), and hierarchical region of interest extractor (HRoIE).
- Low-Rank Subspaces in GANs (2021-06-08)
  This work introduces low-rank subspaces that enable more precise control of GAN generation.
  LowRankGAN is able to find the low-dimensional representation of the attribute manifold.
  Experiments on state-of-the-art GAN models (including StyleGAN2 and BigGAN) trained on various datasets demonstrate the effectiveness of our LowRankGAN.
- Multi-Level Graph Convolutional Network with Automatic Graph Learning for Hyperspectral Image Classification (2020-09-19)
  We propose a Multi-level Graph Convolutional Network (GCN) with Automatic Graph Learning method (MGCN-AGL) for HSI classification.
  By employing an attention mechanism to characterize the importance among spatially neighboring regions, the most relevant information can be adaptively incorporated to make decisions.
  Our MGCN-AGL encodes the long-range dependencies among image regions based on the expressive representations that have been produced at the local level.
- Each Part Matters: Local Patterns Facilitate Cross-view Geo-localization (2020-08-26)
  Cross-view geo-localization aims to spot images of the same geographic target from different platforms.
  Existing methods usually concentrate on mining the fine-grained features of the geographic target in the image center.
  We introduce a simple and effective deep neural network, called Local Pattern Network (LPN), to take advantage of contextual information.
- High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification (2020-03-18)
  We propose a novel framework that learns high-order relation and topology information for discriminative features and robust alignment.
  Our framework outperforms the state of the art by 6.5% mAP on the Occluded-Duke dataset.
- Image Fine-grained Inpainting (2020-02-07)
  We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields.
  To better train this efficient generator, in addition to the frequently used VGG feature matching loss, we design a novel self-guided regression loss.
  We also employ a discriminator with local and global branches to ensure local-global content consistency.