SDPL: Shifting-Dense Partition Learning for UAV-View Geo-Localization
- URL: http://arxiv.org/abs/2403.04172v2
- Date: Sat, 6 Jul 2024 11:26:39 GMT
- Title: SDPL: Shifting-Dense Partition Learning for UAV-View Geo-Localization
- Authors: Quan Chen, Tingyu Wang, Zihao Yang, Haoran Li, Rongfeng Lu, Yaoqi Sun, Bolun Zheng, Chenggang Yan
- Abstract summary: Cross-view geo-localization aims to match images of the same target from different platforms.
We introduce a part-based representation learning method, shifting-dense partition learning (SDPL).
We show that SDPL is robust to position shifting and performs competitively on two prevailing benchmarks.
- Score: 27.131867916908156
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-view geo-localization aims to match images of the same target from different platforms, e.g., drone and satellite. It is a challenging task due to the changing appearance of targets and environmental content across views. Most methods obtain more comprehensive information through feature-map segmentation, but this inevitably destroys the image structure and leaves them sensitive to shifts and scale changes of the target in the query. To address these issues, we introduce a simple yet effective part-based representation learning method, shifting-dense partition learning (SDPL). We propose a dense partition strategy (DPS) that divides the image into multiple parts to explore contextual information while explicitly maintaining the global structure. To handle scenarios with non-centered targets, we further propose a shifting-fusion strategy, which generates multiple sets of parts in parallel based on various segmentation centers and then adaptively fuses all features to integrate their anti-offset ability. Extensive experiments show that SDPL is robust to position shifting and performs competitively on two prevailing benchmarks, University-1652 and SUES-200. In addition, SDPL shows satisfactory compatibility with a variety of backbone networks (e.g., ResNet and Swin). Code: https://github.com/C-water/SDPL
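The dense-partition and shifting-fusion ideas in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the square-ring partition, the particular shift offsets, and all function names here are assumptions made for the sake of the sketch.

```python
import numpy as np

def ring_masks(h, w, center, num_parts):
    # Boolean masks for `num_parts` square rings around `center`,
    # i.e. bands of Chebyshev distance from the partition center.
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = center
    d = np.maximum(np.abs(ys - cy), np.abs(xs - cx))
    edges = np.linspace(0, d.max() + 1e-6, num_parts + 1)
    return [(d >= edges[i]) & (d < edges[i + 1]) for i in range(num_parts)]

def partition_pool(feat, center, num_parts):
    # Average-pool a (C, H, W) feature map inside each ring -> (num_parts, C).
    c, h, w = feat.shape
    parts = []
    for m in ring_masks(h, w, center, num_parts):
        parts.append(feat[:, m].mean(axis=1) if m.any() else np.zeros(c))
    return np.stack(parts)

def shifted_fusion(feat, num_parts=4,
                   shifts=((0, 0), (2, 0), (-2, 0), (0, 2), (0, -2))):
    # Partition around several candidate centers in parallel and average
    # the resulting part features, so a non-centered target still falls
    # into consistent rings under moderate position shifts.
    _, h, w = feat.shape
    base = (h // 2, w // 2)
    all_parts = [partition_pool(feat, (base[0] + dy, base[1] + dx), num_parts)
                 for dy, dx in shifts]
    return np.mean(all_parts, axis=0)  # shape: (num_parts, C)
```

In the paper the fusion is adaptive (learned weights) rather than the plain average used here, and the parts feed a shared classifier per ring; the sketch only shows how multiple shifted partitions of one feature map combine into a single shift-tolerant part representation.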
Related papers
- A Transformer-Based Adaptive Semantic Aggregation Method for UAV Visual
Geo-Localization [2.1462492411694756]
This paper addresses the task of Unmanned Aerial Vehicles (UAV) visual geo-localization.
Part matching is crucial for UAV visual geo-localization since part-level representations can capture image details and help to understand the semantic information of scenes.
We introduce a transformer-based adaptive semantic aggregation method that regards parts as the most representative semantics in an image.
arXiv Detail & Related papers (2024-01-03T06:58:52Z) - De-coupling and De-positioning Dense Self-supervised Learning [65.56679416475943]
Dense Self-Supervised Learning (SSL) methods address the limitations of using image-level feature representations when handling images with multiple objects.
We show that they suffer from coupling and positional bias, which arise from the receptive field increasing with layer depth and zero-padding.
We demonstrate the benefits of our method on COCO and on a new challenging benchmark, OpenImage-MINI, for object classification, semantic segmentation, and object detection.
arXiv Detail & Related papers (2023-03-29T18:07:25Z) - Image as Set of Points [60.30495338399321]
Context clusters (CoCs) view an image as a set of unorganized points and extract features via a simplified clustering algorithm.
Our CoCs are convolution- and attention-free, relying only on a clustering algorithm for spatial interaction.
arXiv Detail & Related papers (2023-03-02T18:56:39Z) - Simple, Effective and General: A New Backbone for Cross-view Image
Geo-localization [9.687328460113832]
We propose a new backbone network, named Simple Attention-based Image Geo-localization network (SAIG).
The proposed SAIG effectively represents long-range interactions among patches as well as cross-view correspondence with multi-head self-attention layers.
Our SAIG achieves state-of-the-art results on cross-view geo-localization, while being far simpler than previous works.
arXiv Detail & Related papers (2023-02-03T06:50:51Z) - Self-Training Guided Disentangled Adaptation for Cross-Domain Remote
Sensing Image Semantic Segmentation [20.07907723950031]
We propose a self-training guided disentangled adaptation network (ST-DASegNet) for cross-domain RS image semantic segmentation task.
We first propose source student backbone and target student backbone to respectively extract the source-style and target-style feature for both source and target images.
We then propose a domain disentangled module to extract the universal feature and purify the distinct feature of source-style and target-style features.
arXiv Detail & Related papers (2023-01-13T13:11:22Z) - Vision Transformers: From Semantic Segmentation to Dense Prediction [139.15562023284187]
We explore the global context learning potentials of vision transformers (ViTs) for dense visual prediction.
Our motivation is that through learning global context at full receptive field layer by layer, ViTs may capture stronger long-range dependency information.
We formulate a family of Hierarchical Local-Global (HLG) Transformers, characterized by local attention within windows and global attention across windows in a pyramidal architecture.
arXiv Detail & Related papers (2022-07-19T15:49:35Z) - A Unified Transformer Framework for Group-based Segmentation:
Co-Segmentation, Co-Saliency Detection and Video Salient Object Detection [59.21990697929617]
Humans tend to mine objects by learning from a group of images or several frames of video since we live in a dynamic world.
Previous approaches design separate networks for these similar tasks, which makes them difficult to transfer to one another.
We introduce a unified framework to tackle these issues, termed UFO (Unified Framework for Co-Object Segmentation).
arXiv Detail & Related papers (2022-03-09T13:35:19Z) - A Transformer-Based Feature Segmentation and Region Alignment Method For
UAV-View Geo-Localization [0.5257115841810257]
Cross-view geo-localization is a task of matching the same geographic image from different views.
Existing methods are mainly aimed at digging for more comprehensive fine-grained information.
We introduce a simple and efficient transformer-based structure called Feature Segmentation and Region Alignment (FSRA) to enhance the model's ability to understand contextual information.
arXiv Detail & Related papers (2022-01-23T08:01:42Z) - Remote Sensing Images Semantic Segmentation with General Remote Sensing
Vision Model via a Self-Supervised Contrastive Learning Method [13.479068312825781]
We propose Global style and Local matching Contrastive Learning Network (GLCNet) for remote sensing semantic segmentation.
Specifically, the global style contrastive module is used to learn an image-level representation better.
The local feature matching contrastive module is designed to learn representations of local regions, which benefits semantic segmentation.
arXiv Detail & Related papers (2021-06-20T03:03:40Z) - Inter-Image Communication for Weakly Supervised Localization [77.2171924626778]
Weakly supervised localization aims at finding target object regions using only image-level supervision.
We propose to leverage pixel-level similarities across different objects for learning more accurate object locations.
Our method achieves the Top-1 localization error rate of 45.17% on the ILSVRC validation set.
arXiv Detail & Related papers (2020-08-12T04:14:11Z) - Improving Few-shot Learning by Spatially-aware Matching and
CrossTransformer [116.46533207849619]
We study the impact of scale and location mismatch in the few-shot learning scenario.
We propose a novel Spatially-aware Matching scheme to effectively perform matching across multiple scales and locations.
arXiv Detail & Related papers (2020-01-06T14:10:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.