Relative Distance Guided Dynamic Partition Learning for Scale-Invariant UAV-View Geo-Localization
- URL: http://arxiv.org/abs/2412.11535v2
- Date: Mon, 23 Dec 2024 14:17:30 GMT
- Title: Relative Distance Guided Dynamic Partition Learning for Scale-Invariant UAV-View Geo-Localization
- Authors: Quan Chen, Tingyu Wang, Rongfeng Lu, Bolun Zheng, Zhedong Zheng, Chenggang Yan
- Abstract summary: UAV-view Geo-Localization (UVGL) presents substantial challenges, particularly due to the disparity in visual appearance between drone-captured imagery and satellite perspectives. We propose a partition learning framework based on relative distance, which alleviates the dependence on scale consistency while mining fine-grained features. Our approach achieves superior geo-localization accuracy across various scale-inconsistent scenarios, and exhibits remarkable robustness against scale variations.
- Score: 37.30243235827088
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: UAV-view Geo-Localization (UVGL) presents substantial challenges, particularly due to the disparity in visual appearance between drone-captured imagery and satellite perspectives. Existing methods usually assume a consistent scaling factor across different views. They therefore adopt predefined partition alignment and extract viewpoint-invariant representations by constructing a variety of part-level features. However, the scaling assumption does not always hold in real-world scenarios, where variations in UAV flight state lead to scale mismatch between views and cause serious performance degradation. To overcome this issue, we propose a partition learning framework based on relative distance, which alleviates the dependence on scale consistency while mining fine-grained features. Specifically, we propose a distance-guided dynamic partition learning strategy (DGDPL), consisting of a square partition strategy and a distance-guided adjustment strategy. The former extracts fine-grained and global features in a simple manner. The latter calculates the relative distance ratio between the drone and satellite views to adjust the partition size, thereby explicitly aligning the semantic information between partition pairs. Furthermore, we propose a saliency-guided refinement strategy to refine part-level features and further improve retrieval accuracy. Extensive experiments show that our approach achieves superior geo-localization accuracy across various scale-inconsistent scenarios and exhibits remarkable robustness against scale variations. The code will be released.
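The abstract does not give formulas, so the following is only a minimal NumPy sketch of the general idea: pool a feature map over nested central squares, and scale the square sizes by a distance ratio. The function names (`square_masks`, `square_partitions`), the uniform spacing of the squares, and the way `ratio` multiplies the half-widths are all assumptions made for illustration, not the authors' DGDPL implementation.

```python
import numpy as np


def square_masks(h, w, n_parts=2, ratio=1.0):
    """Boolean masks of nested central squares on an (h, w) grid.

    Hypothetical helper: `ratio` stands in for a relative distance ratio
    between two views and simply rescales the half-width of each square.
    """
    # Normalized Chebyshev distance of every cell from the grid center.
    ys = np.abs(np.arange(h) - (h - 1) / 2) / (h / 2)
    xs = np.abs(np.arange(w) - (w - 1) / 2) / (w / 2)
    dist = np.maximum(ys[:, None], xs[None, :])

    # Uniformly spaced half-widths (an assumption), scaled and capped at 1.
    base = np.arange(1, n_parts + 1) / (n_parts + 1)
    bounds = np.minimum(ratio * base, 1.0)

    masks = []
    for b in bounds:
        mask = dist <= b
        if not mask.any():
            # Fall back to the innermost cells if the square is too small.
            mask = dist == dist.min()
        masks.append(mask)
    return masks


def square_partitions(fmap, n_parts=2, ratio=1.0):
    """Average-pool a (C, H, W) feature map over each square and globally.

    Returns an array of shape (n_parts + 1, C): one part-level feature per
    square, followed by one global feature.
    """
    _, h, w = fmap.shape
    feats = [fmap[:, m].mean(axis=1) for m in square_masks(h, w, n_parts, ratio)]
    feats.append(fmap.mean(axis=(1, 2)))
    return np.stack(feats)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    drone_map = rng.normal(size=(16, 8, 8))
    # A ratio above 1 enlarges every square; below 1 shrinks it.
    print(square_partitions(drone_map, n_parts=2, ratio=1.0).shape)
    print([int(m.sum()) for m in square_masks(8, 8, n_parts=2, ratio=2.0)])
```

In this sketch the same content would be covered by a larger square in the view taken from closer range, which is the kind of alignment between partition pairs that the abstract describes; how the ratio is actually estimated is not stated there and is left out here.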
Related papers
- RDD: Robust Feature Detector and Descriptor using Deformable Transformer [8.01082121187363]
We present Robust Deformable Detector (RDD), a novel and robust keypoint detector/descriptor. We observed that deformable attention focuses on key locations, effectively reducing the search space complexity. Our proposed method outperforms all state-of-the-art keypoint detection/description methods in sparse matching tasks.
arXiv Detail & Related papers (2025-05-12T19:24:45Z) - Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy [56.424032454461695]
We present Dita, a scalable framework that leverages Transformer architectures to directly denoise continuous action sequences.
Dita employs in-context conditioning -- enabling fine-grained alignment between denoised actions and raw visual tokens from historical observations.
Dita effectively integrates cross-embodiment datasets across diverse camera perspectives, observation scenes, tasks, and action spaces.
arXiv Detail & Related papers (2025-03-25T15:19:56Z) - Multi-Level Embedding and Alignment Network with Consistency and Invariance Learning for Cross-View Geo-Localization [2.733505168507872]
Cross-View Geo-Localization (CVGL) involves determining the localization of drone images by retrieving the most similar GPS-tagged satellite images.
Existing methods often overlook the problem of increased computational and storage requirements when improving model performance.
We propose a lightweight enhanced alignment network, called the Multi-Level Embedding and Alignment Network (MEAN).
arXiv Detail & Related papers (2024-12-19T13:10:38Z) - AGL-NET: Aerial-Ground Cross-Modal Global Localization with Varying Scales [45.315661330785275]
We present AGL-NET, a novel learning-based method for global localization using LiDAR point clouds and satellite maps.
We tackle two critical challenges: bridging the representation gap between image and points modalities for robust feature matching, and handling inherent scale discrepancies between global view and local view.
arXiv Detail & Related papers (2024-04-04T04:12:30Z) - SDPL: Shifting-Dense Partition Learning for UAV-View Geo-Localization [27.131867916908156]
Cross-view geo-localization aims to match images of the same target from different platforms.
We introduce a part-based representation learning method, shifting-dense partition learning (SDPL).
We show that SDPL is robust to position shifting, and performs competitively on two prevailing benchmarks.
arXiv Detail & Related papers (2024-03-07T03:07:54Z) - Scale Optimization Using Evolutionary Reinforcement Learning for Object Detection on Drone Imagery [17.26524675722299]
This paper proposes an evolutionary reinforcement learning agent, integrated within a coarse-to-fine object detection framework, to optimize the scale for more effective detection of objects in such images.
A set of rewards measuring the localization accuracy, the accuracy of predicted labels, and the scale consistency among nearby patches are designed in the agent to guide the scale optimization.
arXiv Detail & Related papers (2023-12-23T10:49:55Z) - Adaptive Spot-Guided Transformer for Consistent Local Feature Matching [64.30749838423922]
We propose Adaptive Spot-Guided Transformer (ASTR) for local feature matching.
ASTR models the local consistency and scale variations in a unified coarse-to-fine architecture.
arXiv Detail & Related papers (2023-03-29T12:28:01Z) - Adaptive Local-Component-aware Graph Convolutional Network for One-shot Skeleton-based Action Recognition [54.23513799338309]
We present an Adaptive Local-Component-aware Graph Convolutional Network for skeleton-based action recognition.
Our method provides a stronger representation than the global embedding and helps our model reach state-of-the-art.
arXiv Detail & Related papers (2022-09-21T02:33:07Z) - Homography Decomposition Networks for Planar Object Tracking [11.558401177707312]
Planar object tracking plays an important role in AI applications, such as robotics, visual servoing, and visual SLAM.
We propose a novel Homography Decomposition Networks (HDN) approach that drastically reduces and stabilizes the condition number by decomposing the homography transformation into two groups.
arXiv Detail & Related papers (2021-12-15T06:13:32Z) - Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of SSL and DA.
arXiv Detail & Related papers (2021-12-12T06:11:16Z) - Learning to Aggregate Multi-Scale Context for Instance Segmentation in Remote Sensing Images [28.560068780733342]
A novel context aggregation network (CATNet) is proposed to improve the feature extraction process.
The proposed model exploits three lightweight plug-and-play modules, namely dense feature pyramid network (DenseFPN), spatial context pyramid (SCP), and hierarchical region of interest extractor (HRoIE).
arXiv Detail & Related papers (2021-11-22T08:55:25Z) - Bi-level Feature Alignment for Versatile Image Translation and Manipulation [88.5915443957795]
Generative adversarial networks (GANs) have achieved great success in image translation and manipulation.
High-fidelity image generation with faithful style control remains a grand challenge in computer vision.
This paper presents a versatile image translation and manipulation framework that achieves accurate semantic and style guidance.
arXiv Detail & Related papers (2021-07-07T05:26:29Z) - G$^2$DA: Geometry-Guided Dual-Alignment Learning for RGB-Infrared Person Re-Identification [3.909938091041451]
RGB-IR person re-identification aims to retrieve person-of-interest between heterogeneous modalities.
This paper presents a Geometry-Guided Dual-Alignment learning framework (G$^2$DA) to tackle sample-level modality difference.
arXiv Detail & Related papers (2021-06-15T03:14:31Z) - Spatial-spectral Hyperspectral Image Classification via Multiple Random Anchor Graphs Ensemble Learning [88.60285937702304]
This paper proposes a novel spatial-spectral HSI classification method via multiple random anchor graphs ensemble learning (RAGE).
First, the local binary pattern is adopted to extract more descriptive features on each selected band, which preserves local structures and subtle changes of a region.
Second, adaptive neighbor assignment is introduced in the construction of the anchor graph to reduce the computational complexity.
arXiv Detail & Related papers (2021-03-25T09:31:41Z) - Domain-invariant Similarity Activation Map Contrastive Learning for Retrieval-based Long-term Visual Localization [30.203072945001136]
In this work, a general architecture is first formulated probabilistically to extract domain-invariant features through multi-domain image translation.
A novel gradient-weighted similarity activation mapping loss (Grad-SAM) is then incorporated for finer localization with high accuracy.
Extensive experiments have been conducted to validate the effectiveness of the proposed approach on the CMUSeasons dataset.
Our performance is on par with or even outperforms the state-of-the-art image-based localization baselines in medium or high precision.
arXiv Detail & Related papers (2020-09-16T14:43:22Z) - Each Part Matters: Local Patterns Facilitate Cross-view Geo-localization [54.00111565818903]
Cross-view geo-localization is to spot images of the same geographic target from different platforms.
Existing methods usually concentrate on mining the fine-grained feature of the geographic target in the image center.
We introduce a simple and effective deep neural network, called Local Pattern Network (LPN), to take advantage of contextual information.
arXiv Detail & Related papers (2020-08-26T16:06:11Z) - Multi-view Drone-based Geo-localization via Style and Spatial Alignment [47.95626612936813]
Multi-view multi-source geo-localization serves as an important auxiliary method of GPS positioning by matching drone-view image and satellite-view image with pre-annotated GPS tag.
We propose an elegant orientation-based method to align the patterns and introduce a new branch to extract aligned partial feature.
arXiv Detail & Related papers (2020-06-23T15:44:02Z) - Scope Head for Accurate Localization in Object Detection [135.9979405835606]
We propose a novel detector coined as ScopeNet, which models anchors of each location as a mutually dependent relationship.
With our concise and effective design, the proposed ScopeNet achieves state-of-the-art results on COCO.
arXiv Detail & Related papers (2020-05-11T04:00:09Z)