PAUL: Uncertainty-Guided Partition and Augmentation for Robust Cross-View Geo-Localization under Noisy Correspondence
- URL: http://arxiv.org/abs/2508.20066v1
- Date: Wed, 27 Aug 2025 17:21:22 GMT
- Title: PAUL: Uncertainty-Guided Partition and Augmentation for Robust Cross-View Geo-Localization under Noisy Correspondence
- Authors: Zheng Li, Yanming Guo, WenZhe Liu, Xueyi Zhang, Zhaoyun Ding, Long Xu, Mingrui Lao
- Abstract summary: Cross-view geo-localization is a critical task for UAV navigation, event detection, and aerial surveying.
Most existing approaches embed multi-modal data into a joint feature space to maximize the similarity of paired images.
These methods typically assume perfect alignment of image pairs during training, which rarely holds true in real-world scenarios.
- Score: 16.322806051944227
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cross-view geo-localization is a critical task for UAV navigation, event detection, and aerial surveying, as it enables matching between drone-captured and satellite imagery. Most existing approaches embed multi-modal data into a joint feature space to maximize the similarity of paired images. However, these methods typically assume perfect alignment of image pairs during training, which rarely holds true in real-world scenarios. In practice, factors such as urban canyon effects, electromagnetic interference, and adverse weather frequently induce GPS drift, resulting in systematic alignment shifts where only partial correspondences exist between pairs. Despite its prevalence, this source of noisy correspondence has received limited attention in current research. In this paper, we formally introduce and address the Noisy Correspondence on Cross-View Geo-Localization (NC-CVGL) problem, aiming to bridge the gap between idealized benchmarks and practical applications. To this end, we propose PAUL (Partition and Augmentation by Uncertainty Learning), a novel framework that partitions and augments training data based on estimated data uncertainty through uncertainty-aware co-augmentation and evidential co-training. Specifically, PAUL selectively augments regions with high correspondence confidence and utilizes uncertainty estimation to refine feature learning, effectively suppressing noise from misaligned pairs. Distinct from traditional filtering or label correction, PAUL leverages both data uncertainty and loss discrepancy for targeted partitioning and augmentation, thus providing robust supervision for noisy samples. Comprehensive experiments validate the effectiveness of individual components in PAUL, which consistently achieves superior performance over other competitive noisy-correspondence-driven methods across various noise ratios.
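As a rough illustration of the partition-by-uncertainty idea described in the abstract, the sketch below combines an evidential uncertainty estimate with a loss-discrepancy check between two co-trained branches. All function names, shapes, and thresholds are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of uncertainty-guided partitioning, loosely following the
# abstract (evidential uncertainty + loss discrepancy); NOT the authors' code.
import torch
import torch.nn.functional as F

def evidential_uncertainty(logits: torch.Tensor) -> torch.Tensor:
    # Evidential deep learning: treat softplus(logits) as Dirichlet evidence;
    # uncertainty = K / sum(alpha), so higher values mean less confidence.
    evidence = F.softplus(logits)
    alpha = evidence + 1.0
    k = logits.shape[-1]
    return k / alpha.sum(dim=-1)

def partition_pairs(loss_a, loss_b, uncertainty,
                    loss_gap_thresh=0.5, unc_thresh=0.3):
    # A pair is treated as "clean" when the two co-trained branches agree
    # (small loss discrepancy) and the estimated data uncertainty is low;
    # clean pairs would then be candidates for augmentation.
    loss_gap = (loss_a - loss_b).abs()
    return (loss_gap < loss_gap_thresh) & (uncertainty < unc_thresh)

# Toy usage with random tensors standing in for two branches' per-pair losses.
logits = torch.randn(8, 10)
unc = evidential_uncertainty(logits)
clean_mask = partition_pairs(torch.rand(8), torch.rand(8), unc)
```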
Related papers
- GroupEnsemble: Efficient Uncertainty Estimation for DETR-based Object Detection [10.241307426902178]
GroupEnsemble is an efficient and effective uncertainty estimation method for DETR-like models.
We validated our method under autonomous driving scenes and common daily scenes using the Cityscapes and COCO datasets.
The results show that a hybrid approach combining MC-Dropout and GroupEnsemble outperforms Deep Ensembles on several metrics at a fraction of the cost.
arXiv Detail & Related papers (2026-03-02T13:26:40Z)
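The MC-Dropout component mentioned in the summary above is a standard technique; here is a minimal, hypothetical sketch of it (the model, number of passes, and variance-based score are illustrative, not taken from the paper).

```python
# Hypothetical MC-Dropout sketch (one ingredient of the hybrid mentioned above).
import torch
import torch.nn as nn

def mc_dropout_uncertainty(model: nn.Module, x: torch.Tensor, n_passes: int = 20):
    # Keep dropout active at inference time by staying in train mode, then
    # use the variance of the softmax outputs as an uncertainty proxy.
    model.train()
    with torch.no_grad():
        probs = torch.stack([model(x).softmax(dim=-1) for _ in range(n_passes)])
    return probs.mean(dim=0), probs.var(dim=0).sum(dim=-1)

# Toy usage with a small classifier that contains dropout.
net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Dropout(0.5), nn.Linear(32, 4))
mean_probs, unc = mc_dropout_uncertainty(net, torch.randn(8, 16))
```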
- Aerial View River Landform Video segmentation: A Weakly Supervised Context-aware Temporal Consistency Distillation Approach [20.833194584822532]
The study of terrain and landform classification through UAV remote sensing diverges significantly from ground vehicle patrol tasks.
This research substantiates that, in aerial positioning tasks, both the mean Intersection over Union (mIoU) and temporal consistency (TC) metrics are of paramount importance.
It is demonstrated that fully labeled data is not the optimal choice, as selecting only key data fails to enhance TC and leads to failures.
arXiv Detail & Related papers (2025-11-20T13:26:38Z)
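To make the mIoU metric named in the summary above concrete, here is the generic definition in code; this is the standard formulation, not the paper's evaluation script.

```python
# Illustrative mean-IoU computation for segmentation masks.
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:  # skip classes absent from both masks
            ious.append(inter / union)
    return float(np.mean(ious))

print(mean_iou(np.array([[0, 1], [1, 2]]), np.array([[0, 1], [2, 2]]), 3))
```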
- Modest-Align: Data-Efficient Alignment for Vision-Language Models [67.48633659305592]
Cross-modal alignment models often suffer from overconfidence and degraded performance when operating in resource-constrained settings.
We propose Modest-Align, a lightweight alignment framework designed for robustness and efficiency.
Our method offers a practical and scalable solution for cross-modal alignment in real-world, low-resource scenarios.
arXiv Detail & Related papers (2025-10-24T16:11:10Z)
- Disentangled Noisy Correspondence Learning [56.06801962154915]
Cross-modal retrieval is crucial in understanding latent correspondences across modalities.
DisNCL is a novel information-theoretic framework for feature Disentanglement in Noisy Correspondence Learning.
arXiv Detail & Related papers (2024-08-10T09:49:55Z)
- Stable Neighbor Denoising for Source-free Domain Adaptive Segmentation [91.83820250747935]
Pseudo-label noise is mainly contained in unstable samples in which predictions of most pixels undergo significant variations during self-training.
We introduce the Stable Neighbor Denoising (SND) approach, which effectively discovers highly correlated stable and unstable samples.
SND consistently outperforms state-of-the-art methods in various SFUDA semantic segmentation settings.
arXiv Detail & Related papers (2024-06-10T21:44:52Z)
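The SND summary above characterizes unstable samples by prediction variation during self-training; the following hypothetical sketch scores that instability as the fraction of pixels whose predicted class changes across checkpoints (shapes and the threshold are illustrative, not from the paper).

```python
# Hypothetical instability score in the spirit of the summary above.
import torch

def instability_score(snapshot_preds: torch.Tensor) -> torch.Tensor:
    # snapshot_preds: (num_checkpoints, batch, H, W) hard class predictions.
    # A pixel is unstable if its class changed between consecutive checkpoints;
    # the per-image score is the fraction of unstable pixels.
    changed = (snapshot_preds[1:] != snapshot_preds[:-1]).any(dim=0)
    return changed.float().mean(dim=(-2, -1))

preds = torch.randint(0, 19, (3, 4, 64, 64))  # 3 checkpoints, 4 images
unstable = instability_score(preds) > 0.3     # boolean mask over the batch
```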
- Collaborative Edge AI Inference over Cloud-RAN [37.3710464868215]
A cloud radio access network (Cloud-RAN) based collaborative edge AI inference architecture is proposed.
Specifically, geographically distributed devices capture real-time noise-corrupted sensory data samples and extract the noisy local feature vectors.
We allow each RRH to receive local feature vectors from all devices over the same resource blocks simultaneously by leveraging an over-the-air computation (AirComp) technique.
These aggregated feature vectors are quantized and transmitted to a central processor for further aggregation and downstream inference tasks.
arXiv Detail & Related papers (2024-04-09T04:26:16Z)
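The AirComp step described in the entry above can be illustrated with a toy simulation: when devices transmit analog feature values on the same resource block, the channel naturally sums them. All values below are synthetic stand-ins.

```python
# Toy simulation of over-the-air (AirComp) feature aggregation.
import numpy as np

rng = np.random.default_rng(0)
num_devices, feat_dim = 5, 8
local_features = rng.normal(size=(num_devices, feat_dim))  # noisy local features

# Simultaneous transmission: the RRH observes the superposition plus receiver noise.
receiver_noise = rng.normal(scale=0.01, size=feat_dim)
aggregated = local_features.sum(axis=0) + receiver_noise

# Coarse uniform quantization before fronthaul transfer to the central processor.
step = 0.1
quantized = np.round(aggregated / step) * step
```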
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to collect in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- SpatialRank: Urban Event Ranking with NDCG Optimization on Spatiotemporal Data [55.609946936979036]
We propose a novel spatial event ranking approach named SpatialRank.
We show that SpatialRank can effectively identify the top riskiest locations of crimes and traffic accidents.
arXiv Detail & Related papers (2023-09-30T06:20:21Z)
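For reference, the NDCG metric that SpatialRank optimizes (via a surrogate not reproduced here) is defined as follows; this is the standard formulation with illustrative inputs.

```python
# Standard NDCG@k computation for a ranked list of relevance scores.
import numpy as np

def ndcg_at_k(relevance_in_ranked_order: np.ndarray, k: int) -> float:
    rel = relevance_in_ranked_order[:k]
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))  # 1/log2(rank+1)
    dcg = float((rel * discounts).sum())
    ideal = np.sort(relevance_in_ranked_order)[::-1][:k]   # best possible order
    idcg = float((ideal * discounts[: ideal.size]).sum())
    return dcg / idcg if idcg > 0 else 0.0

print(ndcg_at_k(np.array([3.0, 2.0, 0.0, 1.0]), k=3))
```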
- Implicit neural representation for change detection [15.741202788959075]
Most commonly used approaches to detecting changes in point clouds are based on supervised methods.
We propose an unsupervised approach that comprises two components: Implicit Neural Representation (INR) for continuous shape reconstruction and a Gaussian Mixture Model for categorising changes.
We apply our method to a benchmark dataset comprising simulated LiDAR point clouds for urban sprawling.
arXiv Detail & Related papers (2023-07-28T09:26:00Z)
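The GMM stage described in the entry above can be sketched as fitting a two-component mixture to per-point change magnitudes and labeling the high-mean component as "changed"; the residuals below are synthetic stand-ins for differences between two INR reconstructions.

```python
# Hypothetical GMM-based change categorisation on synthetic residuals.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
residuals = np.concatenate([rng.normal(0.02, 0.01, 900),   # unchanged points
                            rng.normal(0.50, 0.10, 100)])  # changed points
gmm = GaussianMixture(n_components=2, random_state=0).fit(residuals.reshape(-1, 1))
changed_component = int(np.argmax(gmm.means_.ravel()))
labels = gmm.predict(residuals.reshape(-1, 1)) == changed_component
```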
- Far Away in the Deep Space: Dense Nearest-Neighbor-Based Out-of-Distribution Detection [33.78080060234557]
Nearest-neighbor approaches have been shown to work well in object-centric data domains.
We show that nearest-neighbor approaches also yield state-of-the-art results on dense novelty detection in complex driving scenes.
arXiv Detail & Related papers (2022-11-12T13:32:19Z)
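A generic nearest-neighbor novelty score of the kind the summary above refers to uses the distance to the k-th nearest training feature; the dense, pixel-level extension in the paper is not reproduced here, and the features below are synthetic.

```python
# Generic kNN-based out-of-distribution score on synthetic features.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
train_feats = rng.normal(size=(1000, 64))        # in-distribution feature bank
test_feats = rng.normal(loc=3.0, size=(5, 64))   # shifted, likely OOD

knn = NearestNeighbors(n_neighbors=10).fit(train_feats)
distances, _ = knn.kneighbors(test_feats)
ood_score = distances[:, -1]  # larger distance to the 10th neighbor -> more novel
```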
- Ordinal UNLOC: Target Localization with Noisy and Incomplete Distance Measures [1.6836876499886007]
A main challenge in target localization arises from the lack of reliable distance measures.
We develop a new computational framework to estimate the location of a target without the need for reliable distance measures.
arXiv Detail & Related papers (2021-05-06T13:54:31Z)
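A toy version of the problem setting above: when only the rank order of noisy anchor-target distances is trusted, a location can be estimated by searching for the point whose anchor ordering best matches the observed ordering. This illustrates the setting only, not the Ordinal UNLOC algorithm itself; all values are synthetic.

```python
# Toy rank-only localization over a grid of candidate positions.
import numpy as np

rng = np.random.default_rng(0)
anchors = rng.uniform(0, 10, size=(6, 2))
target = np.array([4.0, 7.0])

# Only the rank order of (noisy) anchor-target distances is observed.
noisy_d = np.linalg.norm(anchors - target, axis=1) + rng.normal(0, 0.1, 6)
observed_order = np.argsort(noisy_d)

# Pick the grid point whose anchor ranking disagrees least with the observation.
xs = np.linspace(0, 10, 101)
grid = np.array([[x, y] for x in xs for y in xs])
orders = np.argsort(np.linalg.norm(grid[:, None] - anchors[None], axis=2), axis=1)
best = grid[np.argmin((orders != observed_order).sum(axis=1))]
```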
- Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding [76.89426311082927]
Existing models are trained on clean data, which causes a gap between clean data training and real-world inference.
We propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedded into a similar vector space.
Experiments on the widely used Snips dataset and a large-scale in-house dataset (10 million training examples) demonstrate that this method not only outperforms the baseline models on a real-world (noisy) corpus but also enhances robustness; that is, it produces high-quality results in noisy environments.
arXiv Detail & Related papers (2021-04-13T17:54:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.