Beyond Paired Data: Self-Supervised UAV Geo-Localization from Reference Imagery Alone
- URL: http://arxiv.org/abs/2512.02737v1
- Date: Tue, 02 Dec 2025 13:21:20 GMT
- Title: Beyond Paired Data: Self-Supervised UAV Geo-Localization from Reference Imagery Alone
- Authors: Tristan Amadei, Enric Meinhardt-Llopis, Benedicte Bascle, Corentin Abgrall, Gabriele Facciolo
- Abstract summary: We present a training paradigm that removes the need for UAV imagery during training by learning directly from satellite-view reference images. This is achieved through a dedicated augmentation strategy that simulates the visual domain shift between satellite and real-world UAV views. We introduce CAEVL, an efficient model designed to exploit this paradigm, and validate it on ViLD, a new and challenging dataset of real-world UAV images.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image-based localization in GNSS-denied environments is critical for UAV autonomy. Existing state-of-the-art approaches rely on matching UAV images to geo-referenced satellite images; however, they typically require large-scale, paired UAV-satellite datasets for training. Such data are costly to acquire and often unavailable, limiting their applicability. To address this challenge, we adopt a training paradigm that removes the need for UAV imagery during training by learning directly from satellite-view reference images. This is achieved through a dedicated augmentation strategy that simulates the visual domain shift between satellite and real-world UAV views. We introduce CAEVL, an efficient model designed to exploit this paradigm, and validate it on ViLD, a new and challenging dataset of real-world UAV images that we release to the community. Our method achieves competitive performance compared to approaches trained with paired data, demonstrating its effectiveness and strong generalization capabilities.
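The core of the described paradigm is an augmentation pipeline that warps and jitters a satellite tile so it loosely resembles an oblique, off-nadir UAV capture, letting the model train on reference imagery alone. The sketch below is a minimal illustration of that idea, not the paper's actual recipe: the function name, the choice of transforms (rotation, scale, translation, brightness/contrast jitter, sensor noise), and all parameter ranges are assumptions.

```python
import numpy as np

def simulate_uav_view(img, rng=None):
    """Randomly warp and jitter a satellite tile (HxWx3 float in [0, 1])
    to mimic the domain shift toward a real UAV capture. Illustrative only:
    transform choices and ranges are assumptions, not the paper's recipe."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]

    # Geometric shift: random heading (rotation), altitude change (scale),
    # and drift (translation), applied via inverse nearest-neighbor mapping.
    angle = rng.uniform(-np.pi, np.pi)
    scale = rng.uniform(0.7, 1.3)
    tx, ty = rng.uniform(-0.1, 0.1, size=2) * (w, h)
    c, s = np.cos(angle), np.sin(angle)
    inv = np.array([[c, s], [-s, c]]) / scale   # inverse of rotation*scale
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    coords = np.stack([xs - cx - tx, ys - cy - ty], axis=-1) @ inv.T
    src_x = np.clip(np.rint(coords[..., 0] + cx), 0, w - 1).astype(int)
    src_y = np.clip(np.rint(coords[..., 1] + cy), 0, h - 1).astype(int)
    out = img[src_y, src_x]

    # Photometric shift: brightness/contrast jitter plus sensor noise.
    gain = rng.uniform(0.8, 1.2)
    bias = rng.uniform(-0.1, 0.1)
    out = gain * (out - 0.5) + 0.5 + bias
    out = out + rng.normal(0.0, 0.02, size=out.shape)
    return np.clip(out, 0.0, 1.0)
```

In a self-supervised setup, two such augmented views of the same satellite tile would form a positive pair for a contrastive or embedding objective, so no UAV image is ever needed at training time.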
Related papers
- How Far are Modern Trackers from UAV-Anti-UAV? A Million-Scale Benchmark and New Baseline [74.4054700050366]
Unmanned Aerial Vehicles (UAVs) offer wide-ranging applications but also pose significant safety and privacy violation risks. Current Anti-UAV research primarily focuses on RGB, infrared (IR), or RGB-IR videos captured by fixed ground cameras. We propose a new multi-modal visual tracking task termed UAV-Anti-UAV, which involves a pursuer UAV tracking a target adversarial UAV in the video stream.
arXiv Detail & Related papers (2025-12-08T10:19:54Z)
- AerialMind: Towards Referring Multi-Object Tracking in UAV Scenarios [64.51320327698231]
We introduce AerialMind, the first large-scale RMOT benchmark in UAV scenarios. We develop an innovative semi-automated collaborative agent-based labeling assistant framework. We also propose HawkEyeTrack, a novel method that collaboratively enhances vision-language representation learning.
arXiv Detail & Related papers (2025-11-26T04:44:27Z)
- More Clear, More Flexible, More Precise: A Comprehensive Oriented Object Detection Benchmark for UAV [58.89234732689013]
CODrone is a comprehensive oriented object detection dataset for UAVs that accurately reflects real-world conditions. It also serves as a new benchmark designed to align with downstream task requirements. We conduct a series of experiments based on 22 classical or SOTA methods to rigorously evaluate CODrone.
arXiv Detail & Related papers (2025-04-28T17:56:02Z)
- AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations [51.44608822712786]
Visual grounding aims to localize target objects in an image based on natural language descriptions. AerialVG poses new challenges, e.g., appearance-based grounding is insufficient to distinguish among multiple visually similar objects. We introduce the first AerialVG dataset, consisting of 5K real-world aerial images, 50K manually annotated descriptions, and 103K objects.
arXiv Detail & Related papers (2025-04-10T15:13:00Z)
- A Deep Learning Framework with Geographic Information Adaptive Loss for Remote Sensing Images based UAV Self-Positioning [10.16507150219648]
Self-positioning of UAVs in GPS-denied environments has become a critical objective. We present a deep learning framework with geographic information adaptive loss to achieve precise localization. Results demonstrate the method's efficacy in enabling UAVs to achieve precise self-positioning.
arXiv Detail & Related papers (2025-02-22T09:36:34Z)
- Game4Loc: A UAV Geo-Localization Benchmark from Game Data [0.0]
We introduce a more practical UAV geo-localization task including partial matches of cross-view paired data. Experiments demonstrate the effectiveness of our data and training method for UAV geo-localization.
arXiv Detail & Related papers (2024-09-25T13:33:28Z)
- UAV-VisLoc: A Large-scale Dataset for UAV Visual Localization [20.37586403749362]
We present a large-scale dataset, UAV-VisLoc, to facilitate the UAV visual localization task.
Our dataset includes 6,742 drone images and 11 satellite maps, with metadata such as latitude, longitude, altitude, and capture date.
arXiv Detail & Related papers (2024-05-20T10:24:10Z)
- View Distribution Alignment with Progressive Adversarial Learning for UAV Visual Geo-Localization [10.442998017077795]
Unmanned Aerial Vehicle (UAV) visual geo-localization aims to match images of the same geographic target captured from different views, i.e., the UAV view and the satellite view.
Previous works map images captured by UAVs and satellites to a shared feature space and employ a classification framework to learn location-dependent features.
This paper introduces distribution alignment of the two views to shorten their distance in a common space.
arXiv Detail & Related papers (2024-01-03T06:58:09Z)
- Multiview Aerial Visual Recognition (MAVREC): Can Multi-view Improve Aerial Visual Perception? [57.77643186237265]
We present Multiview Aerial Visual RECognition or MAVREC, a video dataset where we record synchronized scenes from different perspectives.
MAVREC consists of around 2.5 hours of industry-standard 2.7K resolution video sequences, more than 0.5 million frames, and 1.1 million annotated bounding boxes.
This makes MAVREC the largest ground and aerial-view dataset, and the fourth largest among all drone-based datasets.
arXiv Detail & Related papers (2023-12-07T18:59:14Z)
- Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
arXiv Detail & Related papers (2022-03-11T01:51:54Z)
- Vision-Based UAV Self-Positioning in Low-Altitude Urban Environments [20.69412701553767]
Unmanned Aerial Vehicles (UAVs) rely on satellite navigation systems for stable positioning, which can fail in GNSS-denied environments. In such situations, vision-based techniques can serve as an alternative, ensuring the self-positioning capability of UAVs.
This paper presents a new dataset, DenseUAV, which is the first publicly available dataset designed for the UAV self-positioning task.
arXiv Detail & Related papers (2022-01-23T07:18:55Z)
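Most of the geo-localization papers above share a common retrieval formulation: embed the UAV query image and a database of geo-referenced satellite tiles into a shared feature space, then return the coordinates of the nearest reference. The sketch below illustrates only that generic retrieval step, assuming embeddings have already been produced by some encoder; the function name and cosine-similarity choice are illustrative assumptions, not any specific paper's method.

```python
import numpy as np

def localize(query_emb, ref_embs, ref_coords):
    """Return the geo-coordinate of the reference tile whose embedding is
    most similar (cosine similarity) to the UAV query embedding.
    query_emb: (D,) array; ref_embs: (N, D) array; ref_coords: length-N list."""
    q = query_emb / np.linalg.norm(query_emb)
    r = ref_embs / np.linalg.norm(ref_embs, axis=1, keepdims=True)
    best = int(np.argmax(r @ q))          # nearest neighbor in embedding space
    return ref_coords[best], best
```

In practice the reference embeddings would be precomputed offline for a tiled satellite map, so localization at flight time reduces to a single nearest-neighbor lookup.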
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.