DiffusionUavLoc: Visually Prompted Diffusion for Cross-View UAV Localization
- URL: http://arxiv.org/abs/2511.06422v1
- Date: Sun, 09 Nov 2025 15:27:17 GMT
- Title: DiffusionUavLoc: Visually Prompted Diffusion for Cross-View UAV Localization
- Authors: Tao Liu, Kan Ren, Qian Chen
- Abstract summary: DiffusionUavLoc is a cross-view localization framework that is image-prompted, text-free, diffusion-centric, and employs a VAE for unified representation. We first use training-free geometric rendering to synthesize pseudo-satellite images from UAV imagery as structural prompts. At inference, descriptors are computed at a fixed time step t and compared using cosine similarity.
- Score: 17.908597896653045
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the rapid growth of the low-altitude economy, unmanned aerial vehicles (UAVs) have become key platforms for measurement and tracking in intelligent patrol systems. However, in GNSS-denied environments, localization schemes that rely solely on satellite signals are prone to failure. Cross-view image retrieval-based localization is a promising alternative, yet substantial geometric and appearance domain gaps exist between oblique UAV views and nadir satellite orthophotos. Moreover, conventional approaches often depend on complex network architectures, text prompts, or large amounts of annotation, which hinders generalization. To address these issues, we propose DiffusionUavLoc, a cross-view localization framework that is image-prompted, text-free, diffusion-centric, and employs a VAE for unified representation. We first use training-free geometric rendering to synthesize pseudo-satellite images from UAV imagery as structural prompts. We then design a text-free conditional diffusion model that fuses multimodal structural cues to learn features robust to viewpoint changes. At inference, descriptors are computed at a fixed time step t and compared using cosine similarity. On University-1652 and SUES-200, the method performs competitively for cross-view localization, especially for satellite-to-drone in University-1652. Our data and code will be published at the following URL: https://github.com/liutao23/DiffusionUavLoc.git.
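The inference step described in the abstract (descriptors computed at a fixed time step t, compared by cosine similarity) can be sketched as a plain retrieval loop. This is a minimal illustration, not the authors' code: the vectors below are toy stand-ins for features pooled from the denoising network, and `rank_references` and the image ids are illustrative names.

```python
import math

def cosine(u, v):
    # Cosine similarity between two descriptor vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_references(query_desc, ref_descs):
    # Rank reference satellite ids by descending similarity to the UAV query.
    scores = {rid: cosine(query_desc, d) for rid, d in ref_descs.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy vectors standing in for descriptors extracted at a fixed time step t.
query = [0.9, 0.1, 0.2]
refs = {"sat_001": [0.8, 0.2, 0.1], "sat_002": [0.1, 0.9, 0.4]}
print(rank_references(query, refs))  # sat_001 ranks first
```

In practice the ranking would run over a database of satellite-tile descriptors and the top match (or top-k matches) gives the UAV's estimated location.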
Related papers
- Object Detection as an Optional Basis: A Graph Matching Network for Cross-View UAV Localization [17.908597896653045]
This paper presents a cross-view UAV localization framework that performs map matching via object detection. In typical pipelines, UAV visual localization is formulated as an image-retrieval problem. Our method achieves strong retrieval and localization performance using a fine-grained, graph-based node-similarity metric.
arXiv Detail & Related papers (2025-11-04T11:25:31Z) - Cross-View Open-Vocabulary Object Detection in Aerial Imagery [48.851422992413184]
We propose a novel framework for adapting open-vocabulary representations from ground-view images to solve object detection in aerial imagery. The method introduces contrastive image-to-image alignment to enhance the similarity between aerial and ground-view embeddings. Our open-vocabulary model achieves improvements of +6.32 mAP on DOTAv2, +4.16 mAP on VisDrone (Images), and +3.46 mAP on HRRSD in the zero-shot setting.
arXiv Detail & Related papers (2025-10-04T16:12:03Z) - SkyLink: Unifying Street-Satellite Geo-Localization via UAV-Mediated 3D Scene Alignment [8.886221192801381]
Cross-view geo-localization aims at establishing location correspondences between different viewpoints. Existing approaches typically learn cross-view correlations through direct feature similarity matching. We propose the novel SkyLink method to address this unique problem.
arXiv Detail & Related papers (2025-09-29T13:43:18Z) - Loc$^2$: Interpretable Cross-View Localization via Depth-Lifted Local Feature Matching [80.57282092735991]
We propose an accurate and interpretable fine-grained cross-view localization method. It estimates the 3 Degrees of Freedom (DoF) pose of a ground-level image by matching its local features with a reference aerial image. Experiments show state-of-the-art accuracy in challenging scenarios such as cross-area testing and unknown orientation.
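Once matched ground-image features are lifted (via depth) into metric ground-plane coordinates, estimating a 3-DoF pose (x, y, yaw) against aerial-map points reduces to a closed-form 2-D rigid alignment. The sketch below is a generic Kabsch-style least-squares solve, not the method proposed in the paper; `solve_pose_2d` and the point sets are illustrative.

```python
import math

def solve_pose_2d(ground_pts, aerial_pts):
    # Least-squares 2-D rigid transform (yaw + translation) aligning matched
    # ground points (lifted to metric coordinates) with aerial-map points.
    n = len(ground_pts)
    gx = sum(p[0] for p in ground_pts) / n  # centroids of both point sets
    gy = sum(p[1] for p in ground_pts) / n
    ax = sum(p[0] for p in aerial_pts) / n
    ay = sum(p[1] for p in aerial_pts) / n
    # Sums of dot and cross products of centered correspondences.
    sxx = sum((g[0] - gx) * (a[0] - ax) + (g[1] - gy) * (a[1] - ay)
              for g, a in zip(ground_pts, aerial_pts))
    sxy = sum((g[0] - gx) * (a[1] - ay) - (g[1] - gy) * (a[0] - ax)
              for g, a in zip(ground_pts, aerial_pts))
    yaw = math.atan2(sxy, sxx)
    c, s = math.cos(yaw), math.sin(yaw)
    tx = ax - (c * gx - s * gy)  # translation maps the rotated ground
    ty = ay - (s * gx + c * gy)  # centroid onto the aerial centroid
    return tx, ty, yaw

# Points rotated by 90 degrees and shifted by (2, 3): the solve recovers
# tx=2, ty=3, yaw=pi/2.
print(solve_pose_2d([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
                    [(2.0, 3.0), (2.0, 4.0), (1.0, 3.0)]))
```

Real pipelines would wrap such a solve in a robust estimator (e.g. RANSAC) to reject feature-matching outliers.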
arXiv Detail & Related papers (2025-09-11T18:52:16Z) - Hierarchical Image Matching for UAV Absolute Visual Localization via Semantic and Structural Constraints [10.639191465547517]
Absolute localization is crucial for unmanned aerial vehicles (UAVs) in various applications, but it becomes challenging when global navigation satellite system (GNSS) signals are unavailable. Vision-based absolute localization methods, which locate the current view of the UAV in a reference satellite map to estimate its position, have become popular in GNSS-denied scenarios. Existing methods mostly rely on traditional and low-level image matching, suffering from difficulties due to significant differences introduced by cross-source discrepancies and temporal variations. We introduce a hierarchical cross-source image matching method designed for UAV absolute localization, which integrates a semantic-aware and
arXiv Detail & Related papers (2025-06-11T13:53:03Z) - AeroReformer: Aerial Referring Transformer for UAV-based Referring Image Segmentation [9.55871636831991]
We propose a novel framework for UAV referring image segmentation (UAV-RIS). AeroReformer features a Vision-Language Cross-Attention Module (VLCAM) for effective cross-modal understanding and a Rotation-Aware Multi-Scale Fusion decoder. Experiments on two newly developed datasets demonstrate the superiority of AeroReformer over existing methods.
arXiv Detail & Related papers (2025-02-23T18:49:00Z) - View Distribution Alignment with Progressive Adversarial Learning for UAV Visual Geo-Localization [10.442998017077795]
Unmanned Aerial Vehicle (UAV) visual geo-localization aims to match images of the same geographic target captured from different views, i.e., the UAV view and the satellite view.
Previous works map images captured by UAVs and satellites to a shared feature space and employ a classification framework to learn location-dependent features.
This paper introduces distribution alignment of the two views to shorten their distance in a common space.
arXiv Detail & Related papers (2024-01-03T06:58:09Z) - DiAD: A Diffusion-based Framework for Multi-class Anomaly Detection [55.48770333927732]
We propose a Diffusion-based Anomaly Detection (DiAD) framework for multi-class anomaly detection.
It consists of a pixel-space autoencoder, a latent-space Semantic-Guided (SG) network with a connection to the stable diffusion's denoising network, and a feature-space pre-trained feature extractor.
Experiments on MVTec-AD and VisA datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-12-11T18:38:28Z) - DiffusionSat: A Generative Foundation Model for Satellite Imagery [63.2807119794691]
We present DiffusionSat, to date the largest generative foundation model trained on a collection of publicly available large, high-resolution remote sensing datasets.
Our method produces realistic samples and can be used to solve multiple generative tasks, including temporal generation, super-resolution given multi-spectral inputs, and in-painting.
arXiv Detail & Related papers (2023-12-06T16:53:17Z) - Beyond Cross-view Image Retrieval: Highly Accurate Vehicle Localization Using Satellite Image [91.29546868637911]
This paper addresses the problem of vehicle-mounted camera localization by matching a ground-level image with an overhead-view satellite map.
The key idea is to formulate the task as pose estimation and solve it by neural-net based optimization.
Experiments on standard autonomous vehicle localization datasets have confirmed the superiority of the proposed method.
arXiv Detail & Related papers (2022-04-10T19:16:58Z) - iSDF: Real-Time Neural Signed Distance Fields for Robot Perception [64.80458128766254]
iSDF is a continuous learning system for real-time signed distance field reconstruction.
It produces more accurate reconstructions and better approximations of collision costs and gradients.
arXiv Detail & Related papers (2022-04-05T15:48:39Z) - Vision-Based UAV Self-Positioning in Low-Altitude Urban Environments [20.69412701553767]
Unmanned Aerial Vehicles (UAVs) rely on satellite systems for stable positioning. When those signals are unavailable, vision-based techniques can serve as an alternative, ensuring the self-positioning capability of UAVs.
This paper presents a new dataset, DenseUAV, which is the first publicly available dataset designed for the UAV self-positioning task.
arXiv Detail & Related papers (2022-01-23T07:18:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.