Exploring the best way for UAV visual localization under Low-altitude Multi-view Observation Condition: a Benchmark
- URL: http://arxiv.org/abs/2503.10692v1
- Date: Wed, 12 Mar 2025 03:29:27 GMT
- Title: Exploring the best way for UAV visual localization under Low-altitude Multi-view Observation Condition: a Benchmark
- Authors: Yibin Ye, Xichao Teng, Shuo Chen, Zhang Li, Leqi Liu, Qifeng Yu, Tao Tan
- Abstract summary: Low-altitude, Multi-view UAV AVL presents greater challenges due to extreme viewpoint changes. This benchmark revealed the challenges of Low-altitude, Multi-view UAV AVL and provided valuable guidance for future research.
- Score: 6.693781685584959
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Absolute Visual Localization (AVL) enables an Unmanned Aerial Vehicle (UAV) to determine its position in GNSS-denied environments by establishing geometric relationships between UAV images and geo-tagged reference maps. While many previous works have achieved AVL with image retrieval and matching techniques, research in low-altitude multi-view scenarios remains limited. The Low-altitude Multi-view condition presents greater challenges due to extreme viewpoint changes. To explore the best UAV AVL approach under such conditions, we propose this benchmark. Firstly, a large-scale Low-altitude Multi-view dataset called AnyVisLoc was constructed. This dataset includes 18,000 images captured at multiple scenes and altitudes, along with 2.5D reference maps containing aerial photogrammetry maps and historical satellite maps. Secondly, a unified framework was proposed to integrate the state-of-the-art AVL approaches and comprehensively test their performance. The best combined method was chosen as the baseline, and the key factors influencing localization accuracy were thoroughly analyzed based on it. This baseline achieved 74.1% localization accuracy within 5 m under Low-altitude, Multi-view conditions. In addition, a novel retrieval metric called PDM@K was introduced to better align with the characteristics of the UAV AVL task. Overall, this benchmark reveals the challenges of Low-altitude, Multi-view UAV AVL and provides valuable guidance for future research. The dataset and code are available at https://github.com/UAV-AVL/Benchmark
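A minimal sketch of the evaluation side, assuming planar metric coordinates: `localization_accuracy` is the standard within-threshold metric behind the "74.1% within 5 m" figure, while `pdm_at_k` is only a hypothetical reading of PDM@K (weighting the top-K retrieved references by their ground distance to the query), since the abstract does not define the metric.

```python
import numpy as np

def localization_accuracy(pred_xy, gt_xy, threshold_m=5.0):
    """Fraction of queries whose predicted position falls within
    threshold_m metres of the ground-truth position."""
    errors = np.linalg.norm(np.asarray(pred_xy) - np.asarray(gt_xy), axis=1)
    return float(np.mean(errors <= threshold_m))

def pdm_at_k(retrieved_xy, query_xy, k=5, scale_m=50.0):
    """HYPOTHETICAL reading of PDM@K: score the top-K retrieved reference
    tiles by their ground distance to the true query position, decaying
    smoothly with distance. The paper's exact definition may differ."""
    d = np.linalg.norm(np.asarray(retrieved_xy)[:k] - np.asarray(query_xy), axis=1)
    return float(np.mean(np.exp(-d / scale_m)))
```

Run over all 18,000 queries, `localization_accuracy` yields a headline number in the same style as the reported 74.1%.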
Related papers
- UAVScenes: A Multi-Modal Dataset for UAVs [45.752766099526525]
UAVScenes is a large-scale dataset designed to benchmark various tasks across both 2D and 3D modalities. We enhance this dataset by providing manually labeled semantic annotations for both frame-wise images and LiDAR point clouds. These additions enable a wide range of UAV perception tasks, including segmentation, depth estimation, 6-DoF localization, place recognition, and novel view synthesis.
arXiv Detail & Related papers (2025-07-30T06:29:52Z)
- More Clear, More Flexible, More Precise: A Comprehensive Oriented Object Detection benchmark for UAV [58.89234732689013]
CODrone is a comprehensive oriented object detection dataset for UAVs that accurately reflects real-world conditions.
It also serves as a new benchmark designed to align with downstream task requirements.
We conduct a series of experiments based on 22 classical or SOTA methods to rigorously evaluate CODrone.
arXiv Detail & Related papers (2025-04-28T17:56:02Z)
- COST: Contrastive One-Stage Transformer for Vision-Language Small Object Tracking [52.62149024881728]
We propose a contrastive one-stage transformer fusion framework for vision-language (VL) tracking.
We introduce a contrastive alignment strategy that maximizes mutual information between a video and its corresponding language description (see the sketch after this summary).
By leveraging a visual-linguistic transformer, we establish an efficient multi-modal fusion and reasoning mechanism.
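Maximizing mutual information between paired video and language embeddings is commonly lower-bounded with a symmetric InfoNCE objective. The sketch below shows that generic form, not COST's exact loss; the function name and temperature value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def infonce_alignment(video_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired (video, text) embeddings.
    Matched pairs sit on the diagonal of the similarity matrix; the loss
    pushes them above all mismatched pairs, a standard lower bound on
    video-text mutual information (not COST's exact formulation)."""
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.T / temperature                  # (B, B) cosine similarities
    targets = torch.arange(v.size(0), device=v.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))
```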
arXiv Detail & Related papers (2025-04-02T03:12:38Z)
- GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks [84.86699025256705]
We present GEOBench-VLM, a benchmark specifically designed to evaluate Vision-Language Models (VLMs) on geospatial tasks. Our benchmark features over 10,000 manually verified instructions spanning diverse visual conditions, object types, and scales. We evaluate several state-of-the-art VLMs to assess performance on geospatial-specific challenges.
arXiv Detail & Related papers (2024-11-28T18:59:56Z) - VQ-Map: Bird's-Eye-View Map Layout Estimation in Tokenized Discrete Space via Vector Quantization [108.68014173017583]
Bird's-eye-view (BEV) map layout estimation requires an accurate and full understanding of the semantics for the environmental elements around the ego car.
We propose to utilize a generative model similar to the Vector Quantized-Variational AutoEncoder (VQ-VAE) to acquire prior knowledge for the high-level BEV semantics in the tokenized discrete space.
Thanks to the obtained BEV tokens, accompanied by a codebook embedding that encapsulates the semantics of the different BEV elements in the ground-truth maps, we are able to directly align the sparse backbone image features with the obtained BEV tokens.
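The core VQ-VAE operation referred to here, snapping continuous features to the nearest codebook entry to obtain discrete tokens, can be sketched generically as follows; the shapes and the straight-through trick are standard VQ-VAE machinery, not VQ-Map's specific design.

```python
import torch

def vector_quantize(features, codebook):
    """Map each continuous feature to its nearest codebook entry.
    features: (N, D) continuous BEV features; codebook: (K, D).
    Returns discrete token indices (N,) and quantized vectors (N, D).
    Generic VQ-VAE lookup, not VQ-Map's trained codebook."""
    dists = torch.cdist(features, codebook)   # (N, K) pairwise L2 distances
    tokens = dists.argmin(dim=1)              # nearest codebook index per feature
    quantized = codebook[tokens]
    # straight-through estimator so gradients flow back to the encoder
    quantized = features + (quantized - features).detach()
    return tokens, quantized
```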
arXiv Detail & Related papers (2024-11-03T16:09:47Z)
- UAVDB: Point-Guided Masks for UAV Detection and Segmentation [0.03464344220266879]
We present UAVDB, a new benchmark dataset for UAV detection and segmentation. It is built upon a point-guided weak supervision pipeline. UAVDB captures UAVs at diverse scales, from visible objects to near-single-pixel instances.
arXiv Detail & Related papers (2024-09-09T13:27:53Z)
- SOOD++: Leveraging Unlabeled Data to Boost Oriented Object Detection [59.868772767818975]
We propose a simple yet effective Semi-supervised Oriented Object Detection method termed SOOD++.
Specifically, we observe that objects in aerial images usually have arbitrary orientations and small scales, and tend to appear in dense aggregations.
Extensive experiments conducted on various multi-oriented object datasets under various labeled settings demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2024-07-01T07:03:51Z)
- UAV-VisLoc: A Large-scale Dataset for UAV Visual Localization [20.37586403749362]
We present a large-scale dataset, UAV-VisLoc, to facilitate the UAV visual localization task.
Our dataset includes 6,742 drone images and 11 satellite maps, with metadata such as latitude, longitude, altitude, and capture date.
arXiv Detail & Related papers (2024-05-20T10:24:10Z)
- UAVD4L: A Large-Scale Dataset for UAV 6-DoF Localization [14.87295056434887]
We introduce a large-scale 6-DoF UAV dataset for localization (UAVD4L)
We develop a two-stage 6-DoF localization pipeline (UAVLoc), which consists of offline synthetic data generation and online visual localization.
Results on the new dataset demonstrate the effectiveness of the proposed approach.
arXiv Detail & Related papers (2024-01-11T15:19:21Z)
- Multiview Aerial Visual Recognition (MAVREC): Can Multi-view Improve Aerial Visual Perception? [57.77643186237265]
We present Multiview Aerial Visual RECognition or MAVREC, a video dataset where we record synchronized scenes from different perspectives.
MAVREC consists of around 2.5 hours of industry-standard 2.7K resolution video sequences, more than 0.5 million frames, and 1.1 million annotated bounding boxes.
This makes MAVREC the largest ground- and aerial-view dataset, and the fourth largest among all drone-based datasets.
arXiv Detail & Related papers (2023-12-07T18:59:14Z)
- SATIN: A Multi-Task Metadataset for Classifying Satellite Imagery using Vision-Language Models [33.814335088752046]
We introduce SATellite ImageNet (SATIN), a metadataset curated from 27 existing remotely sensed datasets.
We comprehensively evaluate the zero-shot transfer classification capabilities of a broad range of vision-language (VL) models on SATIN.
We find SATIN to be a challenging benchmark: the strongest method we evaluate achieves a classification accuracy of 52.0%.
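Zero-shot transfer classification of the kind evaluated on SATIN typically follows the CLIP recipe: embed candidate class names as text prompts, embed the image, and pick the most similar class. A minimal sketch with OpenAI's `clip` package is below; the class names, prompt template, and image path are illustrative assumptions, and SATIN itself evaluates many different VL models.

```python
import clip
import torch
from PIL import Image

# Standard CLIP zero-shot recipe (illustrative; not SATIN's full protocol).
model, preprocess = clip.load("ViT-B/32")
classes = ["airport", "forest", "harbor", "residential"]   # example classes
text = clip.tokenize([f"a satellite image of a {c}" for c in classes])

image = preprocess(Image.open("tile.png")).unsqueeze(0)    # hypothetical image path
with torch.no_grad():
    logits_per_image, _ = model(image, text)               # image-to-text similarities
    pred = classes[logits_per_image.softmax(dim=-1).argmax().item()]
print(pred)
```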
arXiv Detail & Related papers (2023-04-23T11:23:05Z)
- UAVStereo: A Multiple Resolution Dataset for Stereo Matching in UAV Scenarios [0.6524460254566905]
This paper constructs a multi-resolution UAV scenario dataset, called UAVStereo, with over 34k stereo image pairs covering 3 typical scenes.
In this paper, we evaluate traditional and state-of-the-art deep learning methods, highlighting their limitations in addressing challenges in UAV scenarios.
arXiv Detail & Related papers (2023-02-20T16:45:27Z)
- Vision-Based UAV Self-Positioning in Low-Altitude Urban Environments [20.69412701553767]
Unmanned Aerial Vehicles (UAVs) rely on satellite systems for stable positioning.
In such situations, vision-based techniques can serve as an alternative, ensuring the self-positioning capability of UAVs.
This paper presents a new dataset, DenseUAV, which is the first publicly available dataset designed for the UAV self-positioning task.
arXiv Detail & Related papers (2022-01-23T07:18:55Z)
- An Empirical Study of Training End-to-End Vision-and-Language Transformers [50.23532518166621]
We present METER (Multimodal End-to-end TransformER), through which we investigate how to design and pre-train a fully transformer-based VL model.
Specifically, we dissect the model designs along multiple dimensions: vision encoders (e.g., CLIP-ViT, Swin transformer), text encoders (e.g., RoBERTa, DeBERTa), and multimodal fusion (e.g., merged attention vs. co-attention).
arXiv Detail & Related papers (2021-11-03T17:55:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences arising from its use.