FlyAwareV2: A Multimodal Cross-Domain UAV Dataset for Urban Scene Understanding
- URL: http://arxiv.org/abs/2510.13243v1
- Date: Wed, 15 Oct 2025 07:44:31 GMT
- Title: FlyAwareV2: A Multimodal Cross-Domain UAV Dataset for Urban Scene Understanding
- Authors: Francesco Barbato, Matteo Caligiuri, Pietro Zanuttigh
- Abstract summary: FlyAwareV2 is a novel dataset encompassing both real and synthetic UAV imagery tailored for urban scene understanding tasks. Building upon the recently introduced SynDrone and FlyAware datasets, FlyAwareV2 introduces several new key contributions. With its rich set of annotations and environmental diversity, FlyAwareV2 provides a valuable resource for research on UAV-based 3D urban scene understanding.
- Score: 14.353064152003867
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The development of computer vision algorithms for Unmanned Aerial Vehicle (UAV) applications in urban environments heavily relies on the availability of large-scale datasets with accurate annotations. However, collecting and annotating real-world UAV data is extremely challenging and costly. To address this limitation, we present FlyAwareV2, a novel multimodal dataset encompassing both real and synthetic UAV imagery tailored for urban scene understanding tasks. Building upon the recently introduced SynDrone and FlyAware datasets, FlyAwareV2 introduces several new key contributions: 1) Multimodal data (RGB, depth, semantic labels) across diverse environmental conditions, including varying weather and times of day; 2) Depth maps for real samples computed via state-of-the-art monocular depth estimation; 3) Benchmarks for RGB and multimodal semantic segmentation on standard architectures; 4) Studies on synthetic-to-real domain adaptation to assess the generalization capabilities of models trained on the synthetic data. With its rich set of annotations and environmental diversity, FlyAwareV2 provides a valuable resource for research on UAV-based 3D urban scene understanding.
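To make the dataset description concrete, below is a minimal sketch of how (RGB, depth, semantic-label) triplets might be loaded and how a mean-IoU score of the kind reported in the segmentation and synthetic-to-real benchmarks can be computed. The directory layout, file formats, and all names (`FlyAwareV2Samples`, `mean_iou`) are assumptions for illustration only, not the authors' released API; consult the official FlyAwareV2 release for the actual structure.

```python
# Hypothetical loader for FlyAwareV2-style multimodal samples.
# The folder layout and file formats below are assumptions, NOT the official API.
from pathlib import Path

import numpy as np
from PIL import Image
from torch.utils.data import Dataset


class FlyAwareV2Samples(Dataset):
    """Yields (rgb, depth, label) triplets from an assumed layout:

        root/<split>/rgb/*.png       8-bit color images
        root/<split>/depth/*.png     depth maps (synthetic: rendered;
                                     real: monocular-estimation output)
        root/<split>/semantic/*.png  per-pixel class IDs
    """

    def __init__(self, root: str, split: str = "train"):
        self.rgb_paths = sorted((Path(root) / split / "rgb").glob("*.png"))

    def __len__(self) -> int:
        return len(self.rgb_paths)

    def __getitem__(self, idx: int):
        rgb_path = self.rgb_paths[idx]
        # Assumed parallel folders for the other two modalities.
        split_dir = rgb_path.parent.parent
        depth_path = split_dir / "depth" / rgb_path.name
        label_path = split_dir / "semantic" / rgb_path.name

        rgb = np.asarray(Image.open(rgb_path).convert("RGB"))
        depth = np.asarray(Image.open(depth_path), dtype=np.float32)
        label = np.asarray(Image.open(label_path), dtype=np.int64)
        return rgb, depth, label


def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> float:
    """Per-class intersection-over-union, averaged over classes present."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```

Under these assumptions, a synthetic-to-real study would train a segmentation model on the synthetic split and report `mean_iou` on the real split to quantify the domain gap.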
Related papers
- History-Enhanced Two-Stage Transformer for Aerial Vision-and-Language Navigation [64.51891404034164]
Aerial Vision-and-Language Navigation (AVLN) requires Unmanned Aerial Vehicle (UAV) agents to localize targets in large-scale urban environments. Existing UAV agents typically adopt mono-granularity frameworks that struggle to balance these two aspects. This work proposes a History-Enhanced Two-Stage Transformer (HETT) framework, which integrates the two aspects through a coarse-to-fine navigation pipeline.
arXiv Detail & Related papers (2025-12-16T09:16:07Z)
- UAV-MM3D: A Large-Scale Synthetic Benchmark for 3D Perception of Unmanned Aerial Vehicles with Multi-Modal Data [47.317955428393134]
We introduce UAV-MM3D, a high-fidelity multimodal synthetic dataset for low-altitude UAV perception and motion understanding. It comprises 400K synchronized frames across diverse scenes (urban areas, suburbs, forests, coastal regions) and weather conditions. Each frame provides 2D/3D bounding boxes, 6-DoF poses, and instance-level annotations, enabling core UAV tasks such as 3D detection, pose estimation, target tracking, and short-term trajectory forecasting.
arXiv Detail & Related papers (2025-11-27T12:30:28Z)
- Is your VLM Sky-Ready? A Comprehensive Spatial Intelligence Benchmark for UAV Navigation [38.19842131198389]
Vision-Language Models (VLMs), leveraging their powerful visual perception and reasoning capabilities, have been widely applied in Unmanned Aerial Vehicle (UAV) tasks. However, the spatial intelligence capabilities of existing VLMs in UAV scenarios remain largely unexplored. We introduce SpatialSky-Bench, a benchmark designed to evaluate the spatial intelligence capabilities of VLMs in UAV navigation.
arXiv Detail & Related papers (2025-11-17T11:39:20Z)
- UAVScenes: A Multi-Modal Dataset for UAVs [45.752766099526525]
UAVScenes is a large-scale dataset designed to benchmark various tasks across both 2D and 3D modalities. We enhance this dataset by providing manually labeled semantic annotations for both frame-wise images and LiDAR point clouds. These additions enable a wide range of UAV perception tasks, including segmentation, depth estimation, 6-DoF localization, place recognition, and novel view synthesis.
arXiv Detail & Related papers (2025-07-30T06:29:52Z)
- TUM2TWIN: Introducing the Large-Scale Multimodal Urban Digital Twin Benchmark Dataset [90.97440987655084]
Urban Digital Twins (UDTs) have become essential for managing cities and integrating complex, heterogeneous data from diverse sources. To address these challenges, we introduce the first comprehensive multimodal Urban Digital Twin benchmark dataset: TUM2TWIN. This dataset includes georeferenced, semantically aligned 3D models and networks along with various terrestrial, mobile, aerial, and satellite observations, comprising 32 data subsets over roughly 100,000 m² and currently 767 GB of data.
arXiv Detail & Related papers (2025-05-12T09:48:32Z)
- More Clear, More Flexible, More Precise: A Comprehensive Oriented Object Detection benchmark for UAV [58.89234732689013]
CODrone is a comprehensive oriented object detection dataset for UAVs that accurately reflects real-world conditions. It also serves as a new benchmark designed to align with downstream task requirements. We conduct a series of experiments based on 22 classical or SOTA methods to rigorously evaluate CODrone.
arXiv Detail & Related papers (2025-04-28T17:56:02Z)
- InCrowd-VI: A Realistic Visual-Inertial Dataset for Evaluating SLAM in Indoor Pedestrian-Rich Spaces for Human Navigation [2.184775414778289]
InCrowd-VI is a visual-inertial dataset specifically designed for human navigation in indoor pedestrian-rich environments. It features 58 sequences totaling 5 km of trajectory and 1.5 hours of recording time, including RGB, stereo images, and IMU measurements. Ground-truth trajectories, accurate to approximately 2 cm, are provided in the dataset, originating from the Meta Aria project machine perception SLAM service.
arXiv Detail & Related papers (2024-11-21T17:58:07Z)
- Virtually Enriched NYU Depth V2 Dataset for Monocular Depth Estimation: Do We Need Artificial Augmentation? [61.234412062595155]
We present ANYU, a new virtually augmented version of the NYU depth v2 dataset, designed for monocular depth estimation.
In contrast to the well-known approach where full 3D scenes of a virtual world are utilized to generate artificial datasets, ANYU was created by incorporating RGB-D representations of virtual reality objects.
We show that ANYU improves the monocular depth estimation performance and generalization of deep neural networks with considerably different architectures.
arXiv Detail & Related papers (2024-04-15T05:44:03Z)
- UAVStereo: A Multiple Resolution Dataset for Stereo Matching in UAV Scenarios [0.6524460254566905]
This paper constructs a multi-resolution UAV scenario dataset, called UAVStereo, with over 34k stereo image pairs covering 3 typical scenes.
In this paper, we evaluate traditional and state-of-the-art deep learning methods, highlighting their limitations in addressing challenges in UAV scenarios.
arXiv Detail & Related papers (2023-02-20T16:45:27Z)
- SensatUrban: Learning Semantics from Urban-Scale Photogrammetric Point Clouds [52.624157840253204]
We introduce SensatUrban, an urban-scale UAV photogrammetry point cloud dataset consisting of nearly three billion points collected from three UK cities, covering 7.6 km².
Each point in the dataset has been labelled with fine-grained semantic annotations, resulting in a dataset three times the size of the previous largest photogrammetric point cloud dataset.
arXiv Detail & Related papers (2022-01-12T14:48:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.