UAV-MM3D: A Large-Scale Synthetic Benchmark for 3D Perception of Unmanned Aerial Vehicles with Multi-Modal Data
- URL: http://arxiv.org/abs/2511.22404v1
- Date: Thu, 27 Nov 2025 12:30:28 GMT
- Title: UAV-MM3D: A Large-Scale Synthetic Benchmark for 3D Perception of Unmanned Aerial Vehicles with Multi-Modal Data
- Authors: Longkun Zou, Jiale Wang, Rongqin Liang, Hai Wu, Ke Chen, Yaowei Wang,
- Abstract summary: We introduce UAV-MM3D, a high-fidelity multimodal synthetic dataset for low-altitude UAV perception and motion understanding.<n>It comprises 400K synchronized frames across diverse scenes (urban areas, suburbs, forests, coastal regions) and weather conditions.<n>Each frame provides 2D/3D bounding boxes, 6-DoF poses, and instance-level annotations, enabling core tasks related to UAVs such as 3D detection, pose estimation, target tracking, and short-term trajectory forecasting.
- Score: 47.317955428393134
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate perception of UAVs in complex low-altitude environments is critical for airspace security and related intelligent systems. Developing reliable solutions requires large-scale, accurately annotated, and multimodal data. However, real-world UAV data collection faces inherent constraints due to airspace regulations, privacy concerns, and environmental variability, while manual annotation of 3D poses and cross-modal correspondences is time-consuming and costly. To overcome these challenges, we introduce UAV-MM3D, a high-fidelity multimodal synthetic dataset for low-altitude UAV perception and motion understanding. It comprises 400K synchronized frames across diverse scenes (urban areas, suburbs, forests, coastal regions) and weather conditions (clear, cloudy, rainy, foggy), featuring multiple UAV models (micro, small, medium-sized) and five modalities - RGB, IR, LiDAR, Radar, and DVS (Dynamic Vision Sensor). Each frame provides 2D/3D bounding boxes, 6-DoF poses, and instance-level annotations, enabling core tasks related to UAVs such as 3D detection, pose estimation, target tracking, and short-term trajectory forecasting. We further propose LGFusionNet, a LiDAR-guided multimodal fusion baseline, and a dedicated UAV trajectory prediction baseline to facilitate benchmarking. With its controllable simulation environment, comprehensive scenario coverage, and rich annotations, UAV3D offers a public benchmark for advancing 3D perception of UAVs.
Related papers
- A Tri-Modal Dataset and a Baseline System for Tracking Unmanned Aerial Vehicles [74.8162337823142]
MM-UAV is the first large-scale benchmark for Multi-Modal UAV Tracking.<n>The dataset spans over 30 challenging scenarios, with 1,321 synchronised multi-modal sequences, and more than 2.8 million annotated frames.<n>Accompanying the dataset, we provide a novel multi-modal multi-UAV tracking framework.
arXiv Detail & Related papers (2025-11-23T08:42:17Z) - FlyAwareV2: A Multimodal Cross-Domain UAV Dataset for Urban Scene Understanding [14.353064152003867]
FlyAwareV2 is a novel dataset encompassing both real and synthetic UAV imagery tailored for urban scene understanding tasks.<n>Building upon the recently introduced SynDrone and FlyAware datasets, FlyAwareV2 introduces several new key contributions.<n>With its rich set of annotations and environmental diversity, FlyAwareV2 provides a valuable resource for research on UAV-based 3D urban scene understanding.
arXiv Detail & Related papers (2025-10-15T07:44:31Z) - Neural 3D Object Reconstruction with Small-Scale Unmanned Aerial Vehicles [16.745245388756533]
Small Unmanned Aerial Vehicles (UAVs) exhibit immense potential for navigating indoor and hard-to-reach areas.<n>We introduce a novel system architecture that enables fully autonomous, high-fidelity 3D scanning of static objects using UAVs weighing under 100 grams.
arXiv Detail & Related papers (2025-09-15T21:08:32Z) - NOVA: Navigation via Object-Centric Visual Autonomy for High-Speed Target Tracking in Unstructured GPS-Denied Environments [56.35569661650558]
We introduce NOVA, a fully onboard, object-centric framework that enables robust target tracking and collision-aware navigation.<n>Rather than constructing a global map, NOVA formulates perception, estimation, and control entirely in the target's reference frame.<n>We validate NOVA across challenging real-world scenarios, including urban mazes, forest trails, and repeated transitions through buildings with intermittent GPS loss.
arXiv Detail & Related papers (2025-06-23T14:28:30Z) - UAV3D: A Large-scale 3D Perception Benchmark for Unmanned Aerial Vehicles [12.278437831053985]
Unmanned Aerial Vehicles (UAVs) are employed in numerous applications, including aerial photography, surveillance, and agriculture.
Existing benchmarks for UAV applications are mainly designed for traditional 2D perception tasks.
UAV3D comprises 1,000 scenes, each of which has 20 frames with fully annotated 3D bounding boxes on vehicles.
arXiv Detail & Related papers (2024-10-14T22:24:11Z) - Clustering-based Learning for UAV Tracking and Pose Estimation [0.0]
This work develops a clustering-based learning detection approach, CL-Det, for UAV tracking and pose estimation using two types of LiDARs.
We first align the timestamps of Livox Avia data and LiDAR 360 data and then separate the point cloud of objects of interest (OOIs) from the environment.
The proposed method shows competitive pose estimation performance and ranks 5th on the final leaderboard of the CVPR 2024 UG2+ Challenge.
arXiv Detail & Related papers (2024-05-27T06:33:25Z) - Fully Convolutional One-Stage 3D Object Detection on LiDAR Range Images [96.66271207089096]
FCOS-LiDAR is a fully convolutional one-stage 3D object detector for LiDAR point clouds of autonomous driving scenes.
We show that an RV-based 3D detector with standard 2D convolutions alone can achieve comparable performance to state-of-the-art BEV-based detectors.
arXiv Detail & Related papers (2022-05-27T05:42:16Z) - A Multi-UAV System for Exploration and Target Finding in Cluttered and
GPS-Denied Environments [68.31522961125589]
We propose a framework for a team of UAVs to cooperatively explore and find a target in complex GPS-denied environments with obstacles.
The team of UAVs autonomously navigates, explores, detects, and finds the target in a cluttered environment with a known map.
Results indicate that the proposed multi-UAV system has improvements in terms of time-cost, the proportion of search area surveyed, as well as successful rates for search and rescue missions.
arXiv Detail & Related papers (2021-07-19T12:54:04Z) - PC-DAN: Point Cloud based Deep Affinity Network for 3D Multi-Object
Tracking (Accepted as an extended abstract in JRDB-ACT Workshop at CVPR21) [68.12101204123422]
A point cloud is a dense compilation of spatial data in 3D coordinates.
We propose a PointNet-based approach for 3D Multi-Object Tracking (MOT)
arXiv Detail & Related papers (2021-06-03T05:36:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.