LAA3D: A Benchmark of Detecting and Tracking Low-Altitude Aircraft in 3D Space
- URL: http://arxiv.org/abs/2511.19057v1
- Date: Mon, 24 Nov 2025 12:50:34 GMT
- Title: LAA3D: A Benchmark of Detecting and Tracking Low-Altitude Aircraft in 3D Space
- Authors: Hai Wu, Shuai Tang, Jiale Wang, Longkun Zou, Mingyue Guo, Rongqin Liang, Ke Chen, Yaowei Wang
- Abstract summary: We present LAA3D, a large-scale dataset designed to advance 3D detection and tracking of low-altitude aerial vehicles. LAA3D contains 15,000 real images and 600,000 synthetic frames, captured across diverse scenarios. It covers multiple aerial object categories, including electric Vertical Take-Off and Landing (eVTOL) aircraft, Micro Aerial Vehicles (MAVs), and Helicopters.
- Score: 46.00559036244609
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Perception of Low-Altitude Aircraft (LAA) in 3D space enables precise 3D object localization and behavior understanding. However, datasets tailored for 3D LAA perception remain scarce. To address this gap, we present LAA3D, a large-scale dataset designed to advance 3D detection and tracking of low-altitude aerial vehicles. LAA3D contains 15,000 real images and 600,000 synthetic frames, captured across diverse scenarios, including urban and suburban environments. It covers multiple aerial object categories, including electric Vertical Take-Off and Landing (eVTOL) aircraft, Micro Aerial Vehicles (MAVs), and Helicopters. Each instance is annotated with a 3D bounding box, a class label, and an instance identity, supporting tasks such as 3D object detection, 3D multi-object tracking (MOT), and 6-DoF pose estimation. In addition, we establish the LAA3D Benchmark, integrating multiple tasks and methods with unified evaluation protocols for comparison. Furthermore, we propose MonoLAA, a monocular 3D detection baseline that achieves robust 3D localization from zoom cameras with varying focal lengths. Models pretrained on synthetic images transfer effectively to real-world data after fine-tuning, demonstrating strong sim-to-real generalization. Our LAA3D provides a comprehensive foundation for future research in low-altitude 3D object perception.
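The monocular 3D detection setting described in the abstract rests on standard pinhole geometry: a 3D box predicted in camera coordinates is related to the image through the camera intrinsics, which is also where the varying focal lengths of zoom cameras enter the problem. A minimal sketch of that projection step (the box parameterization, corner ordering, and intrinsic values below are illustrative assumptions, not details from the paper):

```python
import numpy as np

def box_corners(center, size):
    """Return the 8 corners of an axis-aligned 3D box in camera coordinates."""
    cx, cy, cz = center
    w, h, l = size
    offsets = np.array([[dx, dy, dz]
                        for dx in (-w / 2, w / 2)
                        for dy in (-h / 2, h / 2)
                        for dz in (-l / 2, l / 2)])
    return np.array([cx, cy, cz]) + offsets  # shape (8, 3)

def project(points, fx, fy, u0, v0):
    """Pinhole projection of 3D camera-frame points to pixel coordinates."""
    u = fx * points[:, 0] / points[:, 2] + u0
    v = fy * points[:, 1] / points[:, 2] + v0
    return np.stack([u, v], axis=1)  # shape (N, 2)

# Example: a 2 m-wide aircraft 50 m ahead, seen through a long focal length.
corners = box_corners(center=(0.0, 0.0, 50.0), size=(2.0, 1.0, 3.0))
pixels = project(corners, fx=2000.0, fy=2000.0, u0=960.0, v0=540.0)
print(pixels.shape)  # (8, 2)
```

Because the pixel extent of an object scales with fx/z, a detector trained at one focal length will misjudge depth at another unless predictions are normalized by the intrinsics, which is the core difficulty the MonoLAA baseline has to handle.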
Related papers
- N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models [45.008146973701855]
N3D-VLM is a novel unified framework that seamlessly integrates native 3D object perception with 3D-aware visual reasoning. Unlike conventional end-to-end models that directly predict answers from RGB/RGB-D inputs, our approach equips the model with native 3D object perception capabilities.
arXiv Detail & Related papers (2025-12-18T14:03:44Z) - 3D-MOOD: Lifting 2D to 3D for Monocular Open-Set Object Detection [62.57179069154312]
We introduce the first end-to-end 3D Monocular Open-set Object Detector (3D-MOOD). We lift open-set 2D detection into 3D space through our designed 3D bounding box head. We condition the object queries with a geometry prior to improve generalization of 3D estimation across diverse scenes.
arXiv Detail & Related papers (2025-07-31T13:56:41Z) - Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D [68.23391872643268]
LOCATE 3D is a model for localizing objects in 3D scenes from referring expressions like "the small coffee table between the sofa and the lamp". It operates directly on sensor observation streams (posed RGB-D frames), enabling real-world deployment on robots and AR devices.
arXiv Detail & Related papers (2025-04-19T02:51:24Z) - 3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment [44.00343134325925]
3D-VisTA is a pre-trained Transformer for 3D Vision and Text Alignment.
ScanScribe is the first large-scale 3D scene-text pairs dataset for 3D-VL pre-training.
arXiv Detail & Related papers (2023-08-08T15:59:17Z) - MobileBrick: Building LEGO for 3D Reconstruction on Mobile Devices [78.20154723650333]
High-quality 3D ground-truth shapes are critical for 3D object reconstruction evaluation.
We introduce a novel multi-view RGBD dataset captured using a mobile device.
We obtain precise 3D ground-truth shape without relying on high-end 3D scanners.
arXiv Detail & Related papers (2023-03-03T14:02:50Z) - OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation [107.71752592196138]
We propose OmniObject3D, a large vocabulary 3D object dataset with massive high-quality real-scanned 3D objects.
It comprises 6,000 scanned objects in 190 daily categories, sharing common classes with popular 2D datasets.
Each 3D object is captured with both 2D and 3D sensors, providing textured meshes, point clouds, multiview rendered images, and multiple real-captured videos.
arXiv Detail & Related papers (2023-01-18T18:14:18Z) - Aerial Monocular 3D Object Detection [67.20369963664314]
DVDET is proposed to achieve aerial monocular 3D object detection in both the 2D image space and the 3D physical space. To address the severe view deformation issue, we propose a novel trainable geo-deformable transformation module. To encourage more researchers to investigate this area, we will release the dataset and related code.
arXiv Detail & Related papers (2022-08-08T08:32:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information listed here and is not responsible for any consequences of its use.