OccuFly: A 3D Vision Benchmark for Semantic Scene Completion from the Aerial Perspective
- URL: http://arxiv.org/abs/2512.20770v1
- Date: Tue, 23 Dec 2025 21:14:55 GMT
- Title: OccuFly: A 3D Vision Benchmark for Semantic Scene Completion from the Aerial Perspective
- Authors: Markus Gross, Sai B. Matha, Aya Fahmy, Rui Song, Daniel Cremers, Henri Meess,
- Abstract summary: OccuFly is the first real-world, camera-based aerial semantic scene completion benchmark, captured at altitudes of 50m, 40m, and 30m.<n>We propose a LiDAR-free data generation framework based on camera modality, which is ubiquitous on modern UAVs.<n>We benchmark the state-of-the-art on OccuFly and highlight challenges specific to elevated viewpoints.
- Score: 44.84496929237721
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Semantic Scene Completion (SSC) is crucial for 3D perception in mobile robotics, as it enables holistic scene understanding by jointly estimating dense volumetric occupancy and per-voxel semantics. Although SSC has been widely studied in terrestrial domains such as autonomous driving, aerial scenarios like autonomous flying remain largely unexplored, thereby limiting progress on downstream applications. Furthermore, LiDAR sensors represent the primary modality for SSC data generation, which poses challenges for most uncrewed aerial vehicles (UAVs) due to flight regulations, mass and energy constraints, and the sparsity of LiDAR-based point clouds from elevated viewpoints. To address these limitations, we introduce OccuFly, the first real-world, camera-based aerial SSC benchmark, captured at altitudes of 50m, 40m, and 30m during spring, summer, fall, and winter. OccuFly covers urban, industrial, and rural scenarios, provides 22 semantic classes, and the data format adheres to established conventions to facilitate seamless integration with existing research. Crucially, we propose a LiDAR-free data generation framework based on camera modality, which is ubiquitous on modern UAVs. By utilizing traditional 3D reconstruction, our framework automates label transfer by lifting a subset of annotated 2D masks into the reconstructed point cloud, thereby substantially minimizing manual 3D annotation effort. Finally, we benchmark the state-of-the-art on OccuFly and highlight challenges specific to elevated viewpoints, yielding a comprehensive vision benchmark for holistic aerial 3D scene understanding.
Related papers
- VLMFusionOcc3D: VLM Assisted Multi-Modal 3D Semantic Occupancy Prediction [0.0]
VLMFusionOcc3D is a robust multimodal framework for dense 3D semantic occupancy prediction in autonomous driving.<n>We introduce Weather-Aware Adaptive Fusion, a dynamic gating mechanism that utilizes vehicle metadata and weather-conditioned prompts to re-weight sensor contributions.<n>Our approach achieves significant improvements in challenging weather scenarios, offering a scalable and robust solution for complex urban navigation.
arXiv Detail & Related papers (2026-03-03T05:22:28Z) - UAV-MM3D: A Large-Scale Synthetic Benchmark for 3D Perception of Unmanned Aerial Vehicles with Multi-Modal Data [47.317955428393134]
We introduce UAV-MM3D, a high-fidelity multimodal synthetic dataset for low-altitude UAV perception and motion understanding.<n>It comprises 400K synchronized frames across diverse scenes (urban areas, suburbs, forests, coastal regions) and weather conditions.<n>Each frame provides 2D/3D bounding boxes, 6-DoF poses, and instance-level annotations, enabling core tasks related to UAVs such as 3D detection, pose estimation, target tracking, and short-term trajectory forecasting.
arXiv Detail & Related papers (2025-11-27T12:30:28Z) - ShelfOcc: Native 3D Supervision beyond LiDAR for Vision-Based Occupancy Estimation [9.977834471775816]
We introduce ShelfOcc, a vision-only method that overcomes limitations without relying on LiDAR.<n> ShelfOcc brings supervision into native 3D space by generating metrically consistent semantic voxel labels from video.<n>Our method introduces a dedicated framework that mitigates these issues by filtering and accumulating static geometry consistently across frames.
arXiv Detail & Related papers (2025-11-19T12:44:13Z) - Neural 3D Object Reconstruction with Small-Scale Unmanned Aerial Vehicles [16.745245388756533]
Small Unmanned Aerial Vehicles (UAVs) exhibit immense potential for navigating indoor and hard-to-reach areas.<n>We introduce a novel system architecture that enables fully autonomous, high-fidelity 3D scanning of static objects using UAVs weighing under 100 grams.
arXiv Detail & Related papers (2025-09-15T21:08:32Z) - NOVA: Navigation via Object-Centric Visual Autonomy for High-Speed Target Tracking in Unstructured GPS-Denied Environments [56.35569661650558]
We introduce NOVA, a fully onboard, object-centric framework that enables robust target tracking and collision-aware navigation.<n>Rather than constructing a global map, NOVA formulates perception, estimation, and control entirely in the target's reference frame.<n>We validate NOVA across challenging real-world scenarios, including urban mazes, forest trails, and repeated transitions through buildings with intermittent GPS loss.
arXiv Detail & Related papers (2025-06-23T14:28:30Z) - S3MOT: Monocular 3D Object Tracking with Selective State Space Model [3.5047603107971397]
Multi-object tracking in 3D space is essential for advancing robotics and computer applications.<n>It remains a significant challenge in monocular setups due to the difficulty of mining 3D associations from 2D video streams.<n>We present three innovative techniques to enhance the fusion of heterogeneous cues for monocular 3D MOT.
arXiv Detail & Related papers (2025-04-25T04:45:35Z) - OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose an OccNeRF method for training occupancy networks without 3D supervision.
We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range.
For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
arXiv Detail & Related papers (2023-12-14T18:58:52Z) - SSCBench: A Large-Scale 3D Semantic Scene Completion Benchmark for Autonomous Driving [87.8761593366609]
SSCBench is a benchmark that integrates scenes from widely used automotive datasets.
We benchmark models using monocular, trinocular, and cloud input to assess the performance gap.
We have unified semantic labels across diverse datasets to simplify cross-domain generalization testing.
arXiv Detail & Related papers (2023-06-15T09:56:33Z) - VPAIR -- Aerial Visual Place Recognition and Localization in Large-scale
Outdoor Environments [49.82314641876602]
We present a new dataset named VPAIR.
The dataset was recorded on board a light aircraft flying at an altitude of more than 300 meters above ground.
The dataset covers a more than one hundred kilometers long trajectory over various types of challenging landscapes.
arXiv Detail & Related papers (2022-05-23T18:50:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.