MCOP: Multi-UAV Collaborative Occupancy Prediction
- URL: http://arxiv.org/abs/2510.12679v2
- Date: Wed, 15 Oct 2025 01:11:34 GMT
- Title: MCOP: Multi-UAV Collaborative Occupancy Prediction
- Authors: Zefu Lin, Wenbo Chen, Xiaojuan Jin, Yuran Yang, Lue Fan, Yixin Zhang, Yufeng Zhang, Zhaoxiang Zhang,
- Abstract summary: Current Bird's Eye View (BEV)-based approaches exhibit two main limitations.<n>We propose a novel multi-UAV collaborative occupancy prediction framework.<n>Our method achieves state-of-the-art accuracy, significantly outperforming existing collaborative methods.
- Score: 40.58729551462363
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unmanned Aerial Vehicle (UAV) swarm systems necessitate efficient collaborative perception mechanisms for diverse operational scenarios. Current Bird's Eye View (BEV)-based approaches exhibit two main limitations: bounding-box representations fail to capture complete semantic and geometric information of the scene, and their performance significantly degrades when encountering undefined or occluded objects. To address these limitations, we propose a novel multi-UAV collaborative occupancy prediction framework. Our framework effectively preserves 3D spatial structures and semantics through integrating a Spatial-Aware Feature Encoder and Cross-Agent Feature Integration. To enhance efficiency, we further introduce Altitude-Aware Feature Reduction to compactly represent scene information, along with a Dual-Mask Perceptual Guidance mechanism to adaptively select features and reduce communication overhead. Due to the absence of suitable benchmark datasets, we extend three datasets for evaluation: two virtual datasets (Air-to-Pred-Occ and UAV3D-Occ) and one real-world dataset (GauUScene-Occ). Experiments results demonstrate that our method achieves state-of-the-art accuracy, significantly outperforming existing collaborative methods while reducing communication overhead to only a fraction of previous approaches.
Related papers
- A Tri-Modal Dataset and a Baseline System for Tracking Unmanned Aerial Vehicles [74.8162337823142]
MM-UAV is the first large-scale benchmark for Multi-Modal UAV Tracking.<n>The dataset spans over 30 challenging scenarios, with 1,321 synchronised multi-modal sequences, and more than 2.8 million annotated frames.<n>Accompanying the dataset, we provide a novel multi-modal multi-UAV tracking framework.
arXiv Detail & Related papers (2025-11-23T08:42:17Z) - INSTINCT: Instance-Level Interaction Architecture for Query-Based Collaborative Perception [6.018757656052237]
Collaborative perception systems overcome single-vehicle limitations by integrating multi-agent sensory data, improving accuracy and safety.<n>Previous works proves that query-based instance-level interaction reduces bandwidth demands and manual priors, however, LiDAR-focused implementations in collaborative perception remain underdeveloped.<n>We propose INSTINCT, a novel collaborative perception framework featuring three core components: 1) a quality-aware filtering mechanism for high-quality instance feature selection; 2) a dual-branch detection routing scheme to decouple collaboration-irrelevant and collaboration-relevant instances; and 3) a Cross Agent Local Instance Fusion module to aggregate local hybrid instance features.
arXiv Detail & Related papers (2025-09-28T07:16:32Z) - Vision-Only Gaussian Splatting for Collaborative Semantic Occupancy Prediction [12.80732853899807]
Collaborative perception enables connected vehicles to share information.<n>Existing vision-only methods for 3D semantic occupancy prediction commonly rely on dense 3D voxels.<n>We propose the first approach leveraging sparse 3D semantic Gaussian splatting for collaborative 3D semantic occupancy prediction.
arXiv Detail & Related papers (2025-08-12T19:50:34Z) - Mitigating Trade-off: Stream and Query-guided Aggregation for Efficient and Effective 3D Occupancy Prediction [12.064509280163502]
3D occupancy prediction has emerged as a key perception task for autonomous driving.<n>Recent studies focus on integrating information obtained from past observations to improve prediction accuracy.<n>We propose StreamOcc, a framework that aggregates past-temporal information in a stream-based manner.<n>Experiments on the Occ3D-nus dataset show that StreamOcc achieves state-of-the-art performance in real-time settings, while reducing memory usage by more than 50% compared to previous methods.
arXiv Detail & Related papers (2025-03-28T02:05:53Z) - Which2comm: An Efficient Collaborative Perception Framework for 3D Object Detection [5.195291754828701]
Collaborative perception allows real-time inter-agent information exchange.<n>limited communication bandwidth in practical scenarios restricts the inter-agent data transmission volume.<n>We propose Which2comm, a novel multi-agent 3D object detection framework leveraging object-level sparse features.
arXiv Detail & Related papers (2025-03-21T14:24:07Z) - Griffin: Aerial-Ground Cooperative Detection and Tracking Dataset and Benchmark [15.405137983083875]
Aerial-ground cooperation offers a promising solution by integrating UAVs' aerial views with ground vehicles' local observations.<n>This paper presents a comprehensive solution for aerial-ground cooperative 3D perception through three key contributions.
arXiv Detail & Related papers (2025-03-10T07:00:07Z) - ALOcc: Adaptive Lifting-Based 3D Semantic Occupancy and Cost Volume-Based Flow Predictions [91.55655961014027]
3D semantic occupancy and flow prediction are fundamental to understanding scene scene.<n>This paper proposes a vision-based framework with three targeted improvements.<n>Our purely convolutional architecture establishes new SOTA performance on multiple benchmarks for both semantic occupancy and joint semantic-flow prediction.
arXiv Detail & Related papers (2024-11-12T11:32:56Z) - Towards Unified 3D Object Detection via Algorithm and Data Unification [70.27631528933482]
We build the first unified multi-modal 3D object detection benchmark MM- Omni3D and extend the aforementioned monocular detector to its multi-modal version.
We name the designed monocular and multi-modal detectors as UniMODE and MM-UniMODE, respectively.
arXiv Detail & Related papers (2024-02-28T18:59:31Z) - S^2Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR [50.435592120607815]
Scene graph generation (SGG) of surgical procedures is crucial in enhancing holistically cognitive intelligence in the operating room (OR)
Previous works have primarily relied on multi-stage learning, where the generated semantic scene graphs depend on intermediate processes with pose estimation and object detection.
In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S2Former-OR.
arXiv Detail & Related papers (2024-02-22T11:40:49Z) - DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention [50.11672196146829]
3D object detection with surround-view images is an essential task for autonomous driving.
We propose DETR4D, a Transformer-based framework that explores sparse attention and direct feature query for 3D object detection in multi-view images.
arXiv Detail & Related papers (2022-12-15T14:18:47Z) - Dual Adversarial Auto-Encoders for Clustering [152.84443014554745]
We propose Dual Adversarial Auto-encoder (Dual-AAE) for unsupervised clustering.
By performing variational inference on the objective function of Dual-AAE, we derive a new reconstruction loss which can be optimized by training a pair of Auto-encoders.
Experiments on four benchmarks show that Dual-AAE achieves superior performance over state-of-the-art clustering methods.
arXiv Detail & Related papers (2020-08-23T13:16:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.