Related papers: MCOP: Multi-UAV Collaborative Occupancy Prediction

MCOP: Multi-UAV Collaborative Occupancy Prediction

URL: http://arxiv.org/abs/2510.12679v2
Date: Wed, 15 Oct 2025 01:11:34 GMT
Title: MCOP: Multi-UAV Collaborative Occupancy Prediction
Authors: Zefu Lin, Wenbo Chen, Xiaojuan Jin, Yuran Yang, Lue Fan, Yixin Zhang, Yufeng Zhang, Zhaoxiang Zhang,
Abstract summary: Current Bird's Eye View (BEV)-based approaches exhibit two main limitations.<n>We propose a novel multi-UAV collaborative occupancy prediction framework.<n>Our method achieves state-of-the-art accuracy, significantly outperforming existing collaborative methods.
Score: 40.58729551462363
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Unmanned Aerial Vehicle (UAV) swarm systems necessitate efficient collaborative perception mechanisms for diverse operational scenarios. Current Bird's Eye View (BEV)-based approaches exhibit two main limitations: bounding-box representations fail to capture complete semantic and geometric information of the scene, and their performance significantly degrades when encountering undefined or occluded objects. To address these limitations, we propose a novel multi-UAV collaborative occupancy prediction framework. Our framework effectively preserves 3D spatial structures and semantics through integrating a Spatial-Aware Feature Encoder and Cross-Agent Feature Integration. To enhance efficiency, we further introduce Altitude-Aware Feature Reduction to compactly represent scene information, along with a Dual-Mask Perceptual Guidance mechanism to adaptively select features and reduce communication overhead. Due to the absence of suitable benchmark datasets, we extend three datasets for evaluation: two virtual datasets (Air-to-Pred-Occ and UAV3D-Occ) and one real-world dataset (GauUScene-Occ). Experiments results demonstrate that our method achieves state-of-the-art accuracy, significantly outperforming existing collaborative methods while reducing communication overhead to only a fraction of previous approaches.

Related papers

A Tri-Modal Dataset and a Baseline System for Tracking Unmanned Aerial Vehicles [74.8162337823142]
MM-UAV is the first large-scale benchmark for Multi-Modal UAV Tracking.<n>The dataset spans over 30 challenging scenarios, with 1,321 synchronised multi-modal sequences, and more than 2.8 million annotated frames.<n>Accompanying the dataset, we provide a novel multi-modal multi-UAV tracking framework.
arXiv Detail & Related papers (2025-11-23T08:42:17Z)
INSTINCT: Instance-Level Interaction Architecture for Query-Based Collaborative Perception [6.018757656052237]
Collaborative perception systems overcome single-vehicle limitations by integrating multi-agent sensory data, improving accuracy and safety.<n>Previous works proves that query-based instance-level interaction reduces bandwidth demands and manual priors, however, LiDAR-focused implementations in collaborative perception remain underdeveloped.<n>We propose INSTINCT, a novel collaborative perception framework featuring three core components: 1) a quality-aware filtering mechanism for high-quality instance feature selection; 2) a dual-branch detection routing scheme to decouple collaboration-irrelevant and collaboration-relevant instances; and 3) a Cross Agent Local Instance Fusion module to aggregate local hybrid instance features.
arXiv Detail & Related papers (2025-09-28T07:16:32Z)
Vision-Only Gaussian Splatting for Collaborative Semantic Occupancy Prediction [12.80732853899807]
Collaborative perception enables connected vehicles to share information.<n>Existing vision-only methods for 3D semantic occupancy prediction commonly rely on dense 3D voxels.<n>We propose the first approach leveraging sparse 3D semantic Gaussian splatting for collaborative 3D semantic occupancy prediction.
arXiv Detail & Related papers (2025-08-12T19:50:34Z)
Mitigating Trade-off: Stream and Query-guided Aggregation for Efficient and Effective 3D Occupancy Prediction [12.064509280163502]
3D occupancy prediction has emerged as a key perception task for autonomous driving.<n>Recent studies focus on integrating information obtained from past observations to improve prediction accuracy.<n>We propose StreamOcc, a framework that aggregates past-temporal information in a stream-based manner.<n>Experiments on the Occ3D-nus dataset show that StreamOcc achieves state-of-the-art performance in real-time settings, while reducing memory usage by more than 50% compared to previous methods.
arXiv Detail & Related papers (2025-03-28T02:05:53Z)
Which2comm: An Efficient Collaborative Perception Framework for 3D Object Detection [5.195291754828701]
Collaborative perception allows real-time inter-agent information exchange.<n>limited communication bandwidth in practical scenarios restricts the inter-agent data transmission volume.<n>We propose Which2comm, a novel multi-agent 3D object detection framework leveraging object-level sparse features.
arXiv Detail & Related papers (2025-03-21T14:24:07Z)
Griffin: Aerial-Ground Cooperative Detection and Tracking Dataset and Benchmark [15.405137983083875]
Aerial-ground cooperation offers a promising solution by integrating UAVs' aerial views with ground vehicles' local observations.<n>This paper presents a comprehensive solution for aerial-ground cooperative 3D perception through three key contributions.
arXiv Detail & Related papers (2025-03-10T07:00:07Z)
ALOcc: Adaptive Lifting-Based 3D Semantic Occupancy and Cost Volume-Based Flow Predictions [91.55655961014027]
3D semantic occupancy and flow prediction are fundamental to understanding scene scene.<n>This paper proposes a vision-based framework with three targeted improvements.<n>Our purely convolutional architecture establishes new SOTA performance on multiple benchmarks for both semantic occupancy and joint semantic-flow prediction.
arXiv Detail & Related papers (2024-11-12T11:32:56Z)
Towards Unified 3D Object Detection via Algorithm and Data Unification [70.27631528933482]
We build the first unified multi-modal 3D object detection benchmark MM- Omni3D and extend the aforementioned monocular detector to its multi-modal version. We name the designed monocular and multi-modal detectors as UniMODE and MM-UniMODE, respectively.
arXiv Detail & Related papers (2024-02-28T18:59:31Z)
S^2Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR [50.435592120607815]
Scene graph generation (SGG) of surgical procedures is crucial in enhancing holistically cognitive intelligence in the operating room (OR) Previous works have primarily relied on multi-stage learning, where the generated semantic scene graphs depend on intermediate processes with pose estimation and object detection. In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S2Former-OR.
arXiv Detail & Related papers (2024-02-22T11:40:49Z)
DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention [50.11672196146829]
3D object detection with surround-view images is an essential task for autonomous driving. We propose DETR4D, a Transformer-based framework that explores sparse attention and direct feature query for 3D object detection in multi-view images.
arXiv Detail & Related papers (2022-12-15T14:18:47Z)
Dual Adversarial Auto-Encoders for Clustering [152.84443014554745]
We propose Dual Adversarial Auto-encoder (Dual-AAE) for unsupervised clustering. By performing variational inference on the objective function of Dual-AAE, we derive a new reconstruction loss which can be optimized by training a pair of Auto-encoders. Experiments on four benchmarks show that Dual-AAE achieves superior performance over state-of-the-art clustering methods.
arXiv Detail & Related papers (2020-08-23T13:16:34Z)

This list is automatically generated from the titles and abstracts of the papers in this site.