Griffin: Aerial-Ground Cooperative Detection and Tracking Dataset and Benchmark
- URL: http://arxiv.org/abs/2503.06983v1
- Date: Mon, 10 Mar 2025 07:00:07 GMT
- Title: Griffin: Aerial-Ground Cooperative Detection and Tracking Dataset and Benchmark
- Authors: Jiahao Wang, Xiangyu Cao, Jiaru Zhong, Yuner Zhang, Haibao Yu, Lei He, Shaobing Xu,
- Abstract summary: Aerial-ground cooperation offers a promising solution by integrating UAVs' aerial views with ground vehicles' local observations. This paper presents a comprehensive solution for aerial-ground cooperative 3D perception through three key contributions.
- Score: 15.405137983083875
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite significant advancements, autonomous driving systems continue to struggle with occluded objects and long-range detection due to the inherent limitations of single-perspective sensing. Aerial-ground cooperation offers a promising solution by integrating UAVs' aerial views with ground vehicles' local observations. However, progress in this emerging field has been hindered by the absence of public datasets and standardized evaluation benchmarks. To address this gap, this paper presents a comprehensive solution for aerial-ground cooperative 3D perception through three key contributions: (1) Griffin, a large-scale multi-modal dataset featuring over 200 dynamic scenes (30k+ frames) with varied UAV altitudes (20-60m), diverse weather conditions, and occlusion-aware 3D annotations, enhanced by CARLA-AirSim co-simulation for realistic UAV dynamics; (2) A unified benchmarking framework for aerial-ground cooperative detection and tracking tasks, including protocols for evaluating communication efficiency, latency tolerance, and altitude adaptability; (3) AGILE, an instance-level intermediate fusion baseline that dynamically aligns cross-view features through query-based interaction, achieving an advantageous balance between communication overhead and perception accuracy. Extensive experiments prove the effectiveness of aerial-ground cooperative perception and demonstrate the direction of further research. The dataset and codes are available at https://github.com/wang-jh18-SVM/Griffin.
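The abstract describes AGILE as an instance-level intermediate fusion baseline that aligns cross-view features through query-based interaction. As a rough illustration only (the paper's actual architecture is not reproduced here), the core idea of query-based cross-view interaction can be sketched as ego-vehicle object queries attending over transmitted UAV instance features; all names and dimensions below are hypothetical:

```python
import numpy as np

def cross_view_fusion(ego_queries, uav_features, d_k=None):
    """Fuse ego-vehicle object queries with UAV instance features via
    scaled dot-product cross-attention -- a generic stand-in for the
    query-based interaction described in the abstract, not AGILE itself."""
    d_k = d_k or ego_queries.shape[-1]
    # Attention scores: each ego query attends to every UAV instance.
    scores = ego_queries @ uav_features.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Residual update: fused queries keep ego information and add
    # attention-weighted aerial context.
    return ego_queries + weights @ uav_features

rng = np.random.default_rng(0)
ego = rng.normal(size=(4, 8))   # 4 ego-vehicle queries, 8-dim (toy sizes)
uav = rng.normal(size=(6, 8))   # 6 transmitted UAV instance features
fused = cross_view_fusion(ego, uav)
print(fused.shape)  # (4, 8)
```

Transmitting a handful of instance-level features rather than dense feature maps is what gives this style of fusion its favorable communication-overhead/accuracy trade-off.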
Related papers
- More Clear, More Flexible, More Precise: A Comprehensive Oriented Object Detection benchmark for UAV [58.89234732689013]
CODrone is a comprehensive oriented object detection dataset for UAVs that accurately reflects real-world conditions.
It also serves as a new benchmark designed to align with downstream task requirements.
We conduct a series of experiments based on 22 classical or SOTA methods to rigorously evaluate CODrone.
arXiv Detail & Related papers (2025-04-28T17:56:02Z) - SparseAlign: A Fully Sparse Framework for Cooperative Object Detection [38.96043178218958]
We design a fully sparse framework, SparseAlign, with three key features: an enhanced sparse 3D backbone, a query-based temporal context learning module, and a robust detection head specially tailored for sparse features.
Our framework, despite its sparsity, outperforms the state of the art with less communication bandwidth requirements.
arXiv Detail & Related papers (2025-03-17T09:38:53Z) - Target-aware Bidirectional Fusion Transformer for Aerial Object Tracking [4.199091332200661]
We propose a novel target-aware Bidirectional Fusion transformer (BFTrans) for UAV tracking.
Our approach can exceed other state-of-the-art trackers and runs at an average speed of 30.5 FPS on an embedded platform.
arXiv Detail & Related papers (2025-03-13T01:53:29Z) - A Cross-Scene Benchmark for Open-World Drone Active Tracking [54.235808061746525]
Drone Visual Active Tracking aims to autonomously follow a target object by controlling the motion system based on visual observations.
We propose a unified cross-scene cross-domain benchmark for open-world drone active tracking called DAT.
We also propose a reinforcement learning-based drone tracking method called R-VAT.
arXiv Detail & Related papers (2024-12-01T09:37:46Z) - Clustering-based Learning for UAV Tracking and Pose Estimation [0.0]
This work develops a clustering-based learning detection approach, CL-Det, for UAV tracking and pose estimation using two types of LiDARs.
We first align the timestamps of Livox Avia data and LiDAR 360 data and then separate the point cloud of objects of interest (OOIs) from the environment.
The proposed method shows competitive pose estimation performance and ranks 5th on the final leaderboard of the CVPR 2024 UG2+ Challenge.
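The CL-Det summary above describes a two-step pipeline: align the timestamps of the two LiDAR streams, then separate objects of interest (OOIs) from the environment. A minimal sketch of both steps, with hypothetical names and toy data (the paper's exact alignment and clustering procedure may differ):

```python
import numpy as np

def align_timestamps(ts_a, ts_b):
    """For each timestamp in ts_a, return the index of the nearest
    timestamp in sorted ts_b -- a minimal stand-in for syncing the
    Livox Avia and LiDAR 360 streams."""
    ts_a, ts_b = np.asarray(ts_a), np.asarray(ts_b)
    idx = np.clip(np.searchsorted(ts_b, ts_a), 1, len(ts_b) - 1)
    left, right = ts_b[idx - 1], ts_b[idx]
    return np.where(np.abs(ts_a - left) <= np.abs(ts_a - right), idx - 1, idx)

def separate_ooi(points, center, radius):
    """Keep points within `radius` of a coarse object center,
    separating the object of interest from the environment."""
    dist = np.linalg.norm(points - center, axis=1)
    return points[dist <= radius]

ts_a = np.array([0.00, 0.11, 0.21])
ts_b = np.array([0.0, 0.1, 0.2, 0.3])
print(align_timestamps(ts_a, ts_b))  # [0 1 2]

pts = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [0.2, 0.0, 0.0]])
print(separate_ooi(pts, np.array([0.0, 0.0, 0.0]), 1.0).shape)  # (2, 3)
```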
arXiv Detail & Related papers (2024-05-27T06:33:25Z) - Ret3D: Rethinking Object Relations for Efficient 3D Object Detection in Driving Scenes [82.4186966781934]
We introduce a simple, efficient, and effective two-stage detector, termed as Ret3D.
At the core of Ret3D is the utilization of novel intra-frame and inter-frame relation modules.
With negligible extra overhead, Ret3D achieves the state-of-the-art performance.
arXiv Detail & Related papers (2022-08-18T03:48:58Z) - Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
arXiv Detail & Related papers (2022-03-11T01:51:54Z) - Disentangle Your Dense Object Detector [82.22771433419727]
Deep learning-based dense object detectors have achieved great success in the past few years and have been applied to numerous multimedia applications such as video understanding.
However, the current training pipeline for dense detectors relies on many conjunctions that may not hold.
We propose Disentangled Dense Object Detector (DDOD), in which simple and effective disentanglement mechanisms are designed and integrated into the current state-of-the-art detectors.
arXiv Detail & Related papers (2021-07-07T00:52:16Z) - Higher Performance Visual Tracking with Dual-Modal Localization [106.91097443275035]
Visual Object Tracking (VOT) has synchronous needs for both robustness and accuracy.
We propose a dual-modal framework for target localization, consisting of robust localization suppressing distractors via ONR and accurate localization attending precisely to the target center via OFC.
arXiv Detail & Related papers (2021-03-18T08:47:56Z) - A Vision Based Deep Reinforcement Learning Algorithm for UAV Obstacle Avoidance [1.2693545159861856]
We present two techniques for improving exploration for UAV obstacle avoidance.
The first is a convergence-based approach that uses convergence error to iterate through unexplored actions and temporal threshold to balance exploration and exploitation.
The second is a guidance-based approach which uses a Gaussian mixture distribution to compare previously seen states to a predicted next state in order to select the next action.
arXiv Detail & Related papers (2021-03-11T01:15:26Z) - Correlation Filters for Unmanned Aerial Vehicle-Based Aerial Tracking: A Review and Experimental Evaluation [17.8941834997338]
Discriminative correlation filter (DCF)-based trackers have stood out for their high computational efficiency and robustness on a single CPU.
In this work, 23 state-of-the-art DCF-based trackers are summarized according to their innovations for solving various issues.
Experiments evaluate the performance, verify the feasibility, and highlight the current challenges of DCF-based trackers for onboard UAV tracking.
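The survey above covers discriminative correlation filter (DCF) trackers, which owe their CPU efficiency to solving the filter in the Fourier domain. As one classic example (MOSSE; the 23 surveyed trackers differ in their extensions), a single-channel filter is learned as H* = Σ G⊙conj(F) / (Σ F⊙conj(F) + λ), where F and G are the FFTs of a training patch and the desired response. A toy sketch:

```python
import numpy as np

def train_mosse_filter(frames, target, lam=1e-2):
    """Learn a single-channel MOSSE-style correlation filter in the
    Fourier domain. `frames` are training patches; `target` is the
    desired response map (ideally a Gaussian centered on the object)."""
    G = np.fft.fft2(target)
    num = np.zeros_like(G)
    den = np.zeros_like(G)
    for f in frames:
        F = np.fft.fft2(f)
        num += G * np.conj(F)   # correlation with the desired output
        den += F * np.conj(F)   # energy term, regularized by lam below
    return num / (den + lam)

def correlate(H_conj, patch):
    """Apply the learned filter to a new patch; the peak of the
    response map locates the target."""
    return np.real(np.fft.ifft2(H_conj * np.fft.fft2(patch)))

# Toy example: the response peak should land where the target was placed.
rng = np.random.default_rng(1)
patch = rng.normal(size=(16, 16))
desired = np.zeros((16, 16))
desired[8, 8] = 1.0
H = train_mosse_filter([patch], desired)
response = correlate(H, patch)
peak = np.unravel_index(response.argmax(), response.shape)
print(peak)  # peak should land at (8, 8)
```

The single FFT/IFFT pair per frame is why DCF trackers run in real time on a single CPU, which makes them attractive for onboard UAV deployment.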
arXiv Detail & Related papers (2020-10-13T09:35:40Z) - Distributed Variable-Baseline Stereo SLAM from two UAVs [17.513645771137178]
In this article, we employ two UAVs equipped with one monocular camera and one IMU each, to exploit their view overlap and relative distance measurements.
In order to control the UAV agents autonomously, we propose a decentralized collaborative estimation scheme.
We demonstrate the effectiveness of the approach in high-altitude flights of up to 160 m, going significantly beyond the capabilities of state-of-the-art VIO methods.
arXiv Detail & Related papers (2020-09-10T12:16:10Z) - Autonomous and cooperative design of the monitor positions for a team of UAVs to maximize the quantity and quality of detected objects [0.5801044612920815]
This paper tackles the problem of positioning a swarm of UAVs inside a completely unknown terrain.
YOLOv3 and a system for identifying duplicate objects of interest were employed to assign a single score to each configuration of the UAV team.
A novel navigation algorithm, capable of optimizing the previously defined score, is proposed.
arXiv Detail & Related papers (2020-07-02T16:52:57Z) - EHSOD: CAM-Guided End-to-end Hybrid-Supervised Object Detection with Cascade Refinement [53.69674636044927]
We present EHSOD, an end-to-end hybrid-supervised object detection system.
It can be trained in one shot on both fully and weakly-annotated data.
It achieves comparable results on multiple object detection benchmarks with only 30% fully-annotated data.
arXiv Detail & Related papers (2020-02-18T08:04:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.