Related papers: AGC-Drive: A Large-Scale Dataset for Real-World Aerial-Ground Collaboration in Driving Scenarios

AGC-Drive: A Large-Scale Dataset for Real-World Aerial-Ground Collaboration in Driving Scenarios

URL: http://arxiv.org/abs/2506.16371v1
Date: Thu, 19 Jun 2025 14:48:43 GMT
Title: AGC-Drive: A Large-Scale Dataset for Real-World Aerial-Ground Collaboration in Driving Scenarios
Authors: Yunhao Hou, Bochao Zou, Min Zhang, Ran Chen, Shangdong Yang, Yanmei Zhang, Junbao Zhuo, Siheng Chen, Jiansheng Chen, Huimin Ma,
Abstract summary: We present AGC-Drive, the first large-scale real-world dataset for Aerial-Ground Cooperative 3D perception.<n>The dataset covers 14 diverse real-world driving scenarios, including urban roundabouts, highway tunnels, and on/off ramps.<n>We provide benchmarks for two 3D perception tasks: vehicle-to-vehicle collaborative perception and vehicle-to-infrastructure collaborative perception.
Score: 45.225172917280254
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: By sharing information across multiple agents, collaborative perception helps autonomous vehicles mitigate occlusions and improve overall perception accuracy. While most previous work focus on vehicle-to-vehicle and vehicle-to-infrastructure collaboration, with limited attention to aerial perspectives provided by UAVs, which uniquely offer dynamic, top-down views to alleviate occlusions and monitor large-scale interactive environments. A major reason for this is the lack of high-quality datasets for aerial-ground collaborative scenarios. To bridge this gap, we present AGC-Drive, the first large-scale real-world dataset for Aerial-Ground Cooperative 3D perception. The data collection platform consists of two vehicles, each equipped with five cameras and one LiDAR sensor, and one UAV carrying a forward-facing camera and a LiDAR sensor, enabling comprehensive multi-view and multi-agent perception. Consisting of approximately 120K LiDAR frames and 440K images, the dataset covers 14 diverse real-world driving scenarios, including urban roundabouts, highway tunnels, and on/off ramps. Notably, 19.5% of the data comprises dynamic interaction events, including vehicle cut-ins, cut-outs, and frequent lane changes. AGC-Drive contains 400 scenes, each with approximately 100 frames and fully annotated 3D bounding boxes covering 13 object categories. We provide benchmarks for two 3D perception tasks: vehicle-to-vehicle collaborative perception and vehicle-to-UAV collaborative perception. Additionally, we release an open-source toolkit, including spatiotemporal alignment verification tools, multi-agent visualization systems, and collaborative annotation utilities. The dataset and code are available at https://github.com/PercepX/AGC-Drive.

Related papers

CATS-V2V: A Real-World Vehicle-to-Vehicle Cooperative Perception Dataset with Complex Adverse Traffic Scenarios [25.181569981722976]
We introduce CATS-V2V, the first-of-its-kind real-world dataset for V2V cooperative perception under complex adverse traffic scenarios.<n>The 100-clip dataset includes 60K frames of 10 Hz LiDAR point clouds and 1.26M multi-view 30 Hz camera images.<n>On this basis, we propose a target-based temporal alignment method, ensuring that all objects are precisely aligned across all sensor modalities.
arXiv Detail & Related papers (2025-11-14T11:07:04Z)
UrbanIng-V2X: A Large-Scale Multi-Vehicle, Multi-Infrastructure Dataset Across Multiple Intersections for Cooperative Perception [45.81880967650386]
UrbanIng-V2X consists of 34 temporally aligned and spatially calibrated sensor sequences lasting 20 seconds.<n>All sequences are annotated at a frequency of 10 Hz with 3D bounding boxes spanning 13 object classes.<n>In total, UrbanIng-V2X provides data from 12 vehicle-mounted RGB cameras, 2 vehicle LiDARs, 17 infrastructure thermal cameras, and 12 infrastructure LiDARs.
arXiv Detail & Related papers (2025-10-27T16:12:12Z)
V2X-Radar: A Multi-modal Dataset with 4D Radar for Cooperative Perception [47.55064735186109]
We present V2X-Radar, the first large-scale, real-world multi-modal dataset featuring 4D Radar.<n>The dataset consists of 20K LiDAR frames, 40K camera images, and 20K 4D Radar data, including 350K annotated boxes across five categories.<n>To support various research domains, we have established V2X-Radar-C for cooperative perception, V2X-Radar-I for roadside perception, and V2X-Radar-V for single-vehicle perception.
arXiv Detail & Related papers (2024-11-17T04:59:00Z)
RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments [62.5830455357187]
We setup an egocentric multi-sensor data collection platform based on 3 main types of sensors (Camera, LiDAR and Fisheye)<n>A large-scale multimodal dataset is constructed, named RoboSense, to facilitate egocentric robot perception.
arXiv Detail & Related papers (2024-08-28T03:17:40Z)
InScope: A New Real-world 3D Infrastructure-side Collaborative Perception Dataset for Open Traffic Scenarios [13.821143687548494]
This paper introduces a new 3D infrastructure-side collaborative perception dataset, abbreviated as inscope. InScope encapsulates a 20-day capture duration with 303 tracking trajectories and 187,787 3D bounding boxes annotated by experts.
arXiv Detail & Related papers (2024-07-31T13:11:14Z)
V2X-Real: a Large-Scale Dataset for Vehicle-to-Everything Cooperative Perception [22.3955949838171]
We present V2X-Real, a large-scale dataset that includes a mixture of multiple vehicles and smart infrastructure.<n>Our dataset contains 33K LiDAR frames and 171K camera data with over 1.2M annotated bounding boxes of 10 categories in very challenging urban scenarios.
arXiv Detail & Related papers (2024-03-24T06:30:02Z)
HoloVIC: Large-scale Dataset and Benchmark for Multi-Sensor Holographic Intersection and Vehicle-Infrastructure Cooperative [23.293162454592544]
We constructed holographic intersections with various layouts to build a large-scale multi-sensor holographic vehicle-infrastructure cooperation dataset, called HoloVIC. Our dataset includes 3 different types of sensors (Camera, Lidar, Fisheye) and employs 4 sensor-s based on the different intersections.
arXiv Detail & Related papers (2024-03-05T04:08:19Z)
V2V4Real: A Real-world Large-scale Dataset for Vehicle-to-Vehicle Cooperative Perception [49.7212681947463]
Vehicle-to-Vehicle (V2V) cooperative perception system has great potential to revolutionize the autonomous driving industry. We present V2V4Real, the first large-scale real-world multi-modal dataset for V2V perception. Our dataset covers a driving area of 410 km, comprising 20K LiDAR frames, 40K RGB frames, 240K annotated 3D bounding boxes for 5 classes, and HDMaps.
arXiv Detail & Related papers (2023-03-14T02:49:20Z)
Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting [64.7364925689825]
Argoverse 2 (AV2) is a collection of three datasets for perception and forecasting research in the self-driving domain. The Lidar dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose. The Motion Forecasting dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene.
arXiv Detail & Related papers (2023-01-02T00:36:22Z)
OPV2V: An Open Benchmark Dataset and Fusion Pipeline for Perception with Vehicle-to-Vehicle Communication [13.633468133727]
We present the first large-scale open simulated dataset for Vehicle-to-Vehicle perception. It contains over 70 interesting scenes, 11,464 frames, and 232,913 annotated 3D vehicle bounding boxes.
arXiv Detail & Related papers (2021-09-16T00:52:41Z)
Large Scale Interactive Motion Forecasting for Autonomous Driving : The Waymo Open Motion Dataset [84.3946567650148]
With over 100,000 scenes, each 20 seconds long at 10 Hz, our new dataset contains more than 570 hours of unique data over 1750 km of roadways. We use a high-accuracy 3D auto-labeling system to generate high quality 3D bounding boxes for each road agent. We introduce a new set of metrics that provides a comprehensive evaluation of both single agent and joint agent interaction motion forecasting models.
arXiv Detail & Related papers (2021-04-20T17:19:05Z)
Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images. Our approach is fully automatic without any human interaction. We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.