CATS-V2V: A Real-World Vehicle-to-Vehicle Cooperative Perception Dataset with Complex Adverse Traffic Scenarios
- URL: http://arxiv.org/abs/2511.11168v1
- Date: Fri, 14 Nov 2025 11:07:04 GMT
- Title: CATS-V2V: A Real-World Vehicle-to-Vehicle Cooperative Perception Dataset with Complex Adverse Traffic Scenarios
- Authors: Hangyu Li, Bofeng Cao, Zhaohui Liang, Wuzhen Li, Juyoung Oh, Yuxuan Chen, Shixiao Liang, Hang Zhou, Chengyuan Ma, Jiaxi Liu, Zheng Li, Peng Zhang, KeKe Long, Maolin Liu, Jackson Jiang, Chunlei Yu, Shengxiang Liu, Hongkai Yu, Xiaopeng Li
- Abstract summary: We introduce CATS-V2V, the first-of-its-kind real-world dataset for V2V cooperative perception under complex adverse traffic scenarios. The 100-clip dataset includes 60K frames of 10 Hz LiDAR point clouds and 1.26M multi-view 30 Hz camera images. On this basis, we propose a target-based temporal alignment method, ensuring that all objects are precisely aligned across all sensor modalities.
- Score: 25.181569981722976
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vehicle-to-Vehicle (V2V) cooperative perception has great potential to enhance autonomous driving performance by overcoming perception limitations in complex adverse traffic scenarios (CATS). Meanwhile, data serves as the fundamental infrastructure for modern autonomous driving AI. However, due to stringent data collection requirements, existing datasets focus primarily on ordinary traffic scenarios, constraining the benefits of cooperative perception. To address this challenge, we introduce CATS-V2V, the first-of-its-kind real-world dataset for V2V cooperative perception under complex adverse traffic scenarios. The dataset was collected by two hardware time-synchronized vehicles, covering 10 weather and lighting conditions across 10 diverse locations. The 100-clip dataset includes 60K frames of 10 Hz LiDAR point clouds and 1.26M multi-view 30 Hz camera images, along with 750K anonymized yet high-precision RTK-fixed GNSS and IMU records. Correspondingly, we provide time-consistent 3D bounding box annotations for objects, as well as static scenes to construct a 4D BEV representation. On this basis, we propose a target-based temporal alignment method, ensuring that all objects are precisely aligned across all sensor modalities. We hope that CATS-V2V, the largest-scale, most supportive, and highest-quality dataset of its kind to date, will benefit the autonomous driving community in related tasks.
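The abstract does not detail how the target-based temporal alignment works, but the core idea can be illustrated: since LiDAR runs at 10 Hz and the cameras at 30 Hz, an object's annotated pose must be brought to each sensor's exact timestamp, e.g. by interpolating between the two nearest LiDAR-rate annotations. The Python below is a minimal sketch under that assumption only; the function names and the `(t, x, y, z, yaw)` track layout are hypothetical, not the paper's implementation.

```python
# Hypothetical sketch of target-based temporal alignment. The paper's actual
# procedure is not public; everything here is illustrative only.
import numpy as np

def interp_yaw(y0, y1, w):
    # Interpolate heading along the shortest angular arc.
    d = (y1 - y0 + np.pi) % (2 * np.pi) - np.pi
    return y0 + w * d

def align_track_to_timestamp(track, t_ref):
    """Linearly interpolate one object's pose to a reference timestamp.

    track: list of (t, x, y, z, yaw) tuples sorted by time, e.g. boxes
           annotated at the 10 Hz LiDAR sweeps.
    t_ref: target timestamp, e.g. a 30 Hz camera exposure time.
    """
    ts = np.array([obs[0] for obs in track])
    i = int(np.searchsorted(ts, t_ref))
    if i == 0 or i == len(ts):   # outside the track: refuse to extrapolate
        return None
    t0, *p0 = track[i - 1]
    t1, *p1 = track[i]
    w = (t_ref - t0) / (t1 - t0)
    pos = (1 - w) * np.array(p0[:3]) + w * np.array(p1[:3])
    return (*pos, interp_yaw(p0[3], p1[3], w))

# Example: a camera frame at t=0.033 s between LiDAR sweeps at t=0.0 and t=0.1.
track = [(0.0, 0.0, 0.0, 0.0, 0.00),
         (0.1, 1.0, 0.0, 0.0, 0.05)]
print(align_track_to_timestamp(track, 0.033))
```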
Related papers
- UrbanIng-V2X: A Large-Scale Multi-Vehicle, Multi-Infrastructure Dataset Across Multiple Intersections for Cooperative Perception [45.81880967650386]
UrbanIng-V2X consists of 34 temporally aligned and spatially calibrated sensor sequences, each lasting 20 seconds. All sequences are annotated at a frequency of 10 Hz with 3D bounding boxes spanning 13 object classes. In total, UrbanIng-V2X provides data from 12 vehicle-mounted RGB cameras, 2 vehicle LiDARs, 17 infrastructure thermal cameras, and 12 infrastructure LiDARs.
arXiv Detail & Related papers (2025-10-27T16:12:12Z)
- AGC-Drive: A Large-Scale Dataset for Real-World Aerial-Ground Collaboration in Driving Scenarios [68.84774511206797]
We present AGC-Drive, the first large-scale real-world dataset for aerial-ground cooperative 3D perception. AGC-Drive contains 350 scenes, each with approximately 100 frames and fully annotated 3D bounding boxes covering 13 object categories. We provide benchmarks for two 3D perception tasks: vehicle-to-vehicle collaborative perception and vehicle-to-ground collaborative perception.
arXiv Detail & Related papers (2025-06-19T14:48:43Z)
- Mixed Signals: A Diverse Point Cloud Dataset for Heterogeneous LiDAR V2X Collaboration [57.3519952529079]
Vehicle-to-everything (V2X) collaborative perception has emerged as a promising solution to address the limitations of single-vehicle perception systems. To address these limitations, we present Mixed Signals, a comprehensive V2X dataset featuring 45.1k point clouds and 240.6k bounding boxes. Our dataset provides point clouds and bounding box annotations across 10 classes, ensuring reliable data for perception training.
arXiv Detail & Related papers (2025-02-19T23:53:00Z)
- Multi-V2X: A Large Scale Multi-modal Multi-penetration-rate Dataset for Cooperative Perception [3.10770247120758]
We introduce Multi-V2X, a large-scale, multi-modal, multi-penetration-rate dataset for V2X perception.
In total, our Multi-V2X dataset comprises 549k RGB frames, 146k LiDAR frames, and 4,219k annotated 3D bounding boxes.
The highest possible CAV penetration rate reaches 86.21%, with up to 31 agents in communication range.
arXiv Detail & Related papers (2024-09-08T05:22:00Z)
- RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments [62.5830455357187]
We set up an egocentric multi-sensor data collection platform based on three main types of sensors (camera, LiDAR, and fisheye). A large-scale multimodal dataset, named RoboSense, is constructed to facilitate egocentric robot perception.
arXiv Detail & Related papers (2024-08-28T03:17:40Z)
- V2X-Real: a Large-Scale Dataset for Vehicle-to-Everything Cooperative Perception [22.3955949838171]
We present V2X-Real, a large-scale dataset that includes a mixture of multiple vehicles and smart infrastructure. Our dataset contains 33K LiDAR frames and 171K camera images with over 1.2M annotated bounding boxes of 10 categories in very challenging urban scenarios.
arXiv Detail & Related papers (2024-03-24T06:30:02Z)
- V2V4Real: A Real-world Large-scale Dataset for Vehicle-to-Vehicle Cooperative Perception [49.7212681947463]
Vehicle-to-Vehicle (V2V) cooperative perception system has great potential to revolutionize the autonomous driving industry.
We present V2V4Real, the first large-scale real-world multi-modal dataset for V2V perception.
Our dataset covers a driving distance of 410 km, comprising 20K LiDAR frames, 40K RGB frames, 240K annotated 3D bounding boxes for 5 classes, and HD maps.
arXiv Detail & Related papers (2023-03-14T02:49:20Z)
- Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting [64.7364925689825]
Argoverse 2 (AV2) is a collection of three datasets for perception and forecasting research in the self-driving domain.
The Lidar dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose.
The Motion Forecasting dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene.
arXiv Detail & Related papers (2023-01-02T00:36:22Z)
- DOLPHINS: Dataset for Collaborative Perception enabled Harmonious and Interconnected Self-driving [19.66714697653504]
Vehicle-to-Everything (V2X) networks have enabled collaborative perception in autonomous driving.
The lack of datasets has severely hindered the development of collaborative perception algorithms.
We release DOLPHINS: dataset for cOllaborative Perception enabled Harmonious and INterconnected Self-driving.
arXiv Detail & Related papers (2022-07-15T17:07:07Z)
- V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer [58.71845618090022]
We build a holistic attention model, namely V2X-ViT, to fuse information across on-road agents.
V2X-ViT consists of alternating layers of heterogeneous multi-agent self-attention and multi-scale window self-attention.
To validate our approach, we create a large-scale V2X perception dataset.
arXiv Detail & Related papers (2022-03-20T20:18:25Z)
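As a structural illustration of the alternating-layer design described in the V2X-ViT entry above, the sketch below interleaves attention over the agent axis with attention over the spatial-token axis. It is a hedged sketch of the pattern only: the paper's heterogeneous multi-agent and multi-scale window attention modules are replaced here by generic `nn.MultiheadAttention` layers, and all shapes and names are assumptions, not the paper's implementation.

```python
# Minimal PyTorch sketch of the alternating-attention pattern from the
# V2X-ViT abstract; layer internals are generic stand-ins, not the paper's
# heterogeneous multi-agent / multi-scale window modules.
import torch
import torch.nn as nn

class V2XViTSketch(nn.Module):
    def __init__(self, dim=64, heads=4, depth=2):
        super().__init__()
        self.blocks = nn.ModuleList()
        for _ in range(depth):
            # Stand-in for heterogeneous multi-agent self-attention:
            # attends across agents at each spatial location.
            self.blocks.append(nn.MultiheadAttention(dim, heads, batch_first=True))
            # Stand-in for multi-scale window self-attention:
            # attends across spatial tokens within each agent.
            self.blocks.append(nn.MultiheadAttention(dim, heads, batch_first=True))

    def forward(self, x):
        # x: (batch, agents, tokens, dim) BEV features shared by all agents
        b, a, n, d = x.shape
        for i, attn in enumerate(self.blocks):
            if i % 2 == 0:  # agent-axis attention
                y = x.permute(0, 2, 1, 3).reshape(b * n, a, d)
                y, _ = attn(y, y, y)
                x = x + y.reshape(b, n, a, d).permute(0, 2, 1, 3)
            else:           # spatial-axis attention
                y = x.reshape(b * a, n, d)
                y, _ = attn(y, y, y)
                x = x + y.reshape(b, a, n, d)
        return x

feats = torch.randn(1, 3, 16, 64)   # 3 cooperating agents, 16 BEV tokens
print(V2XViTSketch()(feats).shape)  # torch.Size([1, 3, 16, 64])
```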