DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection
- URL: http://arxiv.org/abs/2204.05575v1
- Date: Tue, 12 Apr 2022 07:13:33 GMT
- Title: DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection
- Authors: Haibao Yu, Yizhen Luo, Mao Shu, Yiyi Huo, Zebang Yang, Yifeng Shi, Zhenglong Guo, Hanyu Li, Xing Hu, Jirui Yuan, Zaiqing Nie
- Abstract summary: DAIR-V2X is the first large-scale, multi-modality, multi-view dataset from real scenarios for Vehicle-Infrastructure Cooperative Autonomous Driving.
DAIR-V2X comprises 71,254 LiDAR frames and 71,254 camera frames, all captured from real scenes with 3D annotations.
- Score: 8.681912341444901
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Autonomous driving faces great safety challenges due to its lack
of a global perspective and its limited long-range perception capability. It
has been widely agreed that vehicle-infrastructure cooperation is required to
achieve Level 5 autonomy, yet no dataset from real scenarios has been
available for computer vision researchers to work on
vehicle-infrastructure-cooperation problems. To accelerate computer vision
research and innovation for Vehicle-Infrastructure Cooperative Autonomous
Driving (VICAD), we release the DAIR-V2X dataset, the first large-scale,
multi-modality, multi-view dataset from real scenarios for VICAD. DAIR-V2X
comprises 71,254 LiDAR frames and 71,254 camera frames, all captured from
real scenes with 3D annotations. We introduce the Vehicle-Infrastructure
Cooperative 3D Object Detection (VIC3D) problem, which formulates the task of
collaboratively locating and identifying 3D objects using sensory inputs from
both vehicle and infrastructure. Beyond the traditional 3D object detection
problem, a VIC3D solution must also handle the temporal asynchrony between
vehicle and infrastructure sensors and the cost of transmitting data between
them. Furthermore, we propose Time Compensation Late Fusion (TCLF), a late
fusion framework for the VIC3D task, as a benchmark based on DAIR-V2X. Find
data, code, and up-to-date information at https://thudair.baai.ac.cn/index
and https://github.com/AIR-THU/DAIR-V2X.
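
The abstract describes TCLF only at a high level. Below is a minimal sketch of the idea, assuming boxes are NumPy arrays with centers in the first three columns, a latency estimate for the vehicle-infrastructure link, and a known rigid transform between the two sensor frames; every function name, the box layout, and the matching threshold are illustrative assumptions, not the released DAIR-V2X implementation.

```python
# Hypothetical sketch of Time Compensation Late Fusion (TCLF), following the
# abstract's description rather than the released DAIR-V2X code. Boxes are
# assumed to be NumPy arrays of shape (N, 7): [x, y, z, l, w, h, yaw].
import numpy as np

def estimate_velocity(infra_prev, infra_curr, dt):
    # Velocity of each object from its matched centers in two successive
    # infrastructure frames (object matching is assumed done upstream).
    return (infra_curr[:, :3] - infra_prev[:, :3]) / dt

def compensate(infra_curr, velocity, latency):
    # Predict centers forward by the vehicle-infrastructure latency: the
    # "time compensation" that addresses temporal asynchrony.
    out = infra_curr.copy()
    out[:, :3] += velocity * latency
    return out

def to_vehicle_frame(boxes, T_infra_to_vehicle):
    # Rigidly transform box centers into the vehicle frame via a 4x4 matrix
    # (the yaw rotation of each heading is omitted for brevity).
    ones = np.ones((len(boxes), 1))
    centers = np.concatenate([boxes[:, :3], ones], axis=1)
    out = boxes.copy()
    out[:, :3] = (centers @ T_infra_to_vehicle.T)[:, :3]
    return out

def late_fuse(vehicle_boxes, infra_boxes, dist_thresh=2.0):
    # Late fusion on detected boxes (not raw sensor data): keep every
    # vehicle box and add infrastructure boxes that match none of them.
    fused = list(vehicle_boxes)
    for ib in infra_boxes:
        d = np.linalg.norm(vehicle_boxes[:, :3] - ib[:3], axis=1)
        if d.size == 0 or d.min() > dist_thresh:
            fused.append(ib)  # unseen by the vehicle: cooperative gain
    return np.stack(fused)
```

Because only detected boxes, rather than raw frames, cross the link, this style of late fusion keeps the transmission cost low, while the velocity-based extrapolation compensates for the temporal asynchrony the abstract highlights.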
Related papers
- Multi-V2X: A Large Scale Multi-modal Multi-penetration-rate Dataset for Cooperative Perception [3.10770247120758]
We introduce Multi-V2X, a large-scale, multi-modal, multi-penetration-rate dataset for V2X perception.
In total, our Multi-V2X dataset comprises 549k RGB frames, 146k LiDAR frames, and 4,219k annotated 3D bounding boxes.
The highest possible connected and autonomous vehicle (CAV) penetration rate reaches 86.21%, with up to 31 agents in communication range.
arXiv Detail & Related papers (2024-09-08T05:22:00Z)
- InScope: A New Real-world 3D Infrastructure-side Collaborative Perception Dataset for Open Traffic Scenarios [13.821143687548494]
This paper introduces a new 3D infrastructure-side collaborative perception dataset, abbreviated as InScope.
InScope encapsulates a 20-day capture duration with 303 tracking trajectories and 187,787 3D bounding boxes annotated by experts.
arXiv Detail & Related papers (2024-07-31T13:11:14Z)
- EMIFF: Enhanced Multi-scale Image Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection [23.32916754209488]
Two major challenges persist in vehicle-infrastructure cooperative 3D (VIC3D) object detection.
We propose Enhanced Multi-scale Image Feature Fusion (EMIFF), a novel camera-based 3D detection framework for the VIC3D task.
Experiments show that EMIFF achieves state-of-the-art performance on the DAIR-V2X-C dataset, significantly outperforming previous early-fusion and late-fusion methods with comparable transmission costs.
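
The EMIFF entry names its fusion module without detail, so the sketch below shows only generic FPN-style multi-scale image-feature fusion as a stand-in; the channel widths and sum-based fusion are assumptions, not EMIFF's actual architecture.

```python
# Generic multi-scale image-feature fusion sketch (an FPN-style stand-in,
# not EMIFF itself): features from several backbone scales are projected
# to a common width, upsampled to the finest resolution, and summed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusion(nn.Module):
    def __init__(self, in_channels=(256, 512, 1024), out_channels=256):
        super().__init__()
        self.proj = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels
        )

    def forward(self, feats):
        # feats: list of (B, C_i, H_i, W_i) maps, finest scale first
        target = feats[0].shape[-2:]  # fuse at the finest resolution
        fused = 0
        for f, proj in zip(feats, self.proj):
            fused = fused + F.interpolate(
                proj(f), size=target, mode="bilinear", align_corners=False
            )
        return fused
```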
arXiv Detail & Related papers (2024-02-23T11:35:48Z)
- V2V4Real: A Real-world Large-scale Dataset for Vehicle-to-Vehicle Cooperative Perception [49.7212681947463]
Vehicle-to-Vehicle (V2V) cooperative perception has great potential to revolutionize the autonomous driving industry.
We present V2V4Real, the first large-scale real-world multi-modal dataset for V2V perception.
Our dataset covers 410 km of driving, comprising 20K LiDAR frames, 40K RGB frames, 240K annotated 3D bounding boxes for 5 classes, and HD maps.
arXiv Detail & Related papers (2023-03-14T02:49:20Z)
- Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting [64.7364925689825]
Argoverse 2 (AV2) is a collection of three datasets for perception and forecasting research in the self-driving domain.
The Lidar dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose.
The Motion Forecasting dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene.
arXiv Detail & Related papers (2023-01-02T00:36:22Z)
- IDD-3D: Indian Driving Dataset for 3D Unstructured Road Scenes [79.18349050238413]
Preparing and training deployable deep learning architectures requires models suited to different traffic scenarios.
The unstructured and complex driving layouts found in several developing countries, such as India, pose a challenge to these models.
We build a new dataset, IDD-3D, which consists of multi-modal data from multiple cameras and LiDAR sensors with 12k annotated driving LiDAR frames.
arXiv Detail & Related papers (2022-10-23T23:03:17Z)
- DOLPHINS: Dataset for Collaborative Perception enabled Harmonious and Interconnected Self-driving [19.66714697653504]
Vehicle-to-Everything (V2X) networks have enabled collaborative perception in autonomous driving.
However, the lack of datasets has severely hindered the development of collaborative perception algorithms.
We release DOLPHINS: dataset for cOllaborative Perception enabled Harmonious and INterconnected Self-driving.
arXiv Detail & Related papers (2022-07-15T17:07:07Z)
- V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer [58.71845618090022]
We build a holistic attention model, namely V2X-ViT, to fuse information across on-road agents.
V2X-ViT consists of alternating layers of heterogeneous multi-agent self-attention and multi-scale window self-attention.
To validate our approach, we create a large-scale V2X perception dataset.
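
The alternating-layer design reads like a standard Transformer stack; the following is a hypothetical sketch of that alternation, using plain torch.nn.MultiheadAttention as a stand-in for both the heterogeneous multi-agent and the multi-scale window attention variants (dimensions, depth, and the pre-norm layout are assumptions, not the paper's implementation).

```python
# Hypothetical sketch of an alternating-attention stack in the spirit of
# V2X-ViT. Both attention types are approximated with plain
# MultiheadAttention; the real model uses specialized variants.
import torch
import torch.nn as nn

class AlternatingV2XBlock(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        # Stand-in for heterogeneous multi-agent self-attention
        # (attends across per-agent feature tokens).
        self.agent_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Stand-in for multi-scale window self-attention
        # (attends within spatial windows of the fused feature map).
        self.window_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_agents * num_spatial_tokens, dim)
        x = self.norm1(tokens)
        tokens = tokens + self.agent_attn(x, x, x, need_weights=False)[0]
        x = self.norm2(tokens)
        tokens = tokens + self.window_attn(x, x, x, need_weights=False)[0]
        return tokens

stack = nn.Sequential(*[AlternatingV2XBlock() for _ in range(4)])
fused = stack(torch.randn(2, 3 * 64, 256))  # 3 agents, 64 tokens each
```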
arXiv Detail & Related papers (2022-03-20T20:18:25Z)
- PC-DAN: Point Cloud based Deep Affinity Network for 3D Multi-Object Tracking (Accepted as an extended abstract in JRDB-ACT Workshop at CVPR21) [68.12101204123422]
A point cloud is a dense compilation of spatial data in 3D coordinates.
We propose a PointNet-based approach for 3D Multi-Object Tracking (MOT).
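
As a rough, assumed illustration of a point-cloud-based affinity network in the spirit of PC-DAN (not the authors' code): a PointNet-style shared MLP with max-pooling embeds each detection's points order-invariantly, and track-detection affinities follow from embedding similarity.

```python
# Hypothetical sketch of PointNet-style affinity for 3D MOT: each
# detection's point cloud is embedded by a shared MLP + max-pool, and
# track/detection embeddings are compared to form an affinity matrix
# for data association. All dimensions are illustrative.
import torch
import torch.nn as nn

class PointEmbed(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, dim),
        )

    def forward(self, pts: torch.Tensor) -> torch.Tensor:
        # pts: (num_objects, num_points, 3) -> (num_objects, dim)
        return self.mlp(pts).max(dim=1).values  # order-invariant max-pool

embed = PointEmbed()
tracks = embed(torch.randn(5, 256, 3))  # 5 existing tracks
dets = embed(torch.randn(7, 256, 3))    # 7 new detections
affinity = tracks @ dets.T              # (5, 7) association scores
```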
arXiv Detail & Related papers (2021-06-03T05:36:39Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)