A Survey on Autonomous Driving Datasets: Statistics, Annotation Quality, and a Future Outlook
- URL: http://arxiv.org/abs/2401.01454v2
- Date: Tue, 23 Apr 2024 09:08:11 GMT
- Title: A Survey on Autonomous Driving Datasets: Statistics, Annotation Quality, and a Future Outlook
- Authors: Mingyu Liu, Ekim Yurtsever, Jonathan Fossaert, Xingcheng Zhou, Walter Zimmer, Yuning Cui, Bare Luka Zagar, Alois C. Knoll
- Abstract summary: We present an exhaustive study of 265 autonomous driving datasets from multiple perspectives.
We introduce a novel metric to evaluate the impact of datasets, which can also be a guide for creating new datasets.
We discuss current challenges and development trends for future autonomous driving datasets.
- Score: 24.691922611156937
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autonomous driving has rapidly developed and shown promising performance due to recent advances in hardware and deep learning techniques. High-quality datasets are fundamental for developing reliable autonomous driving algorithms. Previous dataset surveys either focused on a limited number of datasets or lacked detailed investigation of dataset characteristics. To this end, we present an exhaustive study of 265 autonomous driving datasets from multiple perspectives, including sensor modalities, data size, tasks, and contextual conditions. We introduce a novel metric to evaluate the impact of datasets, which can also be a guide for creating new datasets. We also analyze the annotation processes, existing labeling tools, and the annotation quality of datasets, showing the importance of establishing a standard annotation pipeline. In addition, we thoroughly analyze the impact of geographical and adversarial environmental conditions on the performance of autonomous driving systems. Moreover, we present the data distribution of several vital datasets and discuss their pros and cons accordingly. Finally, we discuss current challenges and development trends for future autonomous driving datasets.
Related papers
- Collaborative Perception Datasets in Autonomous Driving: A Survey [0.0]
The paper systematically analyzes a variety of datasets, comparing them based on aspects such as diversity, sensor setup, quality, public availability, and their applicability to downstream tasks.
The importance of addressing privacy and security concerns in the development of datasets is emphasized, regarding data sharing and dataset creation.
arXiv Detail & Related papers (2024-04-22T09:36:17Z)
- D2E-An Autonomous Decision-making Dataset involving Driver States and Human Evaluation [6.890077875318333]
The Driver to Evaluation (D2E) dataset is an autonomous decision-making dataset.
It contains data on driver states, vehicle states, environmental situations, and evaluation scores from human reviewers.
D2E contains over 1100 segments of interactive driving case data, covering everything from human driver factors to evaluation results.
arXiv Detail & Related papers (2024-04-12T21:29:18Z)
- SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control [59.20038082523832]
We present SubjectDrive, the first model proven to scale generative data production in a way that could continuously improve autonomous driving applications.
We develop a novel model equipped with a subject control mechanism, which allows the generative model to leverage diverse external data sources for producing varied and useful data.
arXiv Detail & Related papers (2024-03-28T14:07:13Z)
- Open-sourced Data Ecosystem in Autonomous Driving: the Present and Future [130.87142103774752]
This review systematically assesses over seventy open-source autonomous driving datasets.
It offers insights into various aspects, such as the principles underlying the creation of high-quality datasets.
It also delves into the scientific and technical challenges that warrant resolution.
arXiv Detail & Related papers (2023-12-06T10:46:53Z)
- On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms [56.119374302685934]
There have been severe concerns over the trustworthiness of AI technologies.
Machine and deep learning algorithms depend heavily on the data used during their development.
We propose a framework to evaluate the datasets through a responsible rubric.
arXiv Detail & Related papers (2023-10-24T14:01:53Z)
- A Survey on Datasets for Decision-making of Autonomous Vehicle [11.556769001552768]
Decision-making is one of the critical modules toward high-level automated driving.
Data-driven decision-making approaches have attracted increasing attention.
This study compares state-of-the-art datasets of vehicle-, environment-, and driver-related data.
arXiv Detail & Related papers (2023-06-29T08:42:18Z)
- LargeST: A Benchmark Dataset for Large-Scale Traffic Forecasting [65.71129509623587]
Road traffic forecasting plays a critical role in smart city initiatives and has experienced significant advancements thanks to the power of deep learning.
However, the promising results achieved on current public datasets may not be applicable to practical scenarios.
We introduce the LargeST benchmark dataset, which includes a total of 8,600 sensors in California with a 5-year time coverage.
arXiv Detail & Related papers (2023-06-14T05:48:36Z)
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
- Perspective, Survey and Trends: Public Driving Datasets and Toolsets for Autonomous Driving Virtual Test [4.2628421392139]
This paper first proposes a Systematic Literature Review (SLR) approach for autonomous driving tests, then presents an overview of existing publicly available datasets and toolsets from 2000 to 2020.
We are the first to perform such a recent empirical survey of both the datasets and toolsets using an SLR-based survey approach.
arXiv Detail & Related papers (2021-04-01T06:17:01Z)
- Diverse Complexity Measures for Dataset Curation in Self-driving [80.55417232642124]
We propose a new data selection method that exploits a diverse set of criteria that quantify the interestingness of traffic scenes.
Our experiments show that the proposed curation pipeline is able to select datasets that lead to better generalization and higher performance.
arXiv Detail & Related papers (2021-01-16T23:45:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.