Related papers: TaCarla: A comprehensive benchmarking dataset for end-to-end autonomous driving

URL: http://arxiv.org/abs/2602.23499v1
Date: Thu, 26 Feb 2026 21:16:20 GMT
Title: TaCarla: A comprehensive benchmarking dataset for end-to-end autonomous driving
Authors: Tugrul Gorgulu, Atakan Dag, M. Esat Kalfaoglu, Halil Ibrahim Kuru, Baris Can Cam, Ozsel Kilinc,
Abstract summary: We have collected a new dataset comprising over 2.85 million frames using the CARLA simulation environment for the diverse Leaderboard 2.0 challenge scenarios. Our dataset is designed not only for planning tasks but also supports dynamic object detection, lane divider detection, centerline detection, traffic light recognition, prediction tasks and visual language action models.
Score: 3.037642191465275
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Collecting a high-quality dataset is a critical task that demands meticulous attention to detail, as overlooking certain aspects can render the entire dataset unusable. Autonomous driving challenges remain a prominent area of research, requiring further exploration to enhance the perception and planning performance of vehicles. However, existing datasets are often incomplete. For instance, datasets that include perception information generally lack planning data, while planning datasets typically consist of extensive driving sequences where the ego vehicle predominantly drives forward, offering limited behavioral diversity. In addition, many real datasets struggle to evaluate their models, especially for planning tasks, since they lack a proper closed-loop evaluation setup. The CARLA Leaderboard 2.0 challenge, which provides a diverse set of scenarios to address the long-tail problem in autonomous driving, has emerged as a valuable alternative platform for developing perception and planning models in both open-loop and closed-loop evaluation setups. Nevertheless, existing datasets collected on this platform present certain limitations. Some datasets appear to be tailored primarily to a limited set of sensor configurations. To support end-to-end autonomous driving research, we have collected a new dataset comprising over 2.85 million frames using the CARLA simulation environment for the diverse Leaderboard 2.0 challenge scenarios. Our dataset is designed not only for planning tasks but also supports dynamic object detection, lane divider detection, centerline detection, traffic light recognition, prediction tasks and visual language action models. Furthermore, we demonstrate its versatility by training various models using our dataset. Moreover, we also provide numerical rarity scores to understand how rarely the current state occurs in the dataset.

Related papers

Advancing Real-World Parking Slot Detection with Large-Scale Dataset and Semi-Supervised Baseline [65.25540269603553]
This study focuses on parking slot detection using surround-view cameras, which offer a comprehensive bird's-eye view of the parking environment. We first construct a large-scale parking slot detection dataset (CRPS-D), which includes various lighting distributions, diverse weather conditions, and challenging parking slot variants. We develop a semi-supervised baseline for parking slot detection, termed SS-PSD, to further improve performance by exploiting unlabeled data.
arXiv Detail & Related papers (2025-09-16T14:50:19Z)
ROVR-Open-Dataset: A Large-Scale Depth Dataset for Autonomous Driving [62.9051914830949]
We present ROVR, a large-scale, diverse, and cost-efficient depth dataset designed to capture the complexity of real-world driving. A lightweight acquisition pipeline ensures scalable collection, while sparse but statistically sufficient ground truth supports robust training. Benchmarking with state-of-the-art monocular depth models reveals severe cross-dataset generalization failures.
arXiv Detail & Related papers (2025-08-19T16:13:49Z)
D2E-An Autonomous Decision-making Dataset involving Driver States and Human Evaluation [6.890077875318333]
The Driver to Evaluation dataset (D2E) is an autonomous decision-making dataset. It contains data on driver states, vehicle states, environmental situations, and evaluation scores from human reviewers. D2E contains over 1,100 segments of interactive driving case data, covering everything from human driver factors to evaluation results.
arXiv Detail & Related papers (2024-04-12T21:29:18Z)
SCANIA Component X Dataset: A Real-World Multivariate Time Series Dataset for Predictive Maintenance [5.557442038265024]
This paper introduces a real-world, multivariate time series dataset collected exclusively from a single anonymized engine component (Component X) across a fleet of SCANIA trucks. The dataset includes operational data, repair records, and specifications related to Component X, while maintaining confidentiality through anonymization. It is well-suited for a range of machine learning applications, including classification, regression, survival analysis, and anomaly detection.
arXiv Detail & Related papers (2024-01-26T20:51:55Z)
A Survey on Autonomous Driving Datasets: Statistics, Annotation Quality, and a Future Outlook [24.691922611156937]
We present an exhaustive study of 265 autonomous driving datasets from multiple perspectives. We introduce a novel metric to evaluate the impact of datasets, which can also be a guide for creating new datasets. We discuss the current challenges and the development trend of the future autonomous driving datasets.
arXiv Detail & Related papers (2024-01-02T22:35:33Z)
LargeST: A Benchmark Dataset for Large-Scale Traffic Forecasting [65.71129509623587]
Road traffic forecasting plays a critical role in smart city initiatives and has experienced significant advancements thanks to the power of deep learning. However, the promising results achieved on current public datasets may not be applicable to practical scenarios. We introduce the LargeST benchmark dataset, which includes a total of 8,600 sensors in California with a 5-year time coverage.
arXiv Detail & Related papers (2023-06-14T05:48:36Z)
Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting [64.7364925689825]
Argoverse 2 (AV2) is a collection of three datasets for perception and forecasting research in the self-driving domain. The Lidar dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose. The Motion Forecasting dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene.
arXiv Detail & Related papers (2023-01-02T00:36:22Z)
IDD-3D: Indian Driving Dataset for 3D Unstructured Road Scenes [79.18349050238413]
Preparation and training of deploy-able deep learning architectures require the models to be suited to different traffic scenarios. An unstructured and complex driving layout found in several developing countries such as India poses a challenge to these models. We build a new dataset, IDD-3D, which consists of multi-modal data from multiple cameras and LiDAR sensors with 12k annotated driving LiDAR frames.
arXiv Detail & Related papers (2022-10-23T23:03:17Z)
DOLPHINS: Dataset for Collaborative Perception enabled Harmonious and Interconnected Self-driving [19.66714697653504]
Vehicle-to-Everything (V2X) networks have enabled collaborative perception in autonomous driving. The lack of datasets has severely blocked the development of collaborative perception algorithms. We release DOLPHINS: dataset for cOllaborative Perception enabled Harmonious and INterconnected Self-driving.
arXiv Detail & Related papers (2022-07-15T17:07:07Z)
One Million Scenes for Autonomous Driving: ONCE Dataset [91.94189514073354]
We introduce the ONCE dataset for 3D object detection in the autonomous driving scenario. The data is selected from 144 driving hours, which is 20x longer than the largest 3D autonomous driving dataset available. We reproduce and evaluate a variety of self-supervised and semi-supervised methods on the ONCE dataset.
arXiv Detail & Related papers (2021-06-21T12:28:08Z)
Diverse Complexity Measures for Dataset Curation in Self-driving [80.55417232642124]
We propose a new data selection method that exploits a diverse set of criteria that quantize interestingness of traffic scenes. Our experiments show that the proposed curation pipeline is able to select datasets that lead to better generalization and higher performance.
arXiv Detail & Related papers (2021-01-16T23:45:02Z)