A Comprehensive Evaluation of Four End-to-End AI Autopilots Using CCTest and the Carla Leaderboard
- URL: http://arxiv.org/abs/2501.12090v3
- Date: Mon, 24 Mar 2025 08:18:29 GMT
- Title: A Comprehensive Evaluation of Four End-to-End AI Autopilots Using CCTest and the Carla Leaderboard
- Authors: Changwen Li, Joseph Sifakis, Rongjie Yan, Jian Zhang
- Abstract summary: End-to-end AI autopilots for autonomous driving systems have emerged as a promising alternative to traditional modular autopilots. They suffer from the well-known problems of AI systems such as non-determinism, non-explainability, and anomalies. This paper extends a study of the critical configuration testing approach that has been applied to four open modular autopilots.
- Score: 6.229766691427486
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: End-to-end AI autopilots for autonomous driving systems have emerged as a promising alternative to traditional modular autopilots, offering the potential to reduce development costs and mitigate defects arising from module composition. However, they suffer from the well-known problems of AI systems such as non-determinism, non-explainability, and anomalies. This naturally raises the question of their evaluation and, in particular, their comparison with existing modular solutions. This work extends a study of the critical configuration testing (CCTest) approach that has been applied to four open modular autopilots. This approach differs from others in that it generates test cases ensuring safe control policies are possible for the tested autopilots. This enables an accurate assessment of the ability to drive safely in critical situations, as any incident observed in the simulation involves the failure of a tested autopilot. The contribution of this paper is twofold. Firstly, we apply the CCTest approach to four end-to-end open autopilots, InterFuser, MILE, Transfuser, and LMDrive, and compare their test results with those of the four modular open autopilots previously tested with the same approach implemented in the Carla simulation environment. This comparison identifies both differences and similarities in the failures of the two autopilot types in critical configurations. Secondly, we compare the evaluations of the four autopilots carried out in the Carla Leaderboard with the CCTest results. This comparison reveals significant discrepancies, reflecting differences in test case generation criteria and risk assessment methods. It underlines the need to work towards the development of objective assessment methods combining qualitative and quantitative criteria.
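The abstract attributes the discrepancies between CCTest and the Carla Leaderboard partly to differing risk assessment methods. For reference, the public CARLA Leaderboard condenses each route into one number: route completion multiplied by a multiplicative infraction penalty, with the driving score averaged over routes. The sketch below is a minimal illustration, not the paper's code. It assumes the Leaderboard 1.0 penalty coefficients as published in the CARLA Leaderboard documentation (verify them against the version actually in use) and expresses route completion as a fraction in [0, 1]. The names `RouteResult`, `infraction_penalty`, and `driving_score` are hypothetical.

```python
from dataclasses import dataclass, field

# Penalty coefficients per infraction type, as published for CARLA Leaderboard 1.0.
# Assumption: check these against the Leaderboard version actually used.
PENALTY_COEFFICIENTS = {
    "collision_pedestrian": 0.50,
    "collision_vehicle": 0.60,
    "collision_static": 0.65,
    "red_light": 0.70,
    "stop_sign": 0.80,
}


@dataclass
class RouteResult:
    """Hypothetical per-route record; completion is a fraction in [0, 1]."""
    route_completion: float
    infractions: dict[str, int] = field(default_factory=dict)


def infraction_penalty(infractions: dict[str, int]) -> float:
    """Multiply in one coefficient per infraction occurrence; 1.0 means no infractions."""
    penalty = 1.0
    for kind, count in infractions.items():
        penalty *= PENALTY_COEFFICIENTS[kind] ** count
    return penalty


def driving_score(routes: list[RouteResult]) -> float:
    """Mean over routes of route completion times infraction penalty."""
    if not routes:
        raise ValueError("at least one route is required")
    per_route = [r.route_completion * infraction_penalty(r.infractions) for r in routes]
    return sum(per_route) / len(per_route)


routes = [
    RouteResult(route_completion=1.0, infractions={"collision_vehicle": 1}),
    RouteResult(route_completion=0.5),
]
print(driving_score(routes))
```

Because the penalties multiply, two vehicle collisions on one route (0.60² = 0.36) cost more than a single pedestrian collision (0.50).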
Related papers
- An LSTM-based Test Selection Method for Self-Driving Cars [1.3450023647228841]
This study addresses the test selection problem for lane-keeping systems for self-driving cars.
Road segment features, such as angles and lengths, were extracted and treated as sequences.
The proposed model is compared against machine learning-based test selectors.
Date: 2025-01-07T15:44:06Z
- SureMap: Simultaneous Mean Estimation for Single-Task and Multi-Task Disaggregated Evaluation [75.56845750400116]
Disaggregated evaluation -- estimation of performance of a machine learning model on different subpopulations -- is a core task when assessing performance and group-fairness of AI systems.
We develop SureMap, which has high estimation accuracy for both multi-task and single-task disaggregated evaluations of blackbox models.
Our method combines maximum a posteriori (MAP) estimation using a well-chosen prior together with cross-validation-free tuning via Stein's unbiased risk estimate (SURE).
Date: 2024-11-14T17:53:35Z
- Towards Interactive and Learnable Cooperative Driving Automation: a Large Language Model-Driven Decision-Making Framework [79.088116316919]
Connected Autonomous Vehicles (CAVs) have begun open-road testing around the world, but their safety and efficiency performance in complex scenarios is still not satisfactory.
This paper proposes CoDrivingLLM, an interactive and learnable LLM-driven cooperative driving framework.
Date: 2024-09-19T14:36:00Z
- Self-Training with Pseudo-Label Scorer for Aspect Sentiment Quad Prediction [54.23208041792073]
Aspect Sentiment Quad Prediction (ASQP) aims to predict all quads (aspect term, aspect category, opinion term, sentiment polarity) for a given review.
A key challenge in the ASQP task is the scarcity of labeled data, which limits the performance of existing methods.
We propose a self-training framework with a pseudo-label scorer, wherein a scorer assesses the match between reviews and their pseudo-labels.
Date: 2024-06-26T05:30:21Z
- Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-To-End Autonomous Driving [59.705635382104454]
We present Bench2Drive, the first benchmark for evaluating E2E-AD systems' multiple abilities in a closed-loop manner.
We implement state-of-the-art E2E-AD models and evaluate them in Bench2Drive, providing insights regarding current status and future directions.
Date: 2024-06-06T09:12:30Z
- Rigorous Simulation-based Testing for Autonomous Driving Systems -- Targeting the Achilles' Heel of Four Open Autopilots [6.229766691427486]
We propose a rigorous test method based on breaking down scenarios into simple ones.
We generate test cases for critical configurations that place the vehicle under test in critical situations.
Test cases reveal major defects in Apollo, Autoware, and the Carla and LGSVL autopilots.
Date: 2024-05-27T08:06:21Z
- Automated System-level Testing of Unmanned Aerial Systems [2.2249176072603634]
A major requirement of international safety standards is to perform rigorous system-level testing of avionics software systems.
The proposed approach (AITester) utilizes model-based testing and artificial intelligence (AI) techniques to automatically generate, execute, and evaluate various test scenarios.
Date: 2024-03-23T14:47:26Z
- Conservative Prediction via Data-Driven Confidence Minimization [70.93946578046003]
In safety-critical applications of machine learning, it is often desirable for a model to be conservative.
We propose the Data-Driven Confidence Minimization framework, which minimizes confidence on an uncertainty dataset.
Date: 2023-06-08T07:05:36Z
- Two is Better Than One: Digital Siblings to Improve Autonomous Driving Testing [10.518360486008964]
We introduce the notion of digital siblings, a multi-simulator approach that tests a given autonomous vehicle on multiple general-purpose simulators.
We empirically compare such a multi-simulator approach against a digital twin of a physical scaled autonomous vehicle on a large set of test cases.
Our empirical evaluation shows that the ensemble failure predictor by the digital siblings is superior to each individual simulator at predicting the failures of the digital twin.
Date: 2023-05-14T04:10:56Z
- Realistic Safety-critical Scenarios Search for Autonomous Driving System via Behavior Tree [8.286351881735191]
We propose the Matrix-Fuzzer, a behavior tree-based testing framework, to automatically generate realistic safety-critical test scenarios.
Our approach finds the most types of safety-critical scenarios while generating only around 30% of the total scenarios compared with the baseline algorithm.
Date: 2023-05-11T06:53:03Z
- Uncertainty Quantification of Collaborative Detection for Self-Driving [12.590332512097698]
Sharing information between connected and autonomous vehicles (CAVs) improves the performance of collaborative object detection for self-driving.
However, CAVs still face uncertainty in object detection due to practical challenges.
Our work is the first to estimate the uncertainty of collaborative object detection.
Date: 2022-09-16T20:30:45Z
- Curriculum Learning for Safe Mapless Navigation [71.55718344087657]
This work investigates the effects of Curriculum Learning (CL)-based approaches on the agent's performance.
In particular, we focus on the safety aspect of robotic mapless navigation, comparing against a standard end-to-end (E2E) training strategy.
Date: 2021-12-23T12:30:36Z
- Probabilistic Ranking-Aware Ensembles for Enhanced Object Detections [50.096540945099704]
We propose a novel ensemble called the Probabilistic Ranking Aware Ensemble (PRAE) that refines the confidence of bounding boxes from detectors.
We also introduce a bandit approach to address the confidence imbalance problem caused by the need to deal with different numbers of boxes.
Date: 2021-05-07T09:37:06Z
- Generating and Characterizing Scenarios for Safety Testing of Autonomous Vehicles [86.9067793493874]
We propose efficient mechanisms to characterize and generate testing scenarios using a state-of-the-art driving simulator.
We use our method to characterize real driving data from the Next Generation Simulation (NGSIM) project.
We rank the scenarios by defining metrics based on the complexity of avoiding accidents and provide insights into how the AV could have minimized the probability of incurring an accident.
Date: 2021-03-12T17:00:23Z
- Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring Systems [64.4896118325552]
We evaluate the current state-of-the-art AES models using a model adversarial evaluation scheme and associated metrics.
We find that AES models are highly overstable: even heavy modifications (as much as 25%) with content unrelated to the topic of the questions do not decrease the score produced by the models.
Date: 2020-07-14T03:49:43Z
- Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns about whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing ranking fairness and algorithm utility in the bipartite ranking scenario.
Date: 2020-06-15T10:08:39Z
- Pass-Fail Criteria for Scenario-Based Testing of Automated Driving Systems [0.0]
This paper sets out a framework for assessing an automated driving system's behavioural safety in normal operation.
Risk-based rules cannot give a pass/fail decision from a single test case.
Instead, such rules are assessed on statistical performance across many individual tests.
Date: 2020-05-19T13:13:08Z
- Efficient statistical validation with edge cases to evaluate Highly Automated Vehicles [6.198523595657983]
The widespread deployment of Autonomous Vehicles seems to be imminent despite many safety challenges that are yet to be resolved.
Existing standards focus on deterministic processes where the validation requires only a set of test cases that cover the requirements.
This paper presents a new approach to compute the statistical characteristics of a system's behaviour by biasing automatically generated test cases towards the worst case scenarios.
Date: 2020-03-04T04:35:22Z
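The last entry describes biasing automatically generated test cases toward worst-case scenarios while still computing statistical characteristics of the system's behaviour. A standard technique with that property is importance sampling: draw scenarios from a distribution biased toward failures, then reweight each sample by the likelihood ratio so the estimate stays unbiased for the nominal distribution. The sketch below is a toy illustration under stated assumptions, not that paper's method: the hypothetical scenario parameter is a standard normal variable, a "failure" means the parameter exceeds a threshold, and the biased proposal is a normal distribution shifted to the threshold.

```python
import math
import random


def naive_failure_rate(threshold: float, n: int, rng: random.Random) -> float:
    """Plain Monte Carlo: sample the nominal N(0, 1) distribution and count failures."""
    return sum(rng.gauss(0.0, 1.0) > threshold for _ in range(n)) / n


def importance_sampled_failure_rate(threshold: float, n: int, rng: random.Random) -> float:
    """Sample from the biased proposal N(threshold, 1) and weight each failure by the
    likelihood ratio p(x) / q(x) = exp(-threshold * x + threshold**2 / 2)."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            total += math.exp(-threshold * x + threshold ** 2 / 2)
    return total / n


threshold = 4.0
exact = 0.5 * math.erfc(threshold / math.sqrt(2))  # P(X > threshold) for X ~ N(0, 1)
rng = random.Random(0)
print(f"exact:      {exact:.3e}")
print(f"naive:      {naive_failure_rate(threshold, 100_000, rng):.3e}")
print(f"importance: {importance_sampled_failure_rate(threshold, 100_000, rng):.3e}")
```

With the same sampling budget, plain Monte Carlo observes only a handful of failures at this threshold, while the reweighted estimate stays close to the exact value. How well this works in practice depends on how well the biased distribution covers the actual failure region.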
This list is automatically generated from the titles and abstracts of the papers on this site.