A Comprehensive Evaluation of Four End-to-End AI Autopilots Using CCTest and the Carla Leaderboard
- URL: http://arxiv.org/abs/2501.12090v2
- Date: Wed, 22 Jan 2025 04:22:14 GMT
- Title: A Comprehensive Evaluation of Four End-to-End AI Autopilots Using CCTest and the Carla Leaderboard
- Authors: Changwen Li, Joseph Sifakis, Rongjie Yan, Jian Zhang
- Abstract summary: First, we apply the critical configuration testing approach to four end-to-end open autopilots, Transfuser, InterFuser, MILE and LMDriver.
Second, we compare the evaluations of the four autopilots carried out in the Carla Leaderboard with our results obtained by testing critical configurations.
- Score: 6.229766691427486
- License:
- Abstract: Scenario-based testing is currently the dominant simulation-based validation approach for automated driving systems (ADS). Its effective application raises two interrelated issues. The first is the choice of the method used to generate scenarios, based on various criteria such as risk, degree of autonomy, degree of coverage and representativeness, and complexity. The other is the choice of the evaluation method for estimating the safety and performance of the system under test. This work extends a study of the critical configuration testing (CCTest) approach we have already applied to four open modular autopilots. This approach differs from general scenario-based approaches in that it uses only realistic, potentially safe critical scenarios. It enables an accurate assessment of the ability to drive safely in critical situations for which feasible safety policies exist. Any incident observed in the simulation involves the failure of a tested autopilot. The contribution of this paper is twofold. First, we apply the critical configuration testing approach to four end-to-end open autopilots, Transfuser, InterFuser, MILE and LMDriver, and compare their test results with those of the four modular open autopilots previously tested with the same approach implemented in the Carla simulation environment. This comparison identifies both differences and similarities in the failures of the two autopilot types in critical situations. Second, we compare the evaluations of the four autopilots carried out in the Carla Leaderboard with our results obtained by testing critical configurations. This comparison reveals significant discrepancies, reflecting differences in test case generation criteria and risk assessment methods. It underlines the need to work towards the development of objective assessment methods combining qualitative and quantitative criteria.
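To make the CCTest idea concrete, here is a minimal sketch of such a test loop using the Carla Python API: place the ego vehicle in a critical configuration for which a feasible safe policy exists, let the autopilot under test drive, and record any collision as a failure. The configuration values, episode length, and omitted autopilot hook below are illustrative assumptions, not the paper's actual test harness.

```python
import carla

# Hypothetical critical configurations (illustrative values only). In CCTest,
# each configuration is realistic and admits a feasible safe driving policy,
# so any incident counts as a failure of the autopilot under test.
CRITICAL_CONFIGS = [
    {"town": "Town03", "ego_spawn": 0, "obstacle_spawn": 5},
    {"town": "Town05", "ego_spawn": 12, "obstacle_spawn": 20},
]

def run_config(client: carla.Client, cfg: dict, steps: int = 500) -> bool:
    """Run one critical configuration; return True iff no incident occurred."""
    world = client.load_world(cfg["town"])
    settings = world.get_settings()
    settings.synchronous_mode = True       # deterministic stepping
    settings.fixed_delta_seconds = 0.05    # 20 Hz simulation
    world.apply_settings(settings)

    bp_lib = world.get_blueprint_library()
    spawns = world.get_map().get_spawn_points()
    ego = world.spawn_actor(bp_lib.find("vehicle.tesla.model3"),
                            spawns[cfg["ego_spawn"]])
    obstacle = world.spawn_actor(bp_lib.find("vehicle.audi.tt"),
                                 spawns[cfg["obstacle_spawn"]])

    # A collision sensor attached to the ego flags incidents.
    incidents = []
    collision_sensor = world.spawn_actor(bp_lib.find("sensor.other.collision"),
                                         carla.Transform(), attach_to=ego)
    collision_sensor.listen(incidents.append)

    for _ in range(steps):
        # The autopilot under test (Transfuser, InterFuser, MILE, LMDriver, ...)
        # would compute and apply a control for `ego` here (omitted).
        world.tick()

    for actor in (collision_sensor, ego, obstacle):
        actor.destroy()
    return not incidents

if __name__ == "__main__":
    client = carla.Client("localhost", 2000)
    client.set_timeout(20.0)
    verdicts = [run_config(client, cfg) for cfg in CRITICAL_CONFIGS]
    print("pass per critical configuration:", verdicts)
```

Because every tested configuration is safe by construction, the verdict needs no probabilistic grading: any collision observed in the loop is a genuine failure of the autopilot under test.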
Related papers
- Self-Training with Pseudo-Label Scorer for Aspect Sentiment Quad Prediction [54.23208041792073]
Aspect Sentiment Quad Prediction (ASQP) aims to predict all quads (aspect term, aspect category, opinion term, sentiment polarity) for a given review.
A key challenge in the ASQP task is the scarcity of labeled data, which limits the performance of existing methods.
We propose a self-training framework with a pseudo-label scorer, wherein a scorer assesses the match between reviews and their pseudo-labels.
arXiv Detail & Related papers (2024-06-26T05:30:21Z) - Rigorous Simulation-based Testing for Autonomous Driving Systems -- Targeting the Achilles' Heel of Four Open Autopilots [6.229766691427486]
We propose a rigorous test method based on breaking down scenarios into simple ones.
We generate test cases for critical configurations that place the vehicle under test in critical situations.
Test cases reveal major defects in Apollo, Autoware, and the Carla and LGSVL autopilots.
arXiv Detail & Related papers (2024-05-27T08:06:21Z) - Automated System-level Testing of Unmanned Aerial Systems [2.2249176072603634]
A major requirement of international safety standards is to perform rigorous system-level testing of avionics software systems.
The proposed approach (AITester) utilizes model-based testing and artificial intelligence (AI) techniques to automatically generate, execute, and evaluate various test scenarios.
arXiv Detail & Related papers (2024-03-23T14:47:26Z) - Conservative Prediction via Data-Driven Confidence Minimization [70.93946578046003]
In safety-critical applications of machine learning, it is often desirable for a model to be conservative.
We propose the Data-Driven Confidence Minimization framework, which minimizes confidence on an uncertainty dataset.
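A rough sketch of this idea in PyTorch (an assumed formulation for illustration, not necessarily the paper's exact objective): combine the usual supervised loss with a penalty on the model's confidence over a separate uncertainty dataset.

```python
import torch
import torch.nn.functional as F

def confidence_minimizing_loss(model: torch.nn.Module,
                               x_labeled: torch.Tensor, y_labeled: torch.Tensor,
                               x_uncertain: torch.Tensor,
                               lam: float = 1.0) -> torch.Tensor:
    """Cross-entropy on labeled data plus a confidence penalty on an
    uncertainty dataset, pushing uncertain predictions towards uniform."""
    supervised = F.cross_entropy(model(x_labeled), y_labeled)
    probs = F.softmax(model(x_uncertain), dim=-1)
    confidence = probs.max(dim=-1).values.mean()  # mean max-class probability
    return supervised + lam * confidence
```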
arXiv Detail & Related papers (2023-06-08T07:05:36Z) - Realistic Safety-critical Scenarios Search for Autonomous Driving System via Behavior Tree [8.286351881735191]
We propose the Matrix-Fuzzer, a behavior tree-based testing framework, to automatically generate realistic safety-critical test scenarios.
Our approach finds the most types of safety-critical scenarios while generating only around 30% of the total scenarios compared with the baseline algorithm.
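As a generic illustration of describing test scenarios with behavior trees (the node types and the cut-in example below are assumptions, not Matrix-Fuzzer's actual design), a scenario can be a tree whose leaves are parameterized actions and whose internal nodes sequence them; a fuzzer then mutates the parameters in search of critical cases.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Action:
    """Leaf node: a parameterized scenario step."""
    name: str
    run: Callable[[dict], bool]
    def tick(self, ctx: dict) -> bool:
        return self.run(ctx)

@dataclass
class Sequence:
    """Internal node: children must succeed in order, or the tree fails."""
    children: List
    def tick(self, ctx: dict) -> bool:
        return all(child.tick(ctx) for child in self.children)

def step(name: str) -> Action:
    # Each step records itself in the execution trace and succeeds.
    return Action(name, lambda ctx, n=name: ctx["trace"].append(n) is None)

# A cut-in scenario; a fuzzer would mutate parameters such as `gap_m`
# to search for realistic safety-critical parameterizations.
cut_in = Sequence([step("spawn_npc"), step("cut_in_with_gap"), step("hard_brake")])
ctx = {"trace": [], "gap_m": 8.0}
print(cut_in.tick(ctx), ctx["trace"])
```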
arXiv Detail & Related papers (2023-05-11T06:53:03Z) - Curriculum Learning for Safe Mapless Navigation [71.55718344087657]
This work investigates the effects of Curriculum Learning (CL)-based approaches on the agent's performance.
In particular, we focus on the safety aspect of robotic mapless navigation, comparing these approaches against a standard end-to-end (E2E) training strategy.
arXiv Detail & Related papers (2021-12-23T12:30:36Z) - Generating and Characterizing Scenarios for Safety Testing of Autonomous Vehicles [86.9067793493874]
We propose efficient mechanisms to characterize and generate testing scenarios using a state-of-the-art driving simulator.
We use our method to characterize real driving data from the Next Generation Simulation (NGSIM) project.
We rank the scenarios by defining metrics based on the complexity of avoiding accidents and provide insights into how the AV could have minimized the probability of incurring an accident.
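The paper defines its own complexity-based metrics; as a generic stand-in for this kind of ranking, a common criticality signal is time-to-collision (TTC), the gap to the lead vehicle divided by the closing speed (a sketch under that assumption, not the NGSIM pipeline itself):

```python
def time_to_collision(gap_m: float, ego_speed_mps: float, lead_speed_mps: float) -> float:
    """Time-to-collision in seconds; infinite when the ego is not closing in."""
    closing_speed = ego_speed_mps - lead_speed_mps
    return gap_m / closing_speed if closing_speed > 0 else float("inf")

# Rank recorded scenarios by their minimum TTC over the trace: the smaller
# the value, the less time there was to avoid an accident.
ttc_traces = {"scenario_A": [4.2, 3.1, 2.5], "scenario_B": [6.0, 5.5, 5.1]}
ranking = sorted(ttc_traces, key=lambda s: min(ttc_traces[s]))
print(ranking)  # ['scenario_A', 'scenario_B'] -- A is the more critical one
```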
arXiv Detail & Related papers (2021-03-12T17:00:23Z) - Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring
Systems [64.4896118325552]
We evaluate the current state-of-the-art AES models using a model adversarial evaluation scheme and associated metrics.
We find that AES models are highly overstable. Even heavy modifications (as much as 25%) with content unrelated to the topic of the questions do not decrease the score produced by the models.
arXiv Detail & Related papers (2020-07-14T03:49:43Z) - Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking
Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z) - Pass-Fail Criteria for Scenario-Based Testing of Automated Driving
Systems [0.0]
This paper sets out a framework for assessing an automated driving system's behavioural safety in normal operation.
Risk-based rules cannot give a pass/fail decision from a single test case; instead, they consider statistical performance across many individual tests.
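A minimal sketch of such a statistical pass/fail rule (an assumed criterion for illustration, not the paper's framework): accept the system only if the failure count over many test runs is statistically consistent with a target failure rate, e.g. under a one-sided binomial test.

```python
from scipy.stats import binomtest

def passes(num_failures: int, num_tests: int,
           max_failure_rate: float = 0.01, alpha: float = 0.05) -> bool:
    """Pass iff we cannot reject 'failure rate <= max_failure_rate' at
    level alpha (one-sided binomial test; illustrative rule only)."""
    p = binomtest(num_failures, num_tests, max_failure_rate,
                  alternative="greater").pvalue
    return p >= alpha

print(passes(2, 500))   # few failures over many runs -> True
print(passes(20, 500))  # too many failures for a 1% target -> False
```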
arXiv Detail & Related papers (2020-05-19T13:13:08Z) - Efficient statistical validation with edge cases to evaluate Highly
Automated Vehicles [6.198523595657983]
The wide-scale deployment of autonomous vehicles seems imminent despite many safety challenges that are yet to be resolved.
Existing standards focus on deterministic processes where the validation requires only a set of test cases that cover the requirements.
This paper presents a new approach to compute the statistical characteristics of a system's behaviour by biasing automatically generated test cases towards the worst case scenarios.
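Biasing generated test cases towards worst cases while still recovering unbiased statistics is the classic importance-sampling setup; the sketch below (a toy cut-in gap parameter and failure oracle, both assumptions) samples from a proposal concentrated on dangerous regions and reweights by the likelihood ratio.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Toy scenario parameter: initial gap to a cut-in vehicle (metres).
nominal = norm(loc=30.0, scale=8.0)   # gaps seen in normal traffic
proposal = norm(loc=12.0, scale=5.0)  # biased towards dangerous small gaps

def fails(gap):
    # Stand-in failure oracle: the system crashes when the gap is tiny.
    return gap < 8.0

gaps = proposal.rvs(size=100_000, random_state=rng)
weights = nominal.pdf(gaps) / proposal.pdf(gaps)   # likelihood ratios
p_fail = np.mean(fails(gaps) * weights)            # unbiased estimate
print(f"estimated failure probability in nominal traffic: {p_fail:.2e}")
```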
arXiv Detail & Related papers (2020-03-04T04:35:22Z)