Testing Autonomous Driving Systems -- What Really Matters and What Doesn't
- URL: http://arxiv.org/abs/2507.13661v2
- Date: Sun, 27 Jul 2025 08:09:19 GMT
- Title: Testing Autonomous Driving Systems -- What Really Matters and What Doesn't
- Authors: Changwen Li, Joseph Sifakis, Rongjie Yan, Jian Zhang,
- Abstract summary: This paper proposes a framework for comparing existing test methods in terms of their intrinsic effectiveness and validity.<n>It shows that many methods do not meet both of these requirements.<n>It is shown that most critical test methods do not take into account the nominal operational capabilities of autopilots.
- Score: 6.229766691427486
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite extensive research, the testing of autonomous driving systems (ADS) landscape remains fragmented, and there is currently no basis for an informed technical assessment of the importance and contribution of the current state of the art. This paper attempts to address this problem by exploring two complementary aspects. First, it proposes a framework for comparing existing test methods in terms of their intrinsic effectiveness and validity. It shows that many methods do not meet both of these requirements. Either because they are based on criteria that do not allow for rapid, inexpensive, and comprehensive detection of failures, or because the degree of validity of the properties tested cannot be accurately estimated. In particular, it is shown that most critical test methods do not take into account the nominal operational capabilities of autopilots and generate scenarios that are impossible for the tested vehicles to handle, resulting in unjustified rejections. Secondly, the paper shows that test effectiveness and validity are highly dependent on how autopilots are designed: how they choose between different control policies to perform maneuvers, as well as on the reproducibility of the results. In fact, most test methods take for granted two principles underlying traditional methods, but do not generally apply to ADS. We maintain that the absence of rationality and determinacy significantly impairs the effectiveness and validity of test methods, and provide test results on eight open autopilots, in which most do not satisfy these properties, thereby illustrating this fact. We conclude that under the current state of the art, it is impossible to obtain strong enough guarantees for essential autopilot properties and recommend that autopilots be developed with a view to both rationality and determinacy.
Related papers
- AI-Augmented Metamorphic Testing for Comprehensive Validation of Autonomous Vehicles [7.237068561453082]
Self-driving cars have the potential to revolutionize transportation, but ensuring their safety remains a significant challenge.<n> Conventional testing methodologies face critical limitations, including the oracle problem determining whether the systems behavior is correct.<n>We propose enhancing Metamorphic Testing (MT) by integrating AI-driven image generation tools, such as Stable Diffusion.
arXiv Detail & Related papers (2025-02-16T23:31:59Z) - A Comprehensive Evaluation of Four End-to-End AI Autopilots Using CCTest and the Carla Leaderboard [6.229766691427486]
End-to-end AI autopilots for autonomous driving systems have emerged as a promising alternative to traditional modular autopilots.<n>They suffer from the well-known problems of AI systems such as non-determinism, non-explainability, and anomalies.<n>This paper extends a study of the critical configuration testing approach that has been applied to four open modular autopilots.
arXiv Detail & Related papers (2025-01-21T12:33:32Z) - Towards Effective Evaluations and Comparisons for LLM Unlearning Methods [97.2995389188179]
This paper seeks to refine the evaluation of machine unlearning for large language models.<n>It addresses two key challenges -- the robustness of evaluation metrics and the trade-offs between competing goals.
arXiv Detail & Related papers (2024-06-13T14:41:00Z) - Rigorous Simulation-based Testing for Autonomous Driving Systems -- Targeting the Achilles' Heel of Four Open Autopilots [6.229766691427486]
We propose a rigorous test method based on breaking down scenarios into simple ones.
We generate test cases for critical configurations that place the vehicle under test in critical situations.
Test cases reveal major defects in Apollo, Autoware, and the Carla and LGSVL autopilots.
arXiv Detail & Related papers (2024-05-27T08:06:21Z) - Predicting Safety Misbehaviours in Autonomous Driving Systems using Uncertainty Quantification [8.213390074932132]
This paper evaluates different uncertainty quantification methods from the deep learning domain for the anticipatory testing of safety-critical misbehaviours.<n>We compute uncertainty scores as the vehicle executes, following the intuition that high uncertainty scores are indicative of unsupported runtime conditions.<n>In our study, we conducted an evaluation of the effectiveness and computational overhead associated with two uncertainty quantification methods, namely MC- Dropout and Deep Ensembles, for misbehaviour avoidance.
arXiv Detail & Related papers (2024-04-29T10:28:28Z) - An Empirical Study of Uncertainty Estimation Techniques for Detecting
Drift in Data Streams [4.818865062632567]
This study conducts a comprehensive empirical evaluation of using uncertainty values as substitutes for error rates in detecting drifts.
We examine five uncertainty estimation methods in conjunction with the ADWIN detector across seven real-world datasets.
Our results reveal that while the SWAG method exhibits superior calibration, the overall accuracy in detecting drifts is not notably impacted by the choice of uncertainty estimation method.
arXiv Detail & Related papers (2023-11-22T13:17:55Z) - Position: AI Evaluation Should Learn from How We Test Humans [65.36614996495983]
We argue that psychometrics, a theory originating in the 20th century for human assessment, could be a powerful solution to the challenges in today's AI evaluations.
arXiv Detail & Related papers (2023-06-18T09:54:33Z) - Conservative Prediction via Data-Driven Confidence Minimization [70.93946578046003]
In safety-critical applications of machine learning, it is often desirable for a model to be conservative.
We propose the Data-Driven Confidence Minimization framework, which minimizes confidence on an uncertainty dataset.
arXiv Detail & Related papers (2023-06-08T07:05:36Z) - On Pitfalls of Test-Time Adaptation [82.8392232222119]
Test-Time Adaptation (TTA) has emerged as a promising approach for tackling the robustness challenge under distribution shifts.
We present TTAB, a test-time adaptation benchmark that encompasses ten state-of-the-art algorithms, a diverse array of distribution shifts, and two evaluation protocols.
arXiv Detail & Related papers (2023-06-06T09:35:29Z) - Unsupervised Domain Adaptation for Speech Recognition via Uncertainty
Driven Self-Training [55.824641135682725]
Domain adaptation experiments using WSJ as a source domain and TED-LIUM 3 as well as SWITCHBOARD show that up to 80% of the performance of a system trained on ground-truth data can be recovered.
arXiv Detail & Related papers (2020-11-26T18:51:26Z) - Out-of-Distribution Detection for Automotive Perception [58.34808836642603]
Neural networks (NNs) are widely used for object classification in autonomous driving.
NNs can fail on input data not well represented by the training dataset, known as out-of-distribution (OOD) data.
This paper presents a method for determining whether inputs are OOD, which does not require OOD data during training and does not increase the computational cost of inference.
arXiv Detail & Related papers (2020-11-03T01:46:35Z) - Can Autonomous Vehicles Identify, Recover From, and Adapt to
Distribution Shifts? [104.04999499189402]
Out-of-training-distribution (OOD) scenarios are a common challenge of learning agents at deployment.
We propose an uncertainty-aware planning method, called emphrobust imitative planning (RIP)
Our method can detect and recover from some distribution shifts, reducing the overconfident and catastrophic extrapolations in OOD scenes.
We introduce an autonomous car novel-scene benchmark, textttCARNOVEL, to evaluate the robustness of driving agents to a suite of tasks with distribution shifts.
arXiv Detail & Related papers (2020-06-26T11:07:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.