Related papers: Make Full Use of Testing Information: An Integrated Accelerated Testing and Evaluation Method for Autonomous Driving Systems

Make Full Use of Testing Information: An Integrated Accelerated Testing and Evaluation Method for Autonomous Driving Systems

URL: http://arxiv.org/abs/2501.11924v1
Date: Tue, 21 Jan 2025 06:59:25 GMT
Title: Make Full Use of Testing Information: An Integrated Accelerated Testing and Evaluation Method for Autonomous Driving Systems
Authors: Xinzheng Wu, Junyi Chen, Jianfeng Wu, Longgao Zhang, Tian Xia, Yong Shen,
Abstract summary: This paper proposes an Integrated accelerated Testing and Evaluation Method (ITEM) for testing and evaluation of autonomous driving systems (ADSs)<n>To make full use of testing information, this paper proposes an Integrated accelerated Testing and Evaluation Method (ITEM)<n>The experimental results show that ITEM could well identify the hazardous domains in both low- and high-dimensional cases, regardless of the shape of the hazardous domains.
Score: 6.065650382599096
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Testing and evaluation is an important step before the large-scale application of the autonomous driving systems (ADSs). Based on the three level of scenario abstraction theory, a testing can be performed within a logical scenario, followed by an evaluation stage which is inputted with the testing results of each concrete scenario generated from the logical parameter space. During the above process, abundant testing information is produced which is beneficial for comprehensive and accurate evaluations. To make full use of testing information, this paper proposes an Integrated accelerated Testing and Evaluation Method (ITEM). Based on a Monte Carlo Tree Search (MCTS) paradigm and a dual surrogates testing framework proposed in our previous work, this paper applies the intermediate information (i.e., the tree structure, including the affiliation of each historical sampled point with the subspaces and the parent-child relationship between subspaces) generated during the testing stage into the evaluation stage to achieve accurate hazardous domain identification. Moreover, to better serve this purpose, the UCB calculation method is improved to allow the search algorithm to focus more on the hazardous domain boundaries. Further, a stopping condition is constructed based on the convergence of the search algorithm. Ablation and comparative experiments are then conducted to verify the effectiveness of the improvements and the superiority of the proposed method. The experimental results show that ITEM could well identify the hazardous domains in both low- and high-dimensional cases, regardless of the shape of the hazardous domains, indicating its generality and potential for the safety evaluation of ADSs.

Related papers

SEOE: A Scalable and Reliable Semantic Evaluation Framework for Open Domain Event Detection [70.23196257213829]
We propose a scalable and reliable Semantic-level Evaluation framework for Open domain Event detection. Our proposed framework first constructs a scalable evaluation benchmark that currently includes 564 event types covering 7 major domains. We then leverage large language models (LLMs) as automatic evaluation agents to compute a semantic F1-score, incorporating fine-grained definitions of semantically similar labels.
arXiv Detail & Related papers (2025-03-05T09:37:05Z)
Using Cooperative Co-evolutionary Search to Generate Metamorphic Test Cases for Autonomous Driving Systems [7.519606883700972]
This paper introduces CoCoMEGA, a novel automated testing framework aimed at advancing system-level safety assessments of Autonomous Driving Systems (ADSs)<n>CoCoMEGA emphasizes the identification of test scenarios that present undesirable system behavior, that may eventually lead to safety violations, captured by Metamorphic Relations (MRs)<n>Future research directions may include extending the approach to additional simulation platforms, applying it to other complex systems, and exploring methods for further improving testing efficiency such as surrogate modeling.
arXiv Detail & Related papers (2024-12-05T03:17:31Z)
LAMBDA: Covering the Multimodal Critical Scenarios for Automated Driving Systems by Search Space Quantization [33.87626198349963]
Black-Box Optimization (BBO) was introduced to accelerate the scenario-based test of automated driving systems (ADSs)<n>All the subspaces representing danger in the logical scenario space, rather than only the most critical concrete scenario, play a more significant role for the safety evaluation.<n>We propose LAMBDA (Latent-Action Monte-Carlo Beam Search with Density Adaption) to solve BBC problems.
arXiv Detail & Related papers (2024-11-30T15:57:05Z)
Hyperparameters in Continual Learning: A Reality Check [53.30082523545212]
Continual learning (CL) aims to train a model on a sequence of tasks while balancing the trade-off between plasticity (learning new tasks) and stability (retaining prior knowledge) The dominantly adopted conventional evaluation protocol for CL algorithms selects the best hyper parameters in a given scenario and then evaluates the algorithms in the same scenario. This protocol has significant shortcomings: it overestimates the CL capacity of algorithms and relies on unrealistic hyper parameter tuning. We argue that the evaluation of CL algorithms should focus on assessing the generalizability of their CL capacity to unseen scenarios.
arXiv Detail & Related papers (2024-03-14T03:13:01Z)
Few-Shot Scenario Testing for Autonomous Vehicles Based on Neighborhood Coverage and Similarity [8.97909097472183]
Testing and evaluating safety performance of autonomous vehicles (AVs) is essential before the large-scale deployment. The number of testing scenarios permissible for a specific AV is severely limited by tight constraints on testing budgets and time. We formulate this problem for the first time the "few-shot testing" (FST) problem and propose a systematic framework to address this challenge.
arXiv Detail & Related papers (2024-02-02T04:47:14Z)
Beyond AUROC & co. for evaluating out-of-distribution detection performance [50.88341818412508]
Given their relevance for safe(r) AI, it is important to examine whether the basis for comparing OOD detection methods is consistent with practical needs. We propose a new metric - Area Under the Threshold Curve (AUTC), which explicitly penalizes poor separation between ID and OOD samples.
arXiv Detail & Related papers (2023-06-26T12:51:32Z)
From Static Benchmarks to Adaptive Testing: Psychometrics in AI Evaluation [60.14902811624433]
We discuss a paradigm shift from static evaluation methods to adaptive testing. This involves estimating the characteristics and value of each test item in the benchmark and dynamically adjusting items in real-time. We analyze the current approaches, advantages, and underlying reasons for adopting psychometrics in AI evaluation.
arXiv Detail & Related papers (2023-06-18T09:54:33Z)
Plugin estimators for selective classification with out-of-distribution detection [67.28226919253214]
Real-world classifiers can benefit from abstaining from predicting on samples where they have low confidence. These settings have been the subject of extensive but disjoint study in the selective classification (SC) and out-of-distribution (OOD) detection literature. Recent work on selective classification with OOD detection has argued for the unified study of these problems. We propose new plugin estimators for SCOD that are theoretically grounded, effective, and generalise existing approaches.
arXiv Detail & Related papers (2023-01-29T07:45:17Z)
OpenOOD: Benchmarking Generalized Out-of-Distribution Detection [60.13300701826931]
Out-of-distribution (OOD) detection is vital to safety-critical machine learning applications. The field currently lacks a unified, strictly formulated, and comprehensive benchmark. We build a unified, well-structured called OpenOOD, which implements over 30 methods developed in relevant fields.
arXiv Detail & Related papers (2022-10-13T17:59:57Z)
Efficient and Effective Generation of Test Cases for Pedestrian Detection -- Search-based Software Testing of Baidu Apollo in SVL [14.482670650074885]
This paper presents a study on testing pedestrian detection and emergency braking system of the Baidu Apollo autonomous driving platform within the SVL simulator. We propose an evolutionary automated test generation technique that generates failure-revealing scenarios for Apollo in the SVL environment. In order to demonstrate the efficiency and effectiveness of our approach, we also report the results from a baseline random generation technique.
arXiv Detail & Related papers (2021-09-16T13:11:53Z)
SAMBA: Safe Model-Based & Active Reinforcement Learning [59.01424351231993]
SAMBA is a framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics. We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low and high-dimensional state representations. We provide intuition as to the effectiveness of the framework by a detailed analysis of our active metrics and safety constraints.
arXiv Detail & Related papers (2020-06-12T10:40:46Z)
Efficient statistical validation with edge cases to evaluate Highly Automated Vehicles [6.198523595657983]
The widescale deployment of Autonomous Vehicles seems to be imminent despite many safety challenges that are yet to be resolved. Existing standards focus on deterministic processes where the validation requires only a set of test cases that cover the requirements. This paper presents a new approach to compute the statistical characteristics of a system's behaviour by biasing automatically generated test cases towards the worst case scenarios.
arXiv Detail & Related papers (2020-03-04T04:35:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.