Towards Reliable AI: Adequacy Metrics for Ensuring the Quality of System-level Testing of Autonomous Vehicles
- URL: http://arxiv.org/abs/2311.08049v1
- Date: Tue, 14 Nov 2023 10:16:05 GMT
- Authors: Neelofar Neelofar, Aldeida Aleti
- Abstract summary: We introduce a set of black-box test adequacy metrics called "Test suite Instance Space Adequacy" (TISA) metrics.
The TISA metrics offer a way to assess both the diversity and coverage of the test suite and the range of bugs detected during testing.
We evaluate the efficacy of the TISA metrics by examining their correlation with the number of bugs detected in system-level simulation testing of AVs.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: AI-powered systems have gained widespread popularity in various domains,
including Autonomous Vehicles (AVs). However, ensuring their reliability and
safety is challenging due to their complex nature. Conventional test adequacy
metrics, designed to evaluate the effectiveness of traditional software
testing, are often insufficient or impractical for these systems. White-box
metrics designed specifically for these systems leverage neuron coverage
information, but they necessitate access to the underlying AI model and
training data, which may not always be available.
Furthermore, the existing adequacy metrics exhibit weak correlations with the
ability to detect faults in the generated test suite, creating a gap that we
aim to bridge in this study.
In this paper, we introduce a set of black-box test adequacy metrics called
"Test suite Instance Space Adequacy" (TISA) metrics, which can be used to gauge
the effectiveness of a test suite. The TISA metrics offer a way to assess both
the diversity and coverage of the test suite and the range of bugs detected
during testing. Additionally, we introduce a framework that permits testers to
visualise the diversity and coverage of the test suite in a two-dimensional
space, facilitating the identification of areas that require improvement.
We evaluate the efficacy of the TISA metrics by examining their correlation
with the number of bugs detected in system-level simulation testing of AVs. A
strong correlation, coupled with the short computation time, indicates their
effectiveness and efficiency in estimating the adequacy of testing AVs.
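The abstract describes measuring a test suite's diversity and coverage in a two-dimensional instance space. A minimal sketch of how such instance-space metrics could be computed is below; the PCA projection, grid-cell coverage, and mean pairwise-distance diversity are illustrative assumptions, not the paper's exact TISA definitions:

```python
import numpy as np

def instance_space_metrics(features, grid=10):
    """Sketch of instance-space adequacy metrics for a test suite.

    features: (n_tests, n_features) array describing each test scenario.
    Returns (coverage, diversity) computed in a 2D projection.
    """
    # Project scenarios to 2D via PCA (the paper may use a different projection).
    X = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    pts = X @ vt[:2].T

    # Coverage: fraction of grid cells in the 2D space occupied by at least one test.
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    cells = np.floor((pts - lo) / (hi - lo + 1e-12) * grid).clip(0, grid - 1)
    coverage = len({tuple(c) for c in cells.astype(int)}) / grid**2

    # Diversity: mean pairwise Euclidean distance between projected tests.
    diffs = pts[:, None, :] - pts[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(-1))
    n = len(pts)
    diversity = dists.sum() / (n * (n - 1))
    return coverage, diversity
```

A low coverage score would flag empty regions of the instance space (the "areas that require improvement" the framework visualises), while low diversity would flag tests clustered around near-identical scenarios.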
Related papers
- Context-Aware Testing: A New Paradigm for Model Testing with Large Language Models [49.06068319380296]
We introduce context-aware testing (CAT) which uses context as an inductive bias to guide the search for meaningful model failures.
We instantiate the first CAT system, SMART Testing, which employs large language models to hypothesize relevant and likely failures.
arXiv Detail & Related papers (2024-10-31T15:06:16Z)
- Which Combination of Test Metrics Can Predict Success of a Software Project? A Case Study in a Year-Long Project Course [1.553083901660282]
Testing plays an important role in securing the success of a software development project.
We investigate whether we can quantify the effects various types of testing have on functional suitability.
arXiv Detail & Related papers (2024-08-22T04:23:51Z)
- Active Test-Time Adaptation: Theoretical Analyses and An Algorithm [51.84691955495693]
Test-time adaptation (TTA) addresses distribution shifts for streaming test data in unsupervised settings.
We propose the novel problem setting of active test-time adaptation (ATTA) that integrates active learning within the fully TTA setting.
arXiv Detail & Related papers (2024-04-07T22:31:34Z)
- Deep anytime-valid hypothesis testing [29.273915933729057]
We propose a general framework for constructing powerful, sequential hypothesis tests for nonparametric testing problems.
We develop a principled approach of leveraging the representation capability of machine learning models within the testing-by-betting framework.
Empirical results on synthetic and real-world datasets demonstrate that tests instantiated using our general framework are competitive against specialized baselines.
arXiv Detail & Related papers (2023-10-30T09:46:19Z)
- On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection [55.73320979733527]
We propose a data-agnostic adversarial detection framework, which induces different responses between normal and adversarial samples to UAPs.
Experimental results show that our method achieves competitive detection performance on various text classification tasks.
arXiv Detail & Related papers (2023-06-27T02:54:07Z)
- From Static Benchmarks to Adaptive Testing: Psychometrics in AI Evaluation [60.14902811624433]
We discuss a paradigm shift from static evaluation methods to adaptive testing.
This involves estimating the characteristics and value of each test item in the benchmark and dynamically adjusting items in real-time.
We analyze the current approaches, advantages, and underlying reasons for adopting psychometrics in AI evaluation.
arXiv Detail & Related papers (2023-06-18T09:54:33Z)
- Identifying and Explaining Safety-critical Scenarios for Autonomous Vehicles via Key Features [5.634825161148484]
This paper uses Instance Space Analysis (ISA) to identify the significant features of test scenarios that affect their ability to reveal the unsafe behaviour of AVs.
ISA identifies the features that best differentiate safety-critical scenarios from normal driving and visualises the impact of these features on test scenario outcomes (safe/unsafe) in 2D.
To test the predictive ability of the identified features, we train five Machine Learning classifiers to classify test scenarios as safe or unsafe.
arXiv Detail & Related papers (2022-12-15T00:52:47Z)
- Uncertainty-Driven Action Quality Assessment [67.20617610820857]
We propose a novel probabilistic model, named Uncertainty-Driven AQA (UD-AQA), to capture the diversity among multiple judge scores.
We generate the estimation of uncertainty for each prediction, which is employed to re-weight AQA regression loss.
Our proposed method achieves competitive results on three benchmarks including the Olympic events MTL-AQA and FineDiving, and the surgical skill JIGSAWS datasets.
arXiv Detail & Related papers (2022-07-29T07:21:15Z)
- Complete Agent-driven Model-based System Testing for Autonomous Systems [0.0]
A novel approach to testing complex autonomous transportation systems is described.
It is intended to mitigate some of the most critical problems regarding verification and validation.
arXiv Detail & Related papers (2021-10-25T01:55:24Z)
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
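The ALT-MAS entry above describes estimating test metrics from only a small labeled subset. A common building block for such active testing is to spend the labeling budget on the inputs the model is least certain about; the entropy-based selection below is an illustrative stand-in, not the paper's BNN-based estimator:

```python
import numpy as np

def select_for_labeling(probs, k):
    """Pick the k test inputs whose predictions are most uncertain.

    probs: (n_inputs, n_classes) array of predicted class probabilities.
    Returns the indices of the k inputs with highest predictive entropy,
    a standard active-testing/active-learning heuristic.
    """
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return np.argsort(entropy)[-k:]
```

Labeling only these high-uncertainty inputs tends to tighten the estimate of metrics such as accuracy faster than labeling a random sample of the same size.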
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.