Towards Reliable AI: Adequacy Metrics for Ensuring the Quality of
System-level Testing of Autonomous Vehicles
- URL: http://arxiv.org/abs/2311.08049v1
- Date: Tue, 14 Nov 2023 10:16:05 GMT
- Title: Towards Reliable AI: Adequacy Metrics for Ensuring the Quality of
System-level Testing of Autonomous Vehicles
- Authors: Neelofar Neelofar, Aldeida Aleti
- Abstract summary: We introduce a set of black-box test adequacy metrics called "Test suite Instance Space Adequacy" (TISA) metrics.
The TISA metrics offer a way to assess both the diversity and coverage of the test suite and the range of bugs detected during testing.
We evaluate the efficacy of the TISA metrics by examining their correlation with the number of bugs detected in system-level simulation testing of AVs.
- Score: 5.634825161148484
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: AI-powered systems have gained widespread popularity in various domains,
including Autonomous Vehicles (AVs). However, ensuring their reliability and
safety is challenging due to their complex nature. Conventional test adequacy
metrics, designed to evaluate the effectiveness of traditional software
testing, are often insufficient or impractical for these systems. White-box
metrics, which are specifically designed for these systems, leverage neuron
coverage information. These coverage metrics necessitate access to the
underlying AI model and training data, which may not always be available.
Furthermore, the existing adequacy metrics exhibit weak correlations with the
fault-detection ability of the generated test suite, a gap that we aim to
bridge in this study.
In this paper, we introduce a set of black-box test adequacy metrics called
"Test suite Instance Space Adequacy" (TISA) metrics, which can be used to gauge
the effectiveness of a test suite. The TISA metrics offer a way to assess both
the diversity and coverage of the test suite and the range of bugs detected
during testing. Additionally, we introduce a framework that permits testers to
visualise the diversity and coverage of the test suite in a two-dimensional
space, facilitating the identification of areas that require improvement.
We evaluate the efficacy of the TISA metrics by examining their correlation
with the number of bugs detected in system-level simulation testing of AVs. A
strong correlation, coupled with the short computation time, indicates their
effectiveness and efficiency in estimating the adequacy of testing AVs.
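To make the instance-space idea concrete, here is a minimal Python sketch of how a test suite's 2D coverage and diversity might be measured. The PCA projection, the grid-cell coverage measure, the mean-pairwise-distance diversity measure, and all function names are illustrative assumptions, not the paper's exact TISA definitions.

```python
# Minimal sketch of instance-space coverage/diversity metrics.
# Assumptions: PCA for the 2D projection, a grid-occupancy coverage
# measure, and mean pairwise distance for diversity -- illustrative
# stand-ins, not the paper's exact TISA formulas.
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.decomposition import PCA

def project_to_instance_space(features):
    """Project high-dimensional test-scenario features into 2D."""
    return PCA(n_components=2).fit_transform(features)

def coverage(points_2d, bins=10):
    """Fraction of grid cells in the 2D bounding box hit by at least one test."""
    hist, _, _ = np.histogram2d(points_2d[:, 0], points_2d[:, 1], bins=bins)
    return (hist > 0).sum() / hist.size

def diversity(points_2d):
    """Mean pairwise Euclidean distance between tests in the 2D space."""
    return pdist(points_2d).mean()

# Toy usage: 200 simulated test scenarios described by 12 features each.
rng = np.random.default_rng(0)
pts = project_to_instance_space(rng.normal(size=(200, 12)))
print(f"coverage={coverage(pts):.2f}  diversity={diversity(pts):.2f}")
```

Under such a scheme, high diversity combined with low coverage would flag a suite whose tests are spread out yet leave regions of the instance space unexplored, which is the kind of gap the paper's 2D visualisation is intended to expose.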
Related papers
- Adaptive Testing for LLM-Based Applications: A Diversity-based Approach [15.33985438101206]
We show that diversity-based testing techniques, such as Adaptive Random Testing (ART), can be effectively applied to the testing of prompt templates (a minimal ART sketch appears after this list).
Our results, obtained using various implementations that explore several string-based distances, confirm that our approach enables the discovery of failures with reduced testing budgets.
arXiv Detail & Related papers (2025-01-23T08:53:12Z)
- AI-Compass: A Comprehensive and Effective Multi-module Testing Tool for AI Systems [26.605694684145313]
In this study, we design and implement a testing tool, AI-Compass, to comprehensively and effectively evaluate AI systems.
The tool extensively assesses adversarial robustness and model interpretability, and performs neuron analysis.
Our research sheds light on a general solution for the AI systems testing landscape.
arXiv Detail & Related papers (2024-11-09T11:15:17Z)
- Context-Aware Testing: A New Paradigm for Model Testing with Large Language Models [49.06068319380296]
We introduce context-aware testing (CAT) which uses context as an inductive bias to guide the search for meaningful model failures.
We instantiate the first CAT system, SMART Testing, which employs large language models to hypothesize relevant and likely failures.
arXiv Detail & Related papers (2024-10-31T15:06:16Z)
- Scalable Similarity-Aware Test Suite Minimization with Reinforcement Learning [6.9290255098776425]
TripRL is a novel technique to produce a diverse reduced test suite with high test effectiveness.
We show that TripRL's runtime scales linearly with the size of the Multi-Criteria Test Suite Minimization problem.
arXiv Detail & Related papers (2024-08-24T08:43:03Z)
- Active Test-Time Adaptation: Theoretical Analyses and An Algorithm [51.84691955495693]
Test-time adaptation (TTA) addresses distribution shifts for streaming test data in unsupervised settings.
We propose the novel problem setting of active test-time adaptation (ATTA) that integrates active learning within the fully TTA setting.
arXiv Detail & Related papers (2024-04-07T22:31:34Z)
- On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection [55.73320979733527]
We propose a data-agnostic adversarial detection framework, which distinguishes normal from adversarial samples by their different responses to UAPs.
Experimental results show that our method achieves competitive detection performance on various text classification tasks.
arXiv Detail & Related papers (2023-06-27T02:54:07Z)
- From Static Benchmarks to Adaptive Testing: Psychometrics in AI Evaluation [60.14902811624433]
We discuss a paradigm shift from static evaluation methods to adaptive testing.
This involves estimating the characteristics and value of each test item in the benchmark and dynamically adjusting items in real-time.
We analyze the current approaches, advantages, and underlying reasons for adopting psychometrics in AI evaluation.
arXiv Detail & Related papers (2023-06-18T09:54:33Z)
- A Comprehensive Survey on Test-Time Adaptation under Distribution Shifts [117.72709110877939]
Test-time adaptation (TTA) has the potential to adapt a pre-trained model to unlabeled data during testing, before making predictions.
We categorize TTA into several distinct groups based on the form of test data, namely, test-time domain adaptation, test-time batch adaptation, and online test-time adaptation.
arXiv Detail & Related papers (2023-03-27T16:32:21Z)
- Identifying and Explaining Safety-critical Scenarios for Autonomous Vehicles via Key Features [5.634825161148484]
This paper uses Instance Space Analysis (ISA) to identify the significant features of test scenarios that affect their ability to reveal the unsafe behaviour of AVs.
ISA identifies the features that best differentiate safety-critical scenarios from normal driving and visualises the impact of these features on test scenario outcomes (safe/unsafe) in 2D.
To test the predictive ability of the identified features, we train five Machine Learning classifiers to classify test scenarios as safe or unsafe (a minimal classifier sketch appears after this list).
arXiv Detail & Related papers (2022-12-15T00:52:47Z)
- Complete Agent-driven Model-based System Testing for Autonomous Systems [0.0]
A novel approach to testing complex autonomous transportation systems is described.
It is intended to mitigate some of the most critical problems regarding verification and validation.
arXiv Detail & Related papers (2021-10-25T01:55:24Z)
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
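As referenced in the Adaptive Testing entry above, here is a minimal sketch of one common Adaptive Random Testing variant (farthest-candidate selection) applied to prompt templates, using Python's difflib similarity as the string distance. The distance choice, selection rule, and all names are illustrative assumptions, not the cited paper's implementation.

```python
# Minimal ART sketch: pick each next test to maximise its minimum
# string distance to the tests already executed. Illustrative only.
import random
from difflib import SequenceMatcher

def distance(a: str, b: str) -> float:
    """String distance in [0, 1]: one minus the difflib similarity ratio."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()

def art_pick(executed: list[str], candidates: list[str]) -> str:
    """Pick the candidate farthest (by min distance) from all executed tests."""
    if not executed:
        return random.choice(candidates)
    return max(candidates, key=lambda c: min(distance(c, e) for e in executed))

# Toy usage: order a pool of prompt templates so consecutive tests
# are as dissimilar as possible.
pool = [f"Summarize the text in {n} words." for n in (5, 10, 20, 50, 100)]
executed: list[str] = []
while pool:
    nxt = art_pick(executed, pool)
    pool.remove(nxt)
    executed.append(nxt)  # the prompt template would be executed here
print(executed)
```

Choosing each new test to maximise its minimum distance to those already run is what spreads the tests across the input space, which is how diversity-based selection can reveal failures within a reduced testing budget.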
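Likewise, as referenced in the "Identifying and Explaining Safety-critical Scenarios" entry, here is a minimal sketch of the classify-scenarios-as-safe/unsafe step, using two common scikit-learn classifiers as stand-ins for the five trained in the paper. The synthetic features, labelling rule, and model choices are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch: train classifiers on scenario features to predict
# safe (0) vs. unsafe (1) outcomes. Data and models are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy data: rows are test scenarios, columns are scenario features
# (e.g., ego speed, traffic density); labels follow a synthetic rule.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

for clf in (LogisticRegression(max_iter=1000), RandomForestClassifier()):
    scores = cross_val_score(clf, X, y, cv=5)
    print(type(clf).__name__, scores.mean().round(2))
```

High cross-validated accuracy from such classifiers is what supports the claim that the identified features are predictive of unsafe behaviour.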