Test Generation Strategies for Building Failure Models and Explaining
Spurious Failures
- URL: http://arxiv.org/abs/2312.05631v1
- Date: Sat, 9 Dec 2023 18:36:15 GMT
- Authors: Baharin Aliashrafi Jodat, Abhishek Chandar, Shiva Nejati, Mehrdad
- Abstract summary: Test inputs fail not only when the system under test is faulty but also when the inputs are invalid or unrealistic.
We propose to build failure models for inferring interpretable rules on test inputs that cause spurious failures.
We show that our proposed surrogate-assisted approach generates failure models with an average accuracy of 83%.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Test inputs fail not only when the system under test is faulty but also when
the inputs are invalid or unrealistic. Failures resulting from invalid or
unrealistic test inputs are spurious. Avoiding spurious failures improves the
effectiveness of testing in exercising the main functions of a system,
particularly for compute-intensive (CI) systems where a single test execution
takes significant time. In this paper, we propose to build failure models for
inferring interpretable rules on test inputs that cause spurious failures. We
examine two alternative strategies for building failure models: (1) machine
learning (ML)-guided test generation and (2) surrogate-assisted test
generation. ML-guided test generation infers boundary regions that separate
passing and failing test inputs and samples test inputs from those regions.
Surrogate-assisted test generation relies on surrogate models to predict labels
for test inputs instead of exercising all the inputs. We propose a novel
surrogate-assisted algorithm that uses multiple surrogate models
simultaneously, and dynamically selects the prediction from the most accurate
model. We empirically evaluate the accuracy of failure models inferred based on
surrogate-assisted and ML-guided test generation algorithms. Using case studies
from the domains of cyber-physical systems and networks, we show that our
proposed surrogate-assisted approach generates failure models with an average
accuracy of 83%, significantly outperforming ML-guided test generation and two
baselines. Further, our approach learns failure-inducing rules that identify
genuine spurious failures as validated against domain knowledge.
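The surrogate-assisted idea in the abstract (several surrogate models kept side by side, with the prediction taken from whichever is currently most accurate) can be illustrated with a minimal sketch. This is not the authors' algorithm: `run_expensive_test`, the two toy surrogates, and the `warmup`, `min_accuracy`, and `window` parameters are all hypothetical choices made for illustration, and accuracy is estimated only on inputs that were actually executed.

```python
import random

def run_expensive_test(x):
    """Hypothetical stand-in for a slow simulation: fails when x[0] + x[1] > 1."""
    return "fail" if x[0] + x[1] > 1.0 else "pass"

class NearestNeighbour:
    """Toy surrogate: copies the label of the closest executed input."""
    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
    def predict(self, x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, p)) for p in self.X]
        return self.y[dists.index(min(dists))]

class MajorityLabel:
    """Toy surrogate: always predicts the most common executed label."""
    def fit(self, X, y):
        self.label = max(set(y), key=y.count)
    def predict(self, x):
        return self.label

def label_inputs(inputs, surrogates, warmup=20, min_accuracy=0.9, window=20):
    """Label inputs, executing the real test only when no surrogate is trusted."""
    X, y = [], []                               # inputs that were really executed
    hits = {name: [] for name in surrogates}    # per-surrogate correctness history
    labelled, executed = [], 0
    for x in inputs:
        best = None
        if len(X) >= warmup:
            # Recent accuracy of each surrogate on executed inputs.
            acc = {n: sum(h[-window:]) / len(h[-window:])
                   for n, h in hits.items() if h}
            if acc:
                best = max(acc, key=acc.get)    # most accurate surrogate right now
                if acc[best] < min_accuracy:
                    best = None                 # not trusted: fall back to execution
        if best is not None:
            labelled.append((x, surrogates[best].predict(x), best))
            continue
        truth = run_expensive_test(x)
        executed += 1
        if X:  # surrogates are fitted only after the first execution
            for name, model in surrogates.items():
                hits[name].append(model.predict(x) == truth)
        X.append(x)
        y.append(truth)
        for model in surrogates.values():
            model.fit(X, y)
        labelled.append((x, truth, "executed"))
    return labelled, executed

random.seed(0)
inputs = [(random.random(), random.random()) for _ in range(200)]
surrogates = {"nearest": NearestNeighbour(), "majority": MajorityLabel()}
labelled, executed = label_inputs(inputs, surrogates)
print(f"executed {executed} of {len(inputs)} tests")
```

One simplification worth noting: once a surrogate is trusted, this sketch stops executing tests, so its accuracy estimate is never refreshed. A more careful version would keep executing a fraction of the inputs to re-check the selected model.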
Related papers
- Context-Aware Testing: A New Paradigm for Model Testing with Large Language Models [49.06068319380296]
We introduce context-aware testing (CAT) which uses context as an inductive bias to guide the search for meaningful model failures.
We instantiate the first CAT system, SMART Testing, which employs large language models to hypothesize relevant and likely failures.
arXiv Detail & Related papers (2024-10-31T15:06:16Z)
- Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting [55.17761802332469]
Test-time adaptation (TTA) seeks to tackle potential distribution shifts between training and test data by adapting a given model w.r.t. any test sample.
Prior methods perform backpropagation for each test sample, resulting in unbearable optimization costs to many applications.
We propose an Efficient Anti-Forgetting Test-Time Adaptation (EATA) method which develops an active sample selection criterion to identify reliable and non-redundant samples.
arXiv Detail & Related papers (2024-03-18T05:49:45Z)
- Deep anytime-valid hypothesis testing [29.273915933729057]
We propose a general framework for constructing powerful, sequential hypothesis tests for nonparametric testing problems.
We develop a principled approach of leveraging the representation capability of machine learning models within the testing-by-betting framework.
Empirical results on synthetic and real-world datasets demonstrate that tests instantiated using our general framework are competitive against specialized baselines.
arXiv Detail & Related papers (2023-10-30T09:46:19Z)
- CAFA: Class-Aware Feature Alignment for Test-Time Adaptation [50.26963784271912]
Test-time adaptation (TTA) aims to address this challenge by adapting a model to unlabeled data at test time.
We propose a simple yet effective feature alignment loss, termed Class-Aware Feature Alignment (CAFA), which encourages a model to learn target representations in a class-discriminative manner.
arXiv Detail & Related papers (2022-06-01T03:02:07Z)
- TTAPS: Test-Time Adaption by Aligning Prototypes using Self-Supervision [70.05605071885914]
We propose a novel modification of the self-supervised training algorithm SwAV that adds the ability to adapt to single test samples.
We show the success of our method on the common benchmark dataset CIFAR10-C.
arXiv Detail & Related papers (2022-05-18T05:43:06Z)
- Machine Learning Testing in an ADAS Case Study Using Simulation-Integrated Bio-Inspired Search-Based Testing [7.5828169434922]
Deeper generates failure-revealing test scenarios for testing a deep neural network-based lane-keeping system.
In the newly proposed version, we utilize a new set of bio-inspired search algorithms: genetic algorithm (GA), $(\mu+\lambda)$ and $(\mu,\lambda)$ evolution strategies (ES), and particle swarm optimization (PSO).
Our evaluation shows the newly proposed test generators in Deeper represent a considerable improvement on the previous version.
arXiv Detail & Related papers (2022-03-22T20:27:40Z)
- Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control [67.52000805944924]
Learn then Test (LTT) is a framework for calibrating machine learning models.
Our main insight is to reframe the risk-control problem as multiple hypothesis testing.
We use our framework to provide new calibration methods for several core machine learning tasks with detailed worked examples in computer vision.
arXiv Detail & Related papers (2021-10-03T17:42:03Z)
- Efficient and Effective Generation of Test Cases for Pedestrian Detection -- Search-based Software Testing of Baidu Apollo in SVL [14.482670650074885]
This paper presents a study on testing the pedestrian detection and emergency braking system of the Baidu Apollo autonomous driving platform within the SVL simulator.
We propose an evolutionary automated test generation technique that generates failure-revealing scenarios for Apollo in the SVL environment.
In order to demonstrate the efficiency and effectiveness of our approach, we also report the results from a baseline random generation technique.
arXiv Detail & Related papers (2021-09-16T13:11:53Z)
- Distribution-Aware Testing of Neural Networks Using Generative Models [5.618419134365903]
The reliability of software that has a Deep Neural Network (DNN) as a component is urgently important.
We show that three recent testing techniques generate a significant number of invalid test inputs.
We propose a technique to incorporate the valid input space of the DNN model under test in the test generation process.
arXiv Detail & Related papers (2021-02-26T17:18:21Z)
- Understanding Classifier Mistakes with Generative Models [88.20470690631372]
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle.
In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize.
Our approach is agnostic to class labels from the training set, which makes it applicable to models trained in a semi-supervised way.
arXiv Detail & Related papers (2020-10-05T22:13:21Z)