Machine Learning Testing in an ADAS Case Study Using
Simulation-Integrated Bio-Inspired Search-Based Testing
- URL: http://arxiv.org/abs/2203.12026v4
- Date: Wed, 7 Jun 2023 09:24:31 GMT
- Title: Machine Learning Testing in an ADAS Case Study Using
Simulation-Integrated Bio-Inspired Search-Based Testing
- Authors: Mahshid Helali Moghadam, Markus Borg, Mehrdad Saadatmand, Seyed
Jalaleddin Mousavirad, Markus Bohlin, Björn Lisper
- Abstract summary: Deeper generates failure-revealing test scenarios for testing a deep neural network-based lane-keeping system.
In the newly proposed version, we utilize a new set of bio-inspired search algorithms: genetic algorithm (GA), $(\mu+\lambda)$ and $(\mu,\lambda)$ evolution strategies (ES), and particle swarm optimization (PSO).
Our evaluation shows the newly proposed test generators in Deeper represent a considerable improvement on the previous version.
- Score: 7.5828169434922
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents an extended version of Deeper, a search-based
simulation-integrated test solution that generates failure-revealing test
scenarios for testing a deep neural network-based lane-keeping system. In the
newly proposed version, we utilize a new set of bio-inspired search algorithms,
genetic algorithm (GA), $({\mu}+{\lambda})$ and $({\mu},{\lambda})$ evolution
strategies (ES), and particle swarm optimization (PSO), that leverage a quality
population seed and domain-specific cross-over and mutation operations tailored
for the presentation model used for modeling the test scenarios. In order to
demonstrate the capabilities of the new test generators within Deeper, we carry
out an empirical evaluation and comparison with regard to the results of five
participating tools in the cyber-physical systems testing competition at SBST
2021. Our evaluation shows the newly proposed test generators in Deeper not
only represent a considerable improvement on the previous version but also
prove to be effective and efficient in provoking a considerable number of
diverse failure-revealing test scenarios for testing an ML-driven lane-keeping
system. They can trigger several failures while promoting test scenario
diversity, under a limited test time budget, high target failure severity, and
strict speed limit constraints.
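The abstract above describes a population-based search over road scenarios, but the listing does not include the Deeper implementation itself. Below is a minimal, illustrative $(\mu+\lambda)$ evolution strategy sketch for this kind of search, where a test scenario is a sequence of road control points and the fitness is the simulated minimum distance of the ego vehicle from the lane boundary. All names here (e.g., simulate_oob_distance, the mutation operator) are hypothetical placeholders, not the Deeper API.

```python
import random

# Hypothetical fitness: run the lane-keeping agent on a road described by
# control points and return how close it came to leaving the lane
# (lower = closer to an out-of-bound failure). Placeholder for a real simulator run.
def simulate_oob_distance(road_points):
    return random.uniform(0.0, 2.0)

def mutate(road_points, sigma=5.0):
    """Domain-style mutation: perturb one control point of the road."""
    child = [list(p) for p in road_points]
    i = random.randrange(len(child))
    child[i][0] += random.gauss(0.0, sigma)
    child[i][1] += random.gauss(0.0, sigma)
    return child

def mu_plus_lambda_es(seed_roads, mu=5, lam=10, generations=20):
    """Minimal (mu+lambda) ES: parents compete with their offspring each generation."""
    population = sorted(((simulate_oob_distance(r), r) for r in seed_roads),
                        key=lambda t: t[0])[:mu]
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            _, parent = random.choice(population)
            child = mutate(parent)
            offspring.append((simulate_oob_distance(child), child))
        population = sorted(population + offspring, key=lambda t: t[0])[:mu]
    return population  # lowest distances ~ most failure-revealing roads

# Usage: start from a "quality population seed" of previously generated roads.
seed = [[[x, random.uniform(-10, 10)] for x in range(0, 200, 20)] for _ in range(10)]
best = mu_plus_lambda_es(seed)
print("best (min lane-boundary distance):", best[0][0])
```

A $(\mu,\lambda)$ variant would select the next generation from the offspring only, and a GA variant would add a crossover step; the loop structure stays the same.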
Related papers
- Many-Objective Search-Based Coverage-Guided Automatic Test Generation for Deep Neural Networks [2.141511605027007]
This paper proposes a fuzzing test generation technique based on many-objective optimization algorithms.
A frequency-based fuzz sampling strategy assigns priorities to the initial data according to how often each item has been selected.
A local search strategy based on Monte Carlo tree search is proposed to enhance the algorithm's local search capability.
arXiv Detail & Related papers (2024-11-01T21:08:15Z)
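The entry above does not spell out the exact prioritization rule; one common reading of "frequency-based fuzz sampling" is to bias selection toward initial seeds that have been picked least often so far. A small hypothetical sketch of that idea:

```python
import random
from collections import Counter

def frequency_based_sample(seeds, counts):
    """Pick a seed with probability inversely proportional to how often it has
    already been selected (an assumed reading of the strategy, not the paper's code)."""
    weights = [1.0 / (1 + counts[i]) for i in range(len(seeds))]
    i = random.choices(range(len(seeds)), weights=weights, k=1)[0]
    counts[i] += 1
    return seeds[i]

seeds = ["img_001", "img_002", "img_003"]   # illustrative initial data
counts = Counter()
picks = [frequency_based_sample(seeds, counts) for _ in range(10)]
print(picks, dict(counts))
```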
- Diversity-guided Search Exploration for Self-driving Cars Test Generation through Frenet Space Encoding [4.135985106933988]
The rise of self-driving cars (SDCs) presents important safety challenges to address in dynamic environments.
While field testing is essential, current methods lack diversity in assessing critical SDC scenarios.
We show that the likelihood of leading to an out-of-bound condition can be learned by a vanilla deep-learning transformer model.
arXiv Detail & Related papers (2024-01-26T06:57:00Z)
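A Frenet-style encoding represents a road by per-segment curvature rather than absolute coordinates, giving a compact sequence a transformer could score for out-of-bound likelihood. The sketch below is a generic illustration of such an encoding, not the paper's code; segment length and curvature values are assumed.

```python
import math

def frenet_to_cartesian(kappas, segment_length=10.0, x0=0.0, y0=0.0, heading0=0.0):
    """Decode a list of per-segment curvatures (1/m) into 2D waypoints.
    Each step advances segment_length metres and turns by kappa * segment_length."""
    points = [(x0, y0)]
    x, y, heading = x0, y0, heading0
    for kappa in kappas:
        heading += kappa * segment_length
        x += segment_length * math.cos(heading)
        y += segment_length * math.sin(heading)
        points.append((x, y))
    return points

# A gentle S-curve encoded as curvature values (illustrative only).
road = frenet_to_cartesian([0.0, 0.02, 0.02, 0.0, -0.02, -0.02, 0.0])
print(road)
```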
- Test Generation Strategies for Building Failure Models and Explaining Spurious Failures [4.995172162560306]
Test inputs fail not only when the system under test is faulty but also when the inputs are invalid or unrealistic.
We propose to build failure models for inferring interpretable rules on test inputs that cause spurious failures.
We show that our proposed surrogate-assisted approach generates failure models with an average accuracy of 83%.
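The entry does not name the learner behind these failure models; one simple way to obtain interpretable rules over test-input features is a shallow decision tree, roughly as in this hypothetical sketch (the features, labels, and values below are made up for illustration).

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical test-input features: [road_curvature, speed_limit, fog_density]
X = [[0.01, 50, 0.0], [0.08, 90, 0.7], [0.02, 60, 0.9],
     [0.09, 100, 0.8], [0.01, 40, 0.1], [0.07, 80, 0.9]]
# 1 = the failure was spurious (invalid/unrealistic input), 0 = genuine failure
y = [0, 1, 1, 1, 0, 1]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["curvature", "speed_limit", "fog"]))
```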
arXiv Detail & Related papers (2023-12-09T18:36:15Z)
- From Static Benchmarks to Adaptive Testing: Psychometrics in AI Evaluation [60.14902811624433]
We discuss a paradigm shift from static evaluation methods to adaptive testing.
This involves estimating the characteristics and value of each test item in the benchmark and dynamically adjusting items in real-time.
We analyze the current approaches, advantages, and underlying reasons for adopting psychometrics in AI evaluation.
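The summary speaks of estimating item characteristics and adjusting items in real time; the abstract does not fix a model, but a standard psychometric choice (assumed here) is the two-parameter logistic IRT model, with the next item chosen to maximize Fisher information at the current ability estimate.

```python
import math

def p_correct(theta, a, b):
    """2PL IRT model: probability that a subject with ability theta answers
    an item with discrimination a and difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)  # Fisher information of the item at theta

def pick_next_item(theta, items):
    """Adaptive step: choose the remaining item that is most informative."""
    return max(items, key=lambda ab: item_information(theta, *ab))

items = [(0.8, -1.0), (1.5, 0.0), (1.2, 1.5)]  # (a, b) pairs, illustrative
print(pick_next_item(theta=0.3, items=items))
```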
arXiv Detail & Related papers (2023-06-18T09:54:33Z)
- TeST: Test-time Self-Training under Distribution Shift [99.68465267994783]
Test-Time Self-Training (TeST) is a technique that takes as input a model trained on some source data and a novel data distribution at test time.
We find that models adapted using TeST significantly improve over baseline test-time adaptation algorithms.
arXiv Detail & Related papers (2022-09-23T07:47:33Z)
- TTAPS: Test-Time Adaption by Aligning Prototypes using Self-Supervision [70.05605071885914]
We propose a novel modification of the self-supervised training algorithm SwAV that adds the ability to adapt to single test samples.
We show the success of our method on the common benchmark dataset CIFAR10-C.
arXiv Detail & Related papers (2022-05-18T05:43:06Z)
- Efficient Test-Time Model Adaptation without Forgetting [60.36499845014649]
Test-time adaptation seeks to tackle potential distribution shifts between training and testing data.
We propose an active sample selection criterion to identify reliable and non-redundant samples.
We also introduce a Fisher regularizer to constrain important model parameters from drastic changes.
arXiv Detail & Related papers (2022-04-06T06:39:40Z)
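A rough, generic sketch of the two ingredients named in the entry above (not the paper's code): adapt only on reliable, low-entropy test samples, and penalize movement of parameters with large Fisher importance to avoid forgetting.

```python
import torch

def selective_entropy_loss(logits, entropy_threshold=2.0):
    """Entropy of predictions, averaged only over low-entropy ('reliable') samples.
    In practice the threshold is a fraction of ln(num_classes)."""
    probs = torch.softmax(logits, dim=1)
    ent = -(probs * torch.log(probs + 1e-8)).sum(dim=1)
    mask = ent < entropy_threshold          # active sample selection
    if mask.sum() == 0:
        return logits.new_zeros(())
    return ent[mask].mean()

def fisher_penalty(params, anchor_params, fisher_weights):
    """Anti-forgetting term: keep important (high-Fisher) weights close to their
    pre-adaptation values. Added to the loss as lam * fisher_penalty(...)."""
    return sum((f * (p - a).pow(2)).sum()
               for p, a, f in zip(params, anchor_params, fisher_weights))

# Illustrative numbers only (no real model): 4 test samples, 10 classes.
logits = torch.randn(4, 10, requires_grad=True)
print(float(selective_entropy_loss(logits)))
```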
- Listen, Adapt, Better WER: Source-free Single-utterance Test-time Adaptation for Automatic Speech Recognition [65.84978547406753]
Test-time Adaptation aims to adapt the model trained on source domains to yield better predictions for test samples.
Single-Utterance Test-time Adaptation (SUTA) is, to the best of our knowledge, the first TTA study in the speech area.
arXiv Detail & Related papers (2022-03-27T06:38:39Z)
- Efficient and Effective Generation of Test Cases for Pedestrian Detection -- Search-based Software Testing of Baidu Apollo in SVL [14.482670650074885]
This paper presents a study on testing the pedestrian detection and emergency braking system of the Baidu Apollo autonomous driving platform within the SVL simulator.
We propose an evolutionary automated test generation technique that generates failure-revealing scenarios for Apollo in the SVL environment.
In order to demonstrate the efficiency and effectiveness of our approach, we also report the results from a baseline random generation technique.
arXiv Detail & Related papers (2021-09-16T13:11:53Z)
- Online GANs for Automatic Performance Testing [0.10312968200748115]
We present a novel algorithm for automatic performance testing that uses an online variant of a Generative Adversarial Network (GAN).
The proposed approach does not require a prior training set or model of the system under test.
We consider that the presented algorithm serves as a proof of concept and we hope that it can spark a research discussion on the application of GANs to test generation.
arXiv Detail & Related papers (2021-04-21T06:03:27Z)
- Noisy Adaptive Group Testing using Bayesian Sequential Experimental Design [63.48989885374238]
When the infection prevalence of a disease is low, Dorfman showed 80 years ago that testing groups of people can prove more efficient than testing people individually.
Our goal in this paper is to propose new group testing algorithms that can operate in a noisy setting.
arXiv Detail & Related papers (2020-04-26T23:41:33Z)
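As a quick illustration of Dorfman's observation mentioned in the last entry: with prevalence p and pools of size k, one pooled test plus individual retests of positive pools needs an expected 1/k + 1 - (1-p)^k tests per person, which is well below 1 when p is small. The sketch below covers only the noise-free Dorfman scheme, not the paper's Bayesian sequential design.

```python
def dorfman_tests_per_person(p, k):
    """Expected tests per person when pooling k samples: one pooled test shared
    by k people, plus k individual retests if the pool comes back positive."""
    return 1.0 / k + 1.0 - (1.0 - p) ** k

def best_group_size(p, k_max=100):
    return min(range(2, k_max + 1), key=lambda k: dorfman_tests_per_person(p, k))

p = 0.01                       # 1% prevalence
k = best_group_size(p)         # 11 for p = 0.01
print(k, round(dorfman_tests_per_person(p, k), 3))  # ~0.196 tests per person, ~5x fewer
```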
This list is automatically generated from the titles and abstracts of the papers on this site.