Automated Test Oracles for Flaky Cyber-Physical System Simulators: Approach and Evaluation
- URL: http://arxiv.org/abs/2508.20902v1
- Date: Thu, 28 Aug 2025 15:33:42 GMT
- Title: Automated Test Oracles for Flaky Cyber-Physical System Simulators: Approach and Evaluation
- Authors: Baharin A. Jodat, Khouloud Gaaloul, Mehrdad Sabetzadeh, Shiva Nejati
- Abstract summary: Simulation-based testing of cyber-physical systems (CPS) is costly due to the time-consuming execution of CPS simulators. CPS simulators may be flaky, leading to inconsistent test outcomes and requiring repeated test re-execution for reliable test verdicts. We propose assertion-based test oracles for CPS as sets of logical and arithmetic predicates defined over the inputs of the system under test.
- Score: 1.5821080783312833
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Simulation-based testing of cyber-physical systems (CPS) is costly due to the time-consuming execution of CPS simulators. In addition, CPS simulators may be flaky, leading to inconsistent test outcomes and requiring repeated test re-execution for reliable test verdicts. Automated test oracles that do not require system execution are therefore crucial for reducing testing costs. Ideally, such test oracles should be interpretable to facilitate human understanding of test verdicts, and they must be robust against the potential flakiness of CPS simulators. In this article, we propose assertion-based test oracles for CPS as sets of logical and arithmetic predicates defined over the inputs of the system under test. Given a test input, our assertion-based test oracle determines, without requiring test execution, whether the test passes or fails, or whether the oracle is inconclusive in predicting a verdict. We describe two methods for generating assertion-based test oracles: one using genetic programming (GP) that employs well-known spectrum-based fault localization (SBFL) ranking formulas, namely Ochiai, Tarantula, and Naish, as fitness functions; and the other using decision trees (DT) and decision rules (DR). We evaluate our assertion-based test oracles through case studies in the domains of aerospace, networking, and autonomous driving. We show that test oracles generated using GP with Ochiai are significantly more accurate than those obtained using GP with Tarantula and Naish or using DT or DR. Moreover, this accuracy advantage remains even when accounting for the flakiness of the system under test. We further show that the assertion-based test oracles generated by GP with Ochiai are robust against flakiness, with only 4% average variation in their accuracy results across four different network and autonomous driving systems with flaky behaviours.
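The abstract names three SBFL ranking formulas used as GP fitness functions and a three-valued verdict (pass, fail, inconclusive). The following is a minimal Python sketch of how such scoring might look, not the authors' implementation. It assumes that an assertion "covers" a test when its predicate holds on the test input, that the standard textbook forms of Ochiai and Tarantula apply, and that "Naish" refers to the Naish2 (Op2) formula; the abstract confirms none of these details. The input names `speed` and `distance`, the sample data, and the rule for combining assertions into a verdict are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

TestInput = Dict[str, float]
Assertion = Callable[[TestInput], bool]
LabeledTest = Tuple[TestInput, bool]  # (inputs, True if the test failed)


def spectrum(assertion: Assertion, tests: List[LabeledTest]) -> Tuple[int, int, int, int]:
    """Count (ef, ep, nf, np): failed/passed tests where the assertion holds or not."""
    ef = ep = nf = np_ = 0
    for inputs, failed in tests:
        holds = assertion(inputs)
        if holds and failed:
            ef += 1
        elif holds:
            ep += 1
        elif failed:
            nf += 1
        else:
            np_ += 1
    return ef, ep, nf, np_


def ochiai(ef: int, ep: int, nf: int, np_: int) -> float:
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0


def tarantula(ef: int, ep: int, nf: int, np_: int) -> float:
    fail_ratio = ef / (ef + nf) if ef + nf else 0.0
    pass_ratio = ep / (ep + np_) if ep + np_ else 0.0
    total = fail_ratio + pass_ratio
    return fail_ratio / total if total else 0.0


def naish2(ef: int, ep: int, nf: int, np_: int) -> float:
    # Assumption: "Naish" is taken to mean the Naish2 (Op2) formula.
    return ef - ep / (ep + np_ + 1)


def verdict(inputs: TestInput,
            fail_assertions: List[Assertion],
            pass_assertions: List[Assertion]) -> str:
    """Hypothetical combination rule: conflicting or absent matches are inconclusive."""
    says_fail = any(a(inputs) for a in fail_assertions)
    says_pass = any(a(inputs) for a in pass_assertions)
    if says_fail and not says_pass:
        return "fail"
    if says_pass and not says_fail:
        return "pass"
    return "inconclusive"


# Hypothetical labeled test inputs for an imaginary driving scenario.
tests: List[LabeledTest] = [
    ({"speed": 40, "distance": 5}, True),
    ({"speed": 35, "distance": 8}, True),
    ({"speed": 50, "distance": 20}, False),
    ({"speed": 20, "distance": 5}, False),
    ({"speed": 45, "distance": 9}, False),
    ({"speed": 10, "distance": 3}, True),
]

# A candidate failure assertion and a candidate pass assertion over the inputs.
risky: Assertion = lambda t: t["speed"] > 30 and t["distance"] < 10
safe: Assertion = lambda t: t["distance"] >= 15

counts = spectrum(risky, tests)
scores = {f.__name__: f(*counts) for f in (ochiai, tarantula, naish2)}
```

In a GP loop, a score such as `scores["ochiai"]` would serve as the fitness of the candidate assertion `risky`, and the evolved assertions would then be queried through something like `verdict` without running the simulator.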
Related papers
- TestAgent: An Adaptive and Intelligent Expert for Human Assessment [62.060118490577366]
We propose TestAgent, a large language model (LLM)-powered agent designed to enhance adaptive testing through interactive engagement. TestAgent supports personalized question selection, captures test-takers' responses and anomalies, and provides precise outcomes through dynamic, conversational interactions.
arXiv Detail & Related papers (2025-06-03T16:07:54Z)
- RBT4DNN: Requirements-based Testing of Neural Networks [16.90562395404293]
Deep neural network (DNN) testing is crucial for the reliability and safety of critical systems, where failures can have severe consequences. We propose a requirements-based test suite generation method that uses structured natural language requirements formulated in a semantic feature space to create test suites. Our experiments on the MNIST, CelebA-HQ, ImageNet, and autonomous car driving datasets demonstrate that the generated test suites are realistic, diverse, consistent with preconditions, and capable of revealing faults.
arXiv Detail & Related papers (2025-04-03T16:24:49Z)
- Do Test and Environmental Complexity Increase Flakiness? An Empirical Study of SAP HANA [47.29324864511411]
Flaky tests fail seemingly at random without changes to the code.
We study characteristics of tests and the test environment that potentially impact test flakiness.
arXiv Detail & Related papers (2024-09-16T07:52:09Z)
- FlaKat: A Machine Learning-Based Categorization Framework for Flaky Tests [3.0846824529023382]
Flaky tests can pass or fail non-deterministically, without alterations to a software system.
State-of-the-art research incorporates machine learning solutions into flaky test detection and achieves reasonably good accuracy.
arXiv Detail & Related papers (2024-03-01T22:00:44Z)
- Test Generation Strategies for Building Failure Models and Explaining Spurious Failures [4.995172162560306]
Test inputs fail not only when the system under test is faulty but also when the inputs are invalid or unrealistic.
We propose to build failure models for inferring interpretable rules on test inputs that cause spurious failures.
We show that our proposed surrogate-assisted approach generates failure models with an average accuracy of 83%.
arXiv Detail & Related papers (2023-12-09T18:36:15Z)
- Evaluating the Impact of Flaky Simulators on Testing Autonomous Driving Systems [2.291478393584594]
We investigate test flakiness in simulation-based testing of Autonomous Driving Systems (ADS).
We show that test flakiness in ADS is a common occurrence and can significantly impact the test results obtained by randomized algorithms.
Our machine learning (ML) classifiers effectively identify flaky ADS tests using only a single test run.
arXiv Detail & Related papers (2023-11-30T18:08:02Z)
- Precise Error Rates for Computationally Efficient Testing [67.30044609837749]
We revisit the question of simple-versus-simple hypothesis testing with an eye towards computational complexity. An existing test based on linear spectral statistics achieves the best possible tradeoff curve between type I and type II error rates.
arXiv Detail & Related papers (2023-11-01T04:41:16Z)
- Test Case Generation and Test Oracle Support for Testing CPSs using Hybrid Models [2.6166087473624313]
Cyber-Physical Systems (CPSs) play a central role in the behavior of a wide range of autonomous physical systems.
CPSs are often specified iteratively as a sequence of models at different levels that can be tested via simulation systems.
One such model is a hybrid automaton; these are used frequently for CPS applications and have the advantage of encapsulating both continuous and discrete CPS behaviors.
arXiv Detail & Related papers (2023-09-14T19:08:09Z)
- Near-Optimal Non-Parametric Sequential Tests and Confidence Sequences with Possibly Dependent Observations [44.71254888821376]
We provide the first type-I-error and expected-rejection-time guarantees under general non-parametric data generating processes.
We show how to apply our results to inference on parameters defined by estimating equations, such as average treatment effects.
arXiv Detail & Related papers (2022-12-29T18:37:08Z)
- Sequential Kernelized Independence Testing [77.237958592189]
We design sequential kernelized independence tests inspired by kernelized dependence measures. We demonstrate the power of our approaches on both simulated and real data.
arXiv Detail & Related papers (2022-12-14T18:08:42Z)
- Machine Learning Testing in an ADAS Case Study Using Simulation-Integrated Bio-Inspired Search-Based Testing [7.5828169434922]
Deeper generates failure-revealing test scenarios for testing a deep neural network-based lane-keeping system.
In the newly proposed version, we utilize a new set of bio-inspired search algorithms: genetic algorithm (GA), $(\mu+\lambda)$ and $(\mu,\lambda)$ evolution strategies (ES), and particle swarm optimization (PSO).
Our evaluation shows the newly proposed test generators in Deeper represent a considerable improvement on the previous version.
arXiv Detail & Related papers (2022-03-22T20:27:40Z)
- Hemogram Data as a Tool for Decision-making in COVID-19 Management: Applications to Resource Scarcity Scenarios [62.997667081978825]
The COVID-19 pandemic has challenged emergency response systems worldwide, with widespread reports of breakdowns in essential services and the collapse of health care structures.
This work describes a machine learning model derived from data from hemogram exams performed on symptomatic patients.
The proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity, and specificity.
arXiv Detail & Related papers (2020-05-10T01:45:03Z)
- Noisy Adaptive Group Testing using Bayesian Sequential Experimental Design [63.48989885374238]
When the infection prevalence of a disease is low, Dorfman showed 80 years ago that testing groups of people can prove more efficient than testing people individually.
Our goal in this paper is to propose new group testing algorithms that can operate in a noisy setting.
arXiv Detail & Related papers (2020-04-26T23:41:33Z)