Using Causal Inference to Test Systems with Hidden and Interacting Variables: An Evaluative Case Study
- URL: http://arxiv.org/abs/2504.16526v2
- Date: Fri, 25 Apr 2025 13:32:24 GMT
- Title: Using Causal Inference to Test Systems with Hidden and Interacting Variables: An Evaluative Case Study
- Authors: Michael Foster, Robert M. Hierons, Donghwan Shin, Neil Walkinshaw, Christopher Wild,
- Abstract summary: Software systems with large parameter spaces, nondeterminism and high computational cost are challenging to test.<n>Recent software testing techniques based on causal inference have been successfully applied to systems that exhibit such characteristics.
- Score: 2.1146241717926664
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Software systems with large parameter spaces, nondeterminism and high computational cost are challenging to test. Recently, software testing techniques based on causal inference have been successfully applied to systems that exhibit such characteristics, including scientific models and autonomous driving systems. One significant limitation is that these are restricted to test properties where all of the variables involved can be observed and where there are no interactions between variables. In practice, this is rarely guaranteed; the logging infrastructure may not be available to record all of the necessary runtime variable values, and it can often be the case that an output of the system can be affected by complex interactions between variables. To address this, we leverage two additional concepts from causal inference, namely effect modification and instrumental variable methods. We build these concepts into an existing causal testing tool and conduct an evaluative case study which uses the concepts to test three system-level requirements of CARLA, a high-fidelity driving simulator widely used in autonomous vehicle development and testing. The results show that we can obtain reliable test outcomes without requiring large amounts of highly controlled test data or instrumentation of the code, even when variables interact with each other and are not recorded in the test data.
Related papers
- On the Mistaken Assumption of Interchangeable Deep Reinforcement Learning Implementations [53.0667196725616]
Deep Reinforcement Learning (DRL) is a paradigm of artificial intelligence where an agent uses a neural network to learn which actions to take in a given environment.<n>DRL has recently gained traction from being able to solve complex environments like driving simulators, 3D robotic control, and multiplayer-online-battle-arena video games.<n>Numerous implementations of the state-of-the-art algorithms responsible for training these agents, like the Deep Q-Network (DQN) and Proximal Policy Optimization (PPO) algorithms, currently exist.
arXiv Detail & Related papers (2025-03-28T16:25:06Z) - The Causal Chambers: Real Physical Systems as a Testbed for AI Methodology [10.81691411087626]
In some fields of AI, machine learning and statistics, the validation of new methods and algorithms is often hindered by the scarcity of suitable real-world datasets.
We have constructed two devices that allow us to quickly and inexpensively produce large datasets from non-trivial but well-understood physical systems.
arXiv Detail & Related papers (2024-04-17T13:00:52Z) - Towards Reliable AI: Adequacy Metrics for Ensuring the Quality of
System-level Testing of Autonomous Vehicles [5.634825161148484]
We introduce a set of black-box test adequacy metrics called "Test suite Instance Space Adequacy" (TISA) metrics.
The TISA metrics offer a way to assess both the diversity and coverage of the test suite and the range of bugs detected during testing.
We evaluate the efficacy of the TISA metrics by examining their correlation with the number of bugs detected in system-level simulation testing of AVs.
arXiv Detail & Related papers (2023-11-14T10:16:05Z) - Deep anytime-valid hypothesis testing [29.273915933729057]
We propose a general framework for constructing powerful, sequential hypothesis tests for nonparametric testing problems.
We develop a principled approach of leveraging the representation capability of machine learning models within the testing-by-betting framework.
Empirical results on synthetic and real-world datasets demonstrate that tests instantiated using our general framework are competitive against specialized baselines.
arXiv Detail & Related papers (2023-10-30T09:46:19Z) - Asynchronous Integration of Real-Time Simulators for HIL-based
Validation of Smart Grids [0.08796261172196743]
This paper explores the possibilities that are opened in terms of testing by the integration of a real-time simulator into co-simulation environments.
Smart grid applications would typically include a relatively large number of physical devices, software components, as well as communication technology, all working hand in hand.
arXiv Detail & Related papers (2023-09-14T11:44:21Z) - Towards Automatic Generation of Amplified Regression Test Oracles [44.45138073080198]
We propose a test oracle derivation approach to amplify regression test oracles.
The approach monitors the object state during test execution and compares it to the previous version to detect any changes in relation to the SUT's intended behaviour.
arXiv Detail & Related papers (2023-07-28T12:38:44Z) - From Static Benchmarks to Adaptive Testing: Psychometrics in AI Evaluation [60.14902811624433]
We discuss a paradigm shift from static evaluation methods to adaptive testing.
This involves estimating the characteristics and value of each test item in the benchmark and dynamically adjusting items in real-time.
We analyze the current approaches, advantages, and underlying reasons for adopting psychometrics in AI evaluation.
arXiv Detail & Related papers (2023-06-18T09:54:33Z) - Interactive System-wise Anomaly Detection [66.3766756452743]
Anomaly detection plays a fundamental role in various applications.
It is challenging for existing methods to handle the scenarios where the instances are systems whose characteristics are not readily observed as data.
We develop an end-to-end approach which includes an encoder-decoder module that learns system embeddings.
arXiv Detail & Related papers (2023-04-21T02:20:24Z) - Many-Objective Reinforcement Learning for Online Testing of DNN-Enabled
Systems [0.6690874707758508]
Deep Neural Networks (DNNs) have been widely used to perform real-world tasks in cyber-physical systems such as Autonomous Diving Systems (ADS)
Ensuring the correct behavior of such DNN-Enabled Systems (DES) is a crucial topic.
Online testing is one of the promising modes for testing such systems with their application environments (simulated or real) in a closed loop.
We present MORLOT, a novel online testing approach to address these challenges by combining Reinforcement Learning (RL) and many-objective search.
arXiv Detail & Related papers (2022-10-27T13:53:37Z) - Improving the Performance of Robust Control through Event-Triggered
Learning [74.57758188038375]
We propose an event-triggered learning algorithm that decides when to learn in the face of uncertainty in the LQR problem.
We demonstrate improved performance over a robust controller baseline in a numerical example.
arXiv Detail & Related papers (2022-07-28T17:36:37Z) - Dynamic Causal Effects Evaluation in A/B Testing with a Reinforcement
Learning Framework [68.96770035057716]
A/B testing is a business strategy to compare a new product with an old one in pharmaceutical, technological, and traditional industries.
This paper introduces a reinforcement learning framework for carrying A/B testing in online experiments.
arXiv Detail & Related papers (2020-02-05T10:25:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.