Generalized Coverage Criteria for Combinatorial Sequence Testing
- URL: http://arxiv.org/abs/2201.00522v4
- Date: Tue, 31 Oct 2023 07:34:39 GMT
- Title: Generalized Coverage Criteria for Combinatorial Sequence Testing
- Authors: Achiya Elyasaf, Eitan Farchi, Oded Margalit, Gera Weiss, Yeshayahu
Weiss
- Abstract summary: We present a new model-based approach for testing systems that use sequences of actions and assertions as test vectors.
Our solution includes a method for quantifying testing quality, a tool for generating high-quality test suites based on the coverage criteria we propose, and a framework for assessing risks.
- Score: 4.807321976136717
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a new model-based approach for testing systems that use sequences
of actions and assertions as test vectors. Our solution includes a method for
quantifying testing quality, a tool for generating high-quality test suites
based on the coverage criteria we propose, and a framework for assessing risks.
For testing quality, we propose a method that specifies generalized coverage
criteria over sequences of actions, which extends previous approaches. Our
publicly available tool demonstrates how to extract effective test suites from
test plans based on these criteria. We also present a Bayesian approach for
measuring the probabilities of bugs or risks, and show how this quantification
can help achieve an informed balance between exploitation and exploration in
testing. Finally, we provide an empirical evaluation demonstrating the
effectiveness of our tool in finding bugs, assessing risks, and achieving
coverage.
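To make the idea of coverage criteria over sequences of actions concrete, the sketch below computes a simple t-way sequence-coverage measure: the fraction of ordered tuples of t distinct actions that appear, in order, in at least one test. This is only an illustration of one classic criterion under assumed action names and a made-up test suite; the paper's generalized criteria are broader than this sketch.

```python
from itertools import combinations, permutations

def covered_t_sequences(test, t):
    """Ordered length-t subsequences of distinct actions occurring in `test`
    (the actions need not be adjacent, but must appear in this relative order)."""
    return {c for c in combinations(test, t) if len(set(c)) == t}

def t_sequence_coverage(test_suite, actions, t=2):
    """Fraction of all ordered t-tuples of distinct actions that are covered
    by at least one test in the suite (a basic t-way sequence criterion)."""
    covered = set()
    for test in test_suite:
        covered |= covered_t_sequences(test, t)
    universe = set(permutations(actions, t))
    return len(covered & universe) / len(universe)

if __name__ == "__main__":
    # Hypothetical action alphabet and test suite, for illustration only.
    actions = ["login", "add_item", "checkout", "logout"]
    suite = [
        ["login", "add_item", "checkout", "logout"],
        ["login", "checkout", "add_item", "logout"],
    ]
    print(f"2-way sequence coverage: {t_sequence_coverage(suite, actions, t=2):.2f}")
```

In this toy example the two tests cover 7 of the 12 ordered action pairs, so the 2-way sequence coverage is about 0.58; adding tests that exercise the missing orderings would raise it toward 1.0.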
Related papers
- Adaptive Testing for LLM-Based Applications: A Diversity-based Approach [15.33985438101206]
We show that diversity-based testing techniques, such as Adaptive Random Testing (ART), can be effectively applied to the testing of prompt templates.
Our results, obtained using various implementations that explore several string-based distances, confirm that our approach enables the discovery of failures with reduced testing budgets.
arXiv Detail & Related papers (2025-01-23T08:53:12Z)
- Beyond ELBOs: A Large-Scale Evaluation of Variational Methods for Sampling [14.668634411361307]
We introduce a benchmark that evaluates sampling methods using a standardized task suite and a broad range of performance criteria.
We study existing metrics for quantifying mode collapse and introduce novel metrics for this purpose.
arXiv Detail & Related papers (2024-06-11T16:23:33Z)
- Backdoor-based Explainable AI Benchmark for High Fidelity Evaluation of Attribution Methods [49.62131719441252]
Attribution methods compute importance scores for input features to explain the output predictions of deep models.
In this work, we first identify a set of fidelity criteria that reliable benchmarks for attribution methods are expected to fulfill.
We then introduce a Backdoor-based eXplainable AI benchmark (BackX) that adheres to the desired fidelity criteria.
arXiv Detail & Related papers (2024-05-02T13:48:37Z)
- Few-Shot Scenario Testing for Autonomous Vehicles Based on Neighborhood Coverage and Similarity [8.97909097472183]
Testing and evaluating the safety performance of autonomous vehicles (AVs) is essential before large-scale deployment.
The number of testing scenarios permissible for a specific AV is severely limited by tight constraints on testing budgets and time.
We formulate this problem for the first time as the "few-shot testing" (FST) problem and propose a systematic framework to address this challenge.
arXiv Detail & Related papers (2024-02-02T04:47:14Z)
- Measuring Software Testability via Automatically Generated Test Cases [8.17364116624769]
We propose a new approach to measuring testability based on software metrics.
Our approach exploits automatic test generation and mutation analysis to quantify the evidence about the relative hardness of developing effective test cases.
arXiv Detail & Related papers (2023-07-30T09:48:51Z)
- From Static Benchmarks to Adaptive Testing: Psychometrics in AI Evaluation [60.14902811624433]
We discuss a paradigm shift from static evaluation methods to adaptive testing.
This involves estimating the characteristics and value of each test item in the benchmark and dynamically adjusting items in real-time.
We analyze the current approaches, advantages, and underlying reasons for adopting psychometrics in AI evaluation.
arXiv Detail & Related papers (2023-06-18T09:54:33Z)
- Risk Consistent Multi-Class Learning from Label Proportions [64.0125322353281]
This study addresses a multiclass learning from label proportions (MCLLP) setting in which training instances are provided in bags.
Most existing MCLLP methods impose bag-wise constraints on the prediction of instances or assign them pseudo-labels.
A risk-consistent method is proposed for instance classification using the empirical risk minimization framework.
arXiv Detail & Related papers (2022-03-24T03:49:04Z)
- Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control [67.52000805944924]
Learn then Test (LTT) is a framework for calibrating machine learning models.
Our main insight is to reframe the risk-control problem as multiple hypothesis testing.
We use our framework to provide new calibration methods for several core machine learning tasks with detailed worked examples in computer vision.
arXiv Detail & Related papers (2021-10-03T17:42:03Z)
- Group Testing with Non-identical Infection Probabilities [59.96266198512243]
We develop an adaptive group testing algorithm using the set formation method.
We show that our algorithm outperforms the state of the art, and performs close to the entropy lower bound.
arXiv Detail & Related papers (2021-08-27T17:53:25Z)
- Feedback Effects in Repeat-Use Criminal Risk Assessments [0.0]
We show that risk can propagate over sequential decisions in ways that are not captured by one-shot tests.
Risk assessment tools operate in a highly complex and path-dependent process, fraught with historical inequity.
arXiv Detail & Related papers (2020-11-28T06:40:05Z)
- SAMBA: Safe Model-Based & Active Reinforcement Learning [59.01424351231993]
SAMBA is a framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics.
We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low and high-dimensional state representations.
We provide intuition as to the effectiveness of the framework by a detailed analysis of our active metrics and safety constraints.
arXiv Detail & Related papers (2020-06-12T10:40:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.