Related papers: Scalable Similarity-Aware Test Suite Minimization with Reinforcement Learning

URL: http://arxiv.org/abs/2408.13517v2
Date: Wed, 15 Jan 2025 14:36:05 GMT
Title: Scalable Similarity-Aware Test Suite Minimization with Reinforcement Learning
Authors: Sijia Gu, Ali Mesbah
Abstract summary: TripRL is a novel technique to produce a diverse reduced test suite with high test effectiveness. We show that TripRL's runtime scales linearly with the magnitude of the Multi-Criteria Test Suite Minimization problem.
Score: 6.9290255098776425
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The Multi-Criteria Test Suite Minimization (MCTSM) problem aims to remove redundant test cases, guided by adequacy criteria such as code coverage or fault detection capability. However, current techniques either exhibit a high loss of fault detection ability or face scalability challenges due to the NP-hard nature of the problem, which limits their practical utility. We propose TripRL, a novel technique that integrates traditional criteria such as statement coverage and fault detection ability with test coverage similarity into an Integer Linear Program (ILP), to produce a diverse reduced test suite with high test effectiveness. TripRL leverages bipartite graph representation and its embedding for concise ILP formulation and combines ILP with effective reinforcement learning (RL) training. This combination renders large-scale test suite minimization more scalable and enhances test effectiveness. Our empirical evaluations demonstrate that TripRL's runtime scales linearly with the magnitude of the MCTSM problem. Notably, for large test suites from the Defects4j dataset where existing approaches fail to provide solutions within a reasonable time frame, our technique consistently delivers solutions in less than 47 minutes. The reduced test suites produced by TripRL also maintain the original statement coverage and fault detection ability while having a higher potential to detect unknown faults.
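The abstract frames minimization as a constrained selection problem: keep the original statement coverage and fault detection while preferring tests that are dissimilar to one another. The sketch below illustrates that problem on a toy instance only. It is not TripRL: it swaps the ILP solver and RL training for brute-force enumeration, which is feasible only for a handful of tests, and it uses Jaccard similarity over covered statements as an assumed stand-in for the paper's coverage-similarity measure. The names `jaccard`, `minimize_suite`, `coverage`, and `faults` are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets of covered statements (assumed metric)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def minimize_suite(coverage, faults):
    """Return a smallest subset of tests that preserves the full suite's
    statement coverage and detected faults. Among subsets of that size,
    pick the one with the lowest total pairwise coverage similarity.

    Exhaustive search: exponential in the number of tests, toy use only.
    """
    tests = sorted(coverage)
    all_stmts = set().union(*coverage.values())
    all_faults = set().union(*faults.values())

    for size in range(1, len(tests) + 1):
        best = None  # (similarity, subset) for the current size
        for subset in combinations(tests, size):
            stmts = set().union(*(coverage[t] for t in subset))
            found = set().union(*(faults[t] for t in subset))
            if stmts != all_stmts or found != all_faults:
                continue  # violates a coverage or fault-detection constraint
            sim = sum(
                jaccard(coverage[a], coverage[b])
                for a, b in combinations(subset, 2)
            )
            if best is None or sim < best[0]:
                best = (sim, subset)
        if best is not None:
            return list(best[1])
    return tests


# Hypothetical toy data: statements covered and faults detected per test.
coverage = {
    "t1": {1, 2, 3},
    "t2": {3, 4},
    "t3": {1, 2, 3, 4},
    "t4": {4, 5},
    "t5": {1, 2, 5},
}
faults = {"t1": {"f1"}, "t2": set(), "t3": {"f1"}, "t4": {"f2"}, "t5": set()}

reduced = minimize_suite(coverage, faults)
print(reduced)
```

On this toy data, both `{t1, t4}` and `{t3, t4}` satisfy the constraints with two tests; the similarity term breaks the tie in favor of `{t1, t4}`, whose coverage sets do not overlap. The exponential cost of this enumeration is the kind of scalability problem the abstract says motivates combining ILP with RL.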

Related papers

Improving Deep Learning Framework Testing with Model-Level Metamorphic Testing [19.880543046739252]
Deep learning (DL) frameworks are essential to DL-based software systems, and framework bugs may lead to substantial disasters. Researchers adopt DL models or single interfaces as test inputs and analyze their execution results to detect bugs. Floating-point errors, inherent randomness, and the complexity of test inputs make it challenging to analyze execution results effectively.
arXiv Detail & Related papers (2025-07-06T11:38:14Z)
Requirements Coverage-Guided Minimization for Natural Language Test Cases [7.947774587906927]
Test suites tend to grow in size and often contain redundant test cases. Test suite minimization aims to eliminate such redundancy while preserving key properties such as requirement coverage and fault detection capability. We propose RTM (Requirement coverage-guided Test suite Minimization), a novel TSM approach designed for requirement-based testing.
arXiv Detail & Related papers (2025-05-26T13:55:33Z)
Improving LLM-based Unit test generation via Template-based Repair [8.22619177301814]
Unit testing is crucial for detecting bugs in individual program units but consumes time and effort. Large language models (LLMs) have demonstrated remarkable reasoning and generation capabilities. In this paper, we propose TestART, a novel unit test generation method.
arXiv Detail & Related papers (2024-08-06T10:52:41Z)
Active Test-Time Adaptation: Theoretical Analyses and An Algorithm [51.84691955495693]
Test-time adaptation (TTA) addresses distribution shifts for streaming test data in unsupervised settings. We propose the novel problem setting of active test-time adaptation (ATTA) that integrates active learning within the fully TTA setting.
arXiv Detail & Related papers (2024-04-07T22:31:34Z)
Test Generation Strategies for Building Failure Models and Explaining Spurious Failures [4.995172162560306]
Test inputs fail not only when the system under test is faulty but also when the inputs are invalid or unrealistic. We propose to build failure models for inferring interpretable rules on test inputs that cause spurious failures. We show that our proposed surrogate-assisted approach generates failure models with an average accuracy of 83%.
arXiv Detail & Related papers (2023-12-09T18:36:15Z)
Towards Reliable AI: Adequacy Metrics for Ensuring the Quality of System-level Testing of Autonomous Vehicles [5.634825161148484]
We introduce a set of black-box test adequacy metrics called "Test suite Instance Space Adequacy" (TISA) metrics. The TISA metrics offer a way to assess both the diversity and coverage of the test suite and the range of bugs detected during testing. We evaluate the efficacy of the TISA metrics by examining their correlation with the number of bugs detected in system-level simulation testing of AVs.
arXiv Detail & Related papers (2023-11-14T10:16:05Z)
Precise Error Rates for Computationally Efficient Testing [75.63895690909241]
We revisit the question of simple-versus-simple hypothesis testing with an eye towards computational complexity. An existing test based on linear spectral statistics achieves the best possible tradeoff curve between type I and type II error rates.
arXiv Detail & Related papers (2023-11-01T04:41:16Z)
Deep anytime-valid hypothesis testing [29.273915933729057]
We propose a general framework for constructing powerful, sequential hypothesis tests for nonparametric testing problems. We develop a principled approach of leveraging the representation capability of machine learning models within the testing-by-betting framework. Empirical results on synthetic and real-world datasets demonstrate that tests instantiated using our general framework are competitive against specialized baselines.
arXiv Detail & Related papers (2023-10-30T09:46:19Z)
On Pitfalls of Test-Time Adaptation [82.8392232222119]
Test-Time Adaptation (TTA) has emerged as a promising approach for tackling the robustness challenge under distribution shifts. We present TTAB, a test-time adaptation benchmark that encompasses ten state-of-the-art algorithms, a diverse array of distribution shifts, and two evaluation protocols.
arXiv Detail & Related papers (2023-06-06T09:35:29Z)
LTM: Scalable and Black-box Similarity-based Test Suite Minimization based on Language Models [0.6562256987706128]
Test suites tend to grow when software evolves, making it often infeasible to execute all test cases with the allocated testing budgets. Test suite minimization (TSM) is employed to improve the efficiency of software testing by removing redundant test cases. We propose LTM (Language model-based Test suite Minimization), a novel, scalable, and black-box similarity-based TSM approach.
arXiv Detail & Related papers (2023-04-03T22:16:52Z)
MaxMatch: Semi-Supervised Learning with Worst-Case Consistency [149.03760479533855]
We propose a worst-case consistency regularization technique for semi-supervised learning (SSL). We present a generalization bound for SSL consisting of the empirical loss terms observed on labeled and unlabeled training data separately. Motivated by this bound, we derive an SSL objective that minimizes the largest inconsistency between an original unlabeled sample and its multiple augmented variants.
arXiv Detail & Related papers (2022-09-26T12:04:49Z)
Hybrid Intelligent Testing in Simulation-Based Verification [0.0]
Several millions of tests may be required to achieve coverage goals. Coverage-Directed Test Selection learns from coverage feedback to bias testing towards the most effective tests. Novelty-Driven Verification learns to identify and simulate stimuli that differ from previous stimuli.
arXiv Detail & Related papers (2022-05-19T13:22:08Z)
Adversarial Attacks and Defense for Non-Parametric Two-Sample Tests [73.32304304788838]
This paper systematically uncovers the failure mode of non-parametric TSTs through adversarial attacks. To enable TST-agnostic attacks, we propose an ensemble attack framework that jointly minimizes the different types of test criteria. To robustify TSTs, we propose a max-min optimization that iteratively generates adversarial pairs to train the deep kernels.
arXiv Detail & Related papers (2022-02-07T11:18:04Z)
Cross-validation Confidence Intervals for Test Error [83.67415139421448]
This work develops central limit theorems for cross-validation and consistent estimators of its variance under weak stability conditions on the learning algorithm. Results are the first of their kind for the popular choice of leave-one-out cross-validation.
arXiv Detail & Related papers (2020-07-24T17:40:06Z)
Bloom Origami Assays: Practical Group Testing [90.2899558237778]
Group testing is a well-studied problem with several appealing solutions. Recent biological studies impose practical constraints for COVID-19 that are incompatible with traditional methods. We develop a new method combining Bloom filters with belief propagation to scale to larger values of n (more than 100) with good empirical results.
arXiv Detail & Related papers (2020-07-21T19:31:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.