Contextual Predictive Mutation Testing
- URL: http://arxiv.org/abs/2309.02389v1
- Date: Tue, 5 Sep 2023 17:00:15 GMT
- Title: Contextual Predictive Mutation Testing
- Authors: Kush Jain, Uri Alon, Alex Groce, Claire Le Goues
- Abstract summary: We introduce MutationBERT, an approach for predictive mutation testing that simultaneously encodes the source method mutation and test method.
Thanks to its higher precision, MutationBERT saves 33% of the time spent by a prior approach on checking/verifying live mutants.
We validate our input representation and our aggregation approaches for lifting predictions from the test matrix level to the test suite level, finding similar improvements in performance.
- Score: 17.832774161583036
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mutation testing is a powerful technique for assessing and improving test
suite quality that artificially introduces bugs and checks whether the test
suites catch them. However, it is also computationally expensive and thus does
not scale to large systems and projects. One promising recent approach to
tackling this scalability problem uses machine learning to predict whether the
tests will detect the synthetic bugs, without actually running those tests.
However, existing predictive mutation testing approaches still misclassify 33%
of detection outcomes on a randomly sampled set of mutant-test suite pairs. We
introduce MutationBERT, an approach for predictive mutation testing that
simultaneously encodes the source method mutation and test method, capturing
key context in the input representation. Thanks to its higher precision,
MutationBERT saves 33% of the time spent by a prior approach on
checking/verifying live mutants. MutationBERT also outperforms the
state-of-the-art in both same-project and cross-project settings, with
meaningful improvements in precision, recall, and F1 score. We validate our
input representation and our aggregation approaches for lifting predictions from
the test matrix level to the test suite level, finding similar improvements in
performance. MutationBERT not only enhances the state-of-the-art in predictive
mutation testing, but also presents practical benefits for real-world
applications, both in saving developer time and in finding hard-to-detect mutants.
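To make the core idea concrete, the toy sketch below introduces an artificial bug (a mutant) into a small Python function and checks whether a test suite detects, or "kills", it; the per-test outcomes it computes form one row of the mutant-by-test matrix that predictive mutation testing tries to predict without running the tests. The function, the mutant, and the tests are hypothetical illustrations, not artifacts from the paper.

```python
# Minimal sketch of mutation testing: inject an artificial bug ("mutant")
# and check whether the existing tests detect ("kill") it.
# All names here are hypothetical illustrations, not the paper's artifacts.

def clamp(x, lo, hi):
    """Original method under test."""
    return max(lo, min(x, hi))

def clamp_mutant(x, lo, hi):
    """Mutant: the boundary logic is perturbed (min -> max)."""
    return max(lo, max(x, hi))

def test_within_range(impl):
    assert impl(5, 0, 10) == 5

def test_above_range(impl):
    assert impl(42, 0, 10) == 10

TEST_SUITE = [test_within_range, test_above_range]

def run_suite(impl):
    """Return per-test outcomes: True if the test fails on this implementation."""
    outcomes = []
    for test in TEST_SUITE:
        try:
            test(impl)
            outcomes.append(False)   # test passed -> this test does not detect the mutant
        except AssertionError:
            outcomes.append(True)    # test failed -> this test detects ("kills") the mutant
    return outcomes

if __name__ == "__main__":
    assert not any(run_suite(clamp))          # the original code passes every test
    per_test_kills = run_suite(clamp_mutant)  # one row of the mutant-by-test matrix
    print("killed by suite:", any(per_test_kills))
```

Predictive mutation testing replaces the expensive `run_suite` execution step with a learned model that predicts each entry of this matrix directly from the code.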
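The second sketch illustrates, under explicit assumptions, the two ingredients the abstract highlights: jointly encoding the mutated source method and the test method as a single sequence pair for a BERT-style classifier, and lifting the resulting per-test predictions from the test-matrix level to the test-suite level. The `microsoft/codebert-base` checkpoint, the 0.5 threshold, and the max-based aggregation rule are illustrative assumptions, not the paper's actual model, preprocessing, or aggregation.

```python
# Hypothetical sketch: encode a (mutated method, test method) pair with a
# BERT-style code model and lift per-test predictions to the suite level.
# Checkpoint, threshold, and aggregation rule are assumptions for illustration.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/codebert-base", num_labels=2)  # label 1 = "test kills mutant"
model.eval()

mutated_method = "def clamp(x, lo, hi):\n    return max(lo, max(x, hi))"
test_methods = [
    "def test_within_range():\n    assert clamp(5, 0, 10) == 5",
    "def test_above_range():\n    assert clamp(42, 0, 10) == 10",
]

def predict_kill_probability(mutant_src, test_src):
    """Encode both methods together so the model sees mutant and test context."""
    inputs = tokenizer(mutant_src, test_src, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

# One row of the predicted mutant-by-test matrix.
per_test_probs = [predict_kill_probability(mutated_method, t) for t in test_methods]

# Suite-level lift: the mutant is predicted "detected" if any test is predicted
# to kill it, e.g. by thresholding the maximum per-test probability.
suite_level_detected = max(per_test_probs) >= 0.5
print(per_test_probs, suite_level_detected)
```

In practice the classification head would be fine-tuned on labeled (mutant, test, detected?) examples before use; the freshly initialized head here only keeps the sketch self-contained.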
Related papers
- LLMorpheus: Mutation Testing using Large Language Models [7.312170216336085]
This paper presents a technique in which a Large Language Model (LLM) is prompted to suggest mutations by asking what the placeholders inserted into the source code could be replaced with (see the sketch after this list).
We find that LLMorpheus can produce mutants that resemble existing bugs and that cannot be produced by StrykerJS, a state-of-the-art mutation testing tool.
arXiv Detail & Related papers (2024-04-15T17:25:14Z)
- An Empirical Evaluation of Manually Created Equivalent Mutants [54.02049952279685]
Less than 10% of manually created mutants are equivalent.
Surprisingly, our findings indicate that a significant portion of developers struggle to accurately identify equivalent mutants.
arXiv Detail & Related papers (2024-04-14T13:04:10Z)
- Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting [55.17761802332469]
Test-time adaptation (TTA) seeks to tackle potential distribution shifts between training and test data by adapting a given model w.r.t. any test sample.
Prior methods perform backpropagation for each test sample, resulting in optimization costs that are unbearable for many applications.
We propose an Efficient Anti-Forgetting Test-Time Adaptation (EATA) method which develops an active sample selection criterion to identify reliable and non-redundant samples.
arXiv Detail & Related papers (2024-03-18T05:49:45Z)
- Precise Error Rates for Computationally Efficient Testing [75.63895690909241]
We revisit the question of simple-versus-simple hypothesis testing with an eye towards computational complexity.
An existing test based on linear spectral statistics achieves the best possible tradeoff curve between type I and type II error rates.
arXiv Detail & Related papers (2023-11-01T04:41:16Z)
- Effective Test Generation Using Pre-trained Large Language Models and Mutation Testing [13.743062498008555]
We introduce MuTAP for improving the effectiveness of test cases generated by Large Language Models (LLMs) in terms of revealing bugs.
MuTAP can generate effective test cases even in the absence of natural-language descriptions of the Programs Under Test (PUTs).
Our results show that our proposed method is able to detect up to 28% more faulty human-written code snippets.
arXiv Detail & Related papers (2023-08-31T08:48:31Z)
- Is this model reliable for everyone? Testing for strong calibration [4.893345190925178]
In a well-calibrated risk prediction model, the average predicted probability is close to the true event rate for any given subgroup.
The task of auditing a model for strong calibration is well-known to be difficult due to the sheer number of potential subgroups.
Recent developments in goodness-of-fit testing offer potential solutions but are not designed for settings with weak signal.
arXiv Detail & Related papers (2023-07-28T00:59:14Z)
- Systematic Assessment of Fuzzers using Mutation Analysis [20.91546707828316]
In software testing, the gold standard for evaluating test quality is mutation analysis.
Mutation analysis subsumes various coverage measures and provides a large and diverse set of faults.
We apply modern mutation analysis techniques that pool multiple mutations and allow us -- for the first time -- to evaluate and compare fuzzers with mutation analysis.
arXiv Detail & Related papers (2022-12-06T15:47:47Z)
- T-Cal: An optimal test for the calibration of predictive models [49.11538724574202]
We consider detecting mis-calibration of predictive models using a finite validation dataset as a hypothesis testing problem.
Detecting mis-calibration is only possible when the conditional probabilities of the classes are sufficiently smooth functions of the predictions.
We propose T-Cal, a minimax test for calibration based on a de-biased plug-in estimator of the $\ell_2$-Expected Calibration Error (ECE).
arXiv Detail & Related papers (2022-03-03T16:58:54Z)
- MEMO: Test Time Robustness via Adaptation and Augmentation [131.28104376280197]
We study the problem of test time robustification, i.e., using the test input to improve model robustness.
Recent prior works have proposed methods for test-time adaptation; however, they each introduce additional assumptions.
We propose a simple approach that can be used in any test setting where the model is probabilistic and adaptable.
arXiv Detail & Related papers (2021-10-18T17:55:11Z)
- Better Aggregation in Test-Time Augmentation [4.259219671110274]
Test-time augmentation is the aggregation of predictions across transformed versions of a test input.
A key finding is that even when test-time augmentation produces a net improvement in accuracy, it can change many correct predictions into incorrect predictions.
We present a learning-based method for aggregating test-time augmentations.
arXiv Detail & Related papers (2020-11-23T00:46:00Z)
- Noisy Adaptive Group Testing using Bayesian Sequential Experimental Design [63.48989885374238]
When the infection prevalence of a disease is low, Dorfman showed 80 years ago that testing groups of people can prove more efficient than testing people individually.
Our goal in this paper is to propose new group testing algorithms that can operate in a noisy setting.
arXiv Detail & Related papers (2020-04-26T23:41:33Z)
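Returning to the LLMorpheus entry above, the sketch below shows one plausible shape of placeholder-based mutant generation: a placeholder is inserted into the source code, an LLM is asked what it could be replaced with, and each suggestion is spliced back in to form a mutant. The prompt wording, the placeholder token, and the canned completion are hypothetical and do not reflect LLMorpheus's actual prompts or implementation.

```python
# Hypothetical sketch of placeholder-based mutant generation in the spirit of
# LLMorpheus: prompt text, placeholder token, and splicing are illustrative
# assumptions, not the tool's actual prompts or API.
PLACEHOLDER = "<PLACEHOLDER>"

original = "function isAdult(age) { return age >= 18; }"
templated = "function isAdult(age) { return age <PLACEHOLDER> 18; }"

prompt = (
    "The following JavaScript code contains the token <PLACEHOLDER>.\n"
    "Suggest three plausible but buggy replacements for the placeholder, "
    "one per line, with no explanation.\n\n" + templated
)

# In a real setting, `prompt` would be sent to an LLM; a plausible completion
# is hard-coded here so the sketch stays self-contained.
llm_completion = "<\n<=\n>"

mutants = [templated.replace(PLACEHOLDER, s.strip())
           for s in llm_completion.splitlines()]
for m in mutants:
    print(m)  # each mutant would then be compiled and run against the test suite
```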
This list is automatically generated from the titles and abstracts of the papers on this site.