An ensemble meta-estimator to predict source code testability
- URL: http://arxiv.org/abs/2208.09614v2
- Date: Wed, 24 Aug 2022 10:26:34 GMT
- Title: An ensemble meta-estimator to predict source code testability
- Authors: Morteza Zakeri-Nasrabadi and Saeed Parsa
- Abstract summary: The size of a test suite determines the test effort and cost, while the coverage measure indicates the test effectiveness.
This paper offers a new equation to estimate testability regarding the size and coverage of a given test suite.
- Score: 1.4213973379473652
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unlike most other software quality attributes, testability cannot be
evaluated solely based on the characteristics of the source code. The
effectiveness of the test suite and the budget assigned to the test highly
impact the testability of the code under test. The size of a test suite
determines the test effort and cost, while the coverage measure indicates the
test effectiveness. Therefore, testability can be measured based on the
coverage and number of test cases provided by a test suite, considering the
test budget. This paper offers a new equation to estimate testability regarding
the size and coverage of a given test suite. The equation has been used to
label 23,000 classes belonging to 110 Java projects with their testability
measure. The labeled classes were vectorized using 262 metrics, and the
resulting vectors were fed into a family of supervised machine learning
(regression) algorithms to predict testability from the source code metrics.
The regression models predicted testability with an R2 of 0.68 and a mean
squared error of 0.03, which is suitable in practice. Fifteen software metrics
that most strongly affect testability prediction were identified by applying a
feature importance analysis technique to the learned model. Owing to the use of
new criteria, metrics, and data, the proposed models improved mean absolute
error by 38% compared with the most relevant prior study, which predicts branch
coverage as a test criterion. As an application of testability prediction, it
is demonstrated that automated refactoring of 42 smelly Java classes, targeted
at improving the 15 influential software metrics, could elevate their
testability by an average of 86.87%.
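To make the pipeline described in the abstract concrete, the sketch below shows a workflow of that shape: per-class metric vectors labeled with a coverage- and size-based testability score, a regression model fit to them, and a feature importance analysis to surface the most influential metrics. The labeling function, synthetic data, and choice of learner are illustrative assumptions for this sketch, not the paper's actual equation, dataset, or models.

```python
# Hypothetical sketch of the abstract's workflow: label classes with a
# coverage/size-based testability score, vectorize them with source code
# metrics, train a regression model, and rank the most influential metrics.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder data: rows are Java classes, columns are 262 source code metrics.
X = rng.random((2000, 262))

def testability_label(coverage, n_tests, budget=50):
    """Stand-in for the paper's testability equation (not reproduced here):
    high coverage achieved by a suite that stays small relative to the test
    budget yields a high score."""
    return coverage * budget / (budget + n_tests)

# Tie coverage and suite size loosely to a few metrics so the regressor has
# signal to learn in this synthetic setting.
coverage = 1.0 / (1.0 + np.exp(-(X[:, 0] - X[:, 1] + 0.5 * X[:, 2])))
n_tests = (1 + 40 * X[:, 3]).astype(int)
y = testability_label(coverage, n_tests)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

pred = model.predict(X_test)
print(f"R2:  {r2_score(y_test, pred):.2f}")
print(f"MSE: {mean_squared_error(y_test, pred):.3f}")

# Feature importance analysis: indices of the 15 metrics that most affect the
# prediction (the paper identifies fifteen influential metrics in this spirit).
imp = permutation_importance(model, X_test, y_test, n_repeats=5, random_state=0)
print("Top 15 metric indices:", np.argsort(imp.importances_mean)[::-1][:15])
```

On real data, the synthetic metrics and labels would be replaced by values measured from actual projects and generated test suites, as the paper does for 23,000 classes from 110 Java projects.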
Related papers
- TestGenEval: A Real World Unit Test Generation and Test Completion Benchmark [24.14654309612826]
TestGenEval comprises 68,647 tests from 1,210 code and test file pairs across 11 well-maintained Python repositories.
It covers initial test authoring, test suite completion, and code coverage improvement.
We evaluate several popular models, with sizes ranging from 7B to 405B parameters.
arXiv Detail & Related papers (2024-10-01T14:47:05Z)
- Precise Error Rates for Computationally Efficient Testing [75.63895690909241]
We revisit the question of simple-versus-simple hypothesis testing with an eye towards computational complexity.
An existing test based on linear spectral statistics achieves the best possible tradeoff curve between type I and type II error rates.
arXiv Detail & Related papers (2023-11-01T04:41:16Z)
- Effective Test Generation Using Pre-trained Large Language Models and Mutation Testing [13.743062498008555]
We introduce MuTAP for improving the effectiveness of test cases generated by Large Language Models (LLMs) in terms of revealing bugs.
MuTAP is capable of generating effective test cases in the absence of natural language descriptions of the program under test (PUT).
Our results show that our proposed method is able to detect up to 28% more faulty human-written code snippets.
arXiv Detail & Related papers (2023-08-31T08:48:31Z)
- From Static Benchmarks to Adaptive Testing: Psychometrics in AI Evaluation [60.14902811624433]
We discuss a paradigm shift from static evaluation methods to adaptive testing.
This involves estimating the characteristics and value of each test item in the benchmark and dynamically adjusting items in real-time.
We analyze the current approaches, advantages, and underlying reasons for adopting psychometrics in AI evaluation.
arXiv Detail & Related papers (2023-06-18T09:54:33Z)
- Learning to predict test effectiveness [1.4213973379473652]
This article offers a machine learning model to predict the extent to which the test could cover a class in terms of a new metric called Coverageability.
We offer a mathematical model to evaluate test effectiveness in terms of size and coverage of the test suite generated automatically for each class.
arXiv Detail & Related papers (2022-08-20T07:26:59Z)
- Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z)
- On the use of test smells for prediction of flaky tests [0.0]
Flaky tests hamper the evaluation of test results and can increase costs.
Existing approaches based on the use of the test case vocabulary may be context-sensitive and prone to overfitting.
We investigate the use of test smells as predictors of flaky tests.
arXiv Detail & Related papers (2021-08-26T13:21:55Z)
- TestRank: Bringing Order into Unlabeled Test Instances for Deep Learning Tasks [14.547623982073475]
Deep learning systems are notoriously difficult to test and debug.
It is essential to conduct test selection and label only those selected "high quality" bug-revealing test inputs for test cost reduction.
We propose a novel test prioritization technique that brings order into the unlabeled test instances according to their bug-revealing capabilities, namely TestRank.
arXiv Detail & Related papers (2021-05-21T03:41:10Z)
- Double Perturbation: On the Robustness of Robustness and Counterfactual Bias Evaluation [109.06060143938052]
We propose a "double perturbation" framework to uncover model weaknesses beyond the test dataset.
We apply this framework to study two perturbation-based approaches that are used to analyze models' robustness and counterfactual bias in English.
arXiv Detail & Related papers (2021-04-12T06:57:36Z)
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model under test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
- Cross-validation Confidence Intervals for Test Error [83.67415139421448]
This work develops central limit theorems for cross-validation and consistent estimators of its variance under weak stability conditions on the learning algorithm.
The results are the first of their kind for the popular choice of leave-one-out cross-validation.
arXiv Detail & Related papers (2020-07-24T17:40:06Z)
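To illustrate the quantity the last entry is concerned with, the sketch below computes a naive normal-approximation confidence interval from per-fold cross-validation errors. It deliberately ignores the correlation between folds, which is exactly the subtlety the cited paper's theory handles, so it should be read only as a rough baseline with assumed data and model.

```python
# Naive normal-approximation confidence interval for cross-validated test error.
# Simplified illustration only; NOT the variance estimator from the cited paper.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

X, y = make_regression(n_samples=400, n_features=10, noise=5.0, random_state=0)

fold_errors = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    model = Ridge().fit(X[train_idx], y[train_idx])
    fold_errors.append(mean_squared_error(y[test_idx], model.predict(X[test_idx])))

fold_errors = np.array(fold_errors)
mean_err = fold_errors.mean()
# Standard error of the mean across folds (assumes fold errors are independent,
# which they are not; that dependence is what the paper treats rigorously).
sem = fold_errors.std(ddof=1) / np.sqrt(len(fold_errors))
print(f"CV test error: {mean_err:.2f} +/- {1.96 * sem:.2f} (approx. 95% CI)")
```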
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences arising from its use.