Learning to predict test effectiveness
- URL: http://arxiv.org/abs/2208.09623v1
- Date: Sat, 20 Aug 2022 07:26:59 GMT
- Title: Learning to predict test effectiveness
- Authors: Morteza Zakeri-Nasrabadi and Saeed Parsa
- Abstract summary: This article offers a machine learning model to predict the extent to which the test could cover a class in terms of a new metric called Coverageability.
We offer a mathematical model to evaluate test effectiveness in terms of size and coverage of the test suite generated automatically for each class.
- Score: 1.4213973379473652
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The high cost of testing can be dramatically reduced, provided that
coverability, as an inherent feature of the code under test, is predictable. This
article offers a machine learning model to predict the extent to which a test
suite could cover a class, in terms of a new metric called Coverageability. The
prediction model consists of an ensemble of four regression models. The
learning samples consist of feature vectors, where features are source code
metrics computed for a class. The samples are labeled by the Coverageability
values computed for their corresponding classes. We offer a mathematical model
to evaluate test effectiveness in terms of size and coverage of the test suite
generated automatically for each class. We extend the size of the feature space
by introducing a new approach to defining sub-metrics in terms of existing
source code metrics. Using feature importance analysis on the learned
prediction models, we sort source code metrics in the order of their impact on
the test effectiveness. As a result, we found class strict cyclomatic
complexity to be the most influential source code metric. Our
experiments with the prediction models on a large corpus of Java projects
containing about 23,000 classes demonstrate the Mean Absolute Error (MAE) of
0.032, Mean Squared Error (MSE) of 0.004, and an R2-score of 0.855. Compared
with state-of-the-art coverage prediction models, our models improve MAE,
MSE, and R2-score by 5.78%, 2.84%, and 20.71%, respectively.
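As a rough illustration of the pipeline described in the abstract, the sketch below trains an ensemble of four regressors on class-level source code metric vectors, reports MAE, MSE, and R2, and then ranks metrics by permutation importance. The synthetic data, the feature layout, and the specific base regressors are assumptions made here for illustration; they are not the authors' exact models, metrics, or dataset, and in the paper the labels come from the Coverageability of automatically generated test suites rather than random values.

    # Minimal sketch (assumed setup, not the authors' implementation):
    # predict a Coverageability-style label from per-class source code metrics
    # with an ensemble of four regression models, then rank metric importance.
    import numpy as np
    from sklearn.ensemble import (VotingRegressor, RandomForestRegressor,
                                  GradientBoostingRegressor)
    from sklearn.linear_model import Ridge
    from sklearn.neural_network import MLPRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.inspection import permutation_importance
    from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

    rng = np.random.default_rng(0)

    # Hypothetical feature vectors: one row per class, one column per source
    # code metric (e.g. strict cyclomatic complexity, size, coupling, ...).
    n_samples, n_metrics = 500, 20
    X = rng.normal(size=(n_samples, n_metrics))
    # Placeholder labels in [0, 1]; in the paper these are Coverageability
    # values computed from test suite size and coverage for each class.
    y = rng.uniform(size=n_samples)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Ensemble of four regression models; the concrete members are placeholders.
    ensemble = VotingRegressor([
        ("rf", RandomForestRegressor(random_state=0)),
        ("gb", GradientBoostingRegressor(random_state=0)),
        ("ridge", Ridge()),
        ("mlp", MLPRegressor(max_iter=500, random_state=0)),
    ])
    ensemble.fit(X_train, y_train)
    pred = ensemble.predict(X_test)

    print("MAE:", mean_absolute_error(y_test, pred))
    print("MSE:", mean_squared_error(y_test, pred))
    print("R2 :", r2_score(y_test, pred))

    # Sort source code metrics by their impact on predicted test effectiveness.
    imp = permutation_importance(ensemble, X_test, y_test,
                                 n_repeats=5, random_state=0)
    ranking = np.argsort(imp.importances_mean)[::-1]
    print("Most influential metric indices:", ranking[:5])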
Related papers
- A Probabilistic Perspective on Unlearning and Alignment for Large Language Models [48.96686419141881]
We introduce the first formal probabilistic evaluation framework for Large Language Models (LLMs).
We derive novel metrics with high-probability guarantees concerning the output distribution of a model.
Our metrics are application-independent and allow practitioners to make more reliable estimates about model capabilities before deployment.
arXiv Detail & Related papers (2024-10-04T15:44:23Z) - Fantastic DNN Classifiers and How to Identify them without Data [0.685316573653194]
We show that the quality of a trained DNN classifier can be assessed without any example data.
We have developed two metrics: one using the features of the prototypes and the other using adversarial examples corresponding to each prototype.
Empirical evaluations show that accuracy obtained from test examples is directly proportional to quality measures obtained from the proposed metrics.
arXiv Detail & Related papers (2023-05-24T20:54:48Z) - An ensemble meta-estimator to predict source code testability [1.4213973379473652]
The size of a test suite determines the test effort and cost, while the coverage measure indicates the test effectiveness.
This paper offers a new equation to estimate testability regarding the size and coverage of a given test suite.
arXiv Detail & Related papers (2022-08-20T06:18:16Z) - Test Set Sizing Via Random Matrix Theory [91.3755431537592]
This paper uses techniques from Random Matrix Theory to find the ideal training-testing data split for a simple linear regression.
It defines "ideal" as satisfying the integrity metric, i.e. the empirical model error is the actual measurement noise.
This paper is the first to solve for the training and test size for any model in a way that is truly optimal.
arXiv Detail & Related papers (2021-12-11T13:18:33Z) - ALT-MAS: A Data-Efficient Framework for Active Testing of Machine
Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z) - What is the Vocabulary of Flaky Tests? An Extended Replication [0.0]
We conduct an empirical study to assess the use of code identifiers to predict test flakiness.
We validated the performance of trained models using datasets with other flaky tests and from different projects.
arXiv Detail & Related papers (2021-03-23T16:42:22Z) - Active Testing: Sample-Efficient Model Evaluation [39.200332879659456]
We introduce active testing: a new framework for sample-efficient model evaluation.
Active testing addresses the cost of labeling test data by carefully selecting the test points to label.
We show how to remove that bias while reducing the variance of the estimator.
arXiv Detail & Related papers (2021-03-09T10:20:49Z) - Understanding Classifier Mistakes with Generative Models [88.20470690631372]
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle.
In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize.
Our approach is agnostic to class labels from the training set which makes it applicable to models trained in a semi-supervised way.
arXiv Detail & Related papers (2020-10-05T22:13:21Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.