On the use of test smells for prediction of flaky tests
- URL: http://arxiv.org/abs/2108.11781v1
- Date: Thu, 26 Aug 2021 13:21:55 GMT
- Title: On the use of test smells for prediction of flaky tests
- Authors: B. H. P. Camara, M. A. G. Silva, A. T. Endo, S. R. Vergilio
- Abstract summary: Flaky tests hamper the evaluation of test results and can increase costs.
Existing approaches based on the use of the test case vocabulary may be context-sensitive and prone to overfitting.
We investigate the use of test smells as predictors of flaky tests.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Regression testing is an important phase to deliver software with quality.
However, flaky tests hamper the evaluation of test results and can increase
costs. This is because a flaky test may pass or fail non-deterministically,
and properly identifying the flakiness of a test requires rerunning the test
suite multiple times. To cope with this challenge, approaches have been proposed
based on prediction models and machine learning. Existing approaches based on
the use of the test case vocabulary may be context-sensitive and prone to
overfitting, presenting low performance when executed in a cross-project
scenario. To overcome these limitations, we investigate the use of test smells
as predictors of flaky tests. We conducted an empirical study to understand
whether a classifier based on test smells performs well at predicting
flakiness in the cross-project context, and analyzed the information gain of
each test smell. We also compared the test smell-based approach with the vocabulary-based
one. As a result, we obtained a classifier with reasonable performance
(Random Forest, 0.83) for predicting flakiness in the testing phase. This
classifier performed better than the vocabulary-based model for cross-project
prediction. The Assertion Roulette and Sleepy Test test smell
types are the ones associated with the best information gain values.
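As a rough sketch of this pipeline, the snippet below trains a Random Forest on binary test smell features and ranks each smell by mutual information (i.e., information gain). The smell list, the synthetic data, and the F1 metric are illustrative assumptions, not the authors' exact setup; in the paper's cross-project scenario, the split would be by project rather than random.

```python
# Hypothetical sketch: flakiness prediction from test smell features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

smells = ["AssertionRoulette", "SleepyTest", "ConditionalTestLogic",
          "EagerTest", "MysteryGuest"]

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, len(smells)))  # 1 = smell present in a test
y = rng.integers(0, 2, size=500)                 # 1 = test is flaky (synthetic)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print("F1:", f1_score(y_test, clf.predict(X_test)))

# Information gain (mutual information) of each smell w.r.t. flakiness,
# the analysis that singles out Assertion Roulette and Sleepy Test.
gains = mutual_info_classif(X, y, discrete_features=True)
for name, gain in sorted(zip(smells, gains), key=lambda t: -t[1]):
    print(f"{name}: {gain:.3f}")
```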
Related papers
- Do Test and Environmental Complexity Increase Flakiness? An Empirical Study of SAP HANA [47.29324864511411]
Flaky tests fail seemingly at random without changes to the code.
We study characteristics of tests and the test environment that potentially impact test flakiness.
arXiv Detail & Related papers (2024-09-16T07:52:09Z)
- Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting [55.17761802332469]
Test-time adaptation (TTA) seeks to tackle potential distribution shifts between training and test data by adapting a given model w.r.t. any test sample.
Prior methods perform backpropagation for each test sample, resulting in unbearable optimization costs for many applications.
We propose an Efficient Anti-Forgetting Test-Time Adaptation (EATA) method which develops an active sample selection criterion to identify reliable and non-redundant samples.
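A minimal sketch of such a selection criterion, assuming entropy-based reliability filtering and a probability-similarity redundancy check with illustrative thresholds (a simplification of EATA's actual criterion):

```python
# Simplified sample selection in the spirit of EATA: adapt only on test
# samples with low prediction entropy (reliable) that are not too similar
# to samples already selected (non-redundant). Thresholds are illustrative.
import numpy as np

def entropy(p: np.ndarray) -> float:
    return float(-(p * np.log(p + 1e-12)).sum())

def select_samples(probs: np.ndarray, ent_thresh: float = 0.4,
                   sim_thresh: float = 0.95) -> list[int]:
    selected, kept = [], []
    for i, p in enumerate(probs):
        if entropy(p) >= ent_thresh:
            continue  # unreliable: prediction too uncertain
        if any(np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q)) > sim_thresh
               for q in kept):
            continue  # redundant: too similar to an already selected sample
        selected.append(i)
        kept.append(p)
    return selected  # indices worth spending backpropagation on
```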
arXiv Detail & Related papers (2024-03-18T05:49:45Z)
- FlaKat: A Machine Learning-Based Categorization Framework for Flaky Tests [3.0846824529023382]
Flaky tests can pass or fail non-deterministically, without alterations to a software system.
State-of-the-art research incorporates machine learning solutions into flaky test detection and achieves reasonably good accuracy.
arXiv Detail & Related papers (2024-03-01T22:00:44Z) - Precise Error Rates for Computationally Efficient Testing [75.63895690909241]
We revisit the question of simple-versus-simple hypothesis testing with an eye towards computational complexity.
An existing test based on linear spectral statistics achieves the best possible tradeoff curve between type I and type II error rates.
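For background, the error rates in question and the classical, computationally unbounded benchmark from the Neyman-Pearson lemma (textbook material, not the paper's specific spectral statistic):

```latex
% Type I and type II error rates of a test T for simple hypotheses H_0, H_1:
\[
\alpha(T) = \Pr_{H_0}\!\left[T \text{ rejects}\right], \qquad
\beta(T)  = \Pr_{H_1}\!\left[T \text{ accepts}\right].
\]
% Without computational constraints, the likelihood ratio test traces the
% optimal (\alpha, \beta) tradeoff curve as \tau varies; the paper asks
% which curve remains achievable when the test must be efficient.
\[
\text{reject } H_0 \iff \frac{\mathbb{P}_{H_1}(x)}{\mathbb{P}_{H_0}(x)} > \tau .
\]
```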
arXiv Detail & Related papers (2023-11-01T04:41:16Z) - An ensemble meta-estimator to predict source code testability [1.4213973379473652]
The size of a test suite determines the test effort and cost, while the coverage measure indicates the test effectiveness.
This paper offers a new equation to estimate testability regarding the size and coverage of a given test suite.
arXiv Detail & Related papers (2022-08-20T06:18:16Z) - Boost Test-Time Performance with Closed-Loop Inference [85.43516360332646]
We propose to predict hard-classified test samples in a looped manner to boost model performance.
We first devise a filtering criterion to identify those hard-classified test samples that need additional inference loops.
For each hard sample, we construct an additional auxiliary learning task based on its original top-$K$ predictions to calibrate the model.
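A minimal sketch of the filtering step, assuming a max-softmax confidence criterion with an illustrative threshold (a stand-in for the paper's actual criterion):

```python
# Flag "hard" test samples for an extra inference loop and collect the
# top-K predictions that would seed each sample's auxiliary task.
import numpy as np

def find_hard_samples(probs: np.ndarray, conf_thresh: float = 0.7,
                      k: int = 5) -> dict[int, np.ndarray]:
    """Map each low-confidence sample index to its top-K class indices."""
    hard = {}
    for i, p in enumerate(probs):
        if p.max() < conf_thresh:              # low confidence => hard sample
            hard[i] = np.argsort(p)[::-1][:k]  # top-K candidate labels
    return hard
```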
arXiv Detail & Related papers (2022-03-21T10:20:21Z) - TestRank: Bringing Order into Unlabeled Test Instances for Deep Learning
Tasks [14.547623982073475]
Deep learning systems are notoriously difficult to test and debug.
To reduce test costs, it is essential to select and label only "high quality" bug-revealing test inputs.
We propose a novel test prioritization technique that brings order into the unlabeled test instances according to their bug-revealing capabilities, namely TestRank.
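A minimal sketch of the prioritization idea, approximating bug-revealing capability with the model's top-2 softmax margin (a heuristic stand-in for TestRank's learned estimate):

```python
# Rank unlabeled test inputs so the most suspicious are labeled first.
import numpy as np

def prioritize(probs: np.ndarray) -> np.ndarray:
    """probs: (n_samples, n_classes) softmax outputs of the model under test.

    A small gap between the top-2 class probabilities suggests the model
    is unsure, so the input is more likely to reveal a misbehavior.
    """
    top2 = np.sort(probs, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]
    return np.argsort(margin)  # ascending margin = descending suspicion
```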
arXiv Detail & Related papers (2021-05-21T03:41:10Z)
- Double Perturbation: On the Robustness of Robustness and Counterfactual Bias Evaluation [109.06060143938052]
We propose a "double perturbation" framework to uncover model weaknesses beyond the test dataset.
We apply this framework to study two perturbation-based approaches that are used to analyze models' robustness and counterfactual bias in English.
arXiv Detail & Related papers (2021-04-12T06:57:36Z)
- What is the Vocabulary of Flaky Tests? An Extended Replication [0.0]
We conduct an empirical study to assess the use of code identifiers to predict test flakiness.
We validated the performance of trained models using datasets with other flaky tests and from different projects.
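As this vocabulary-based approach is the baseline the test smell study compares against, a minimal sketch of the idea (illustrative tokens and labels, assuming scikit-learn):

```python
# Bag-of-identifiers model for flakiness: tokenize the vocabulary of each
# test and train a classifier. The two toy tests below are illustrative.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

tests = [
    "thread sleep wait assert response timeout",  # identifiers from a flaky test
    "assert equals sum calculator add",           # identifiers from a stable test
]
labels = [1, 0]  # 1 = flaky

model = make_pipeline(CountVectorizer(), RandomForestClassifier(random_state=0))
model.fit(tests, labels)
print(model.predict(["socket connect retry sleep"]))
```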
arXiv Detail & Related papers (2021-03-23T16:42:22Z)
- Noisy Adaptive Group Testing using Bayesian Sequential Experimental Design [63.48989885374238]
Dorfman showed 80 years ago that, when the infection prevalence of a disease is low, testing groups of people can prove more efficient than testing people individually.
Our goal in this paper is to propose new group testing algorithms that can operate in a noisy setting.
arXiv Detail & Related papers (2020-04-26T23:41:33Z)
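As background on why pooling helps, a worked example of Dorfman's classical noise-free two-stage scheme (the paper's Bayesian sequential methods handle noise and adapt across rounds, which this example ignores):

```python
# Dorfman two-stage group testing: pool g samples; if the pool tests
# positive, retest its members individually.
# Expected tests per person: 1/g + 1 - (1 - p)**g at prevalence p.
def expected_tests_per_person(p: float, g: int) -> float:
    return 1 / g + 1 - (1 - p) ** g

p = 0.01  # 1% prevalence (illustrative)
g_best = min(range(2, 50), key=lambda g: expected_tests_per_person(p, g))
print(g_best, round(expected_tests_per_person(p, g_best), 3))
# ~11 per group and ~0.2 tests per person: roughly a 5x saving over
# testing everyone individually.
```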