FlaKat: A Machine Learning-Based Categorization Framework for Flaky
Tests
- URL: http://arxiv.org/abs/2403.01003v1
- Date: Fri, 1 Mar 2024 22:00:44 GMT
- Title: FlaKat: A Machine Learning-Based Categorization Framework for Flaky
Tests
- Authors: Shizhe Lin, Ryan Zheng He Liu, Ladan Tahvildari
- Abstract summary: Flaky tests can pass or fail non-deterministically, without alterations to a software system.
State-of-the-art research incorporates machine learning solutions into flaky test detection and achieves reasonably good accuracy.
- Score: 3.0846824529023382
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Flaky tests can pass or fail non-deterministically, without alterations to a
software system. Such tests are frequently encountered by developers and hinder
the credibility of test suites. State-of-the-art research incorporates machine
learning solutions into flaky test detection and achieves reasonably good
accuracy. Moreover, the majority of automated flaky test repair solutions are
designed for specific types of flaky tests. This research work proposes a novel
categorization framework, called FlaKat, which uses machine-learning
classifiers for fast and accurate prediction of the category of a given flaky
test that reflects its root cause. Sampling techniques are applied to address
the imbalance between flaky test categories in the International Dataset of
Flaky Tests (IDoFT). A new evaluation metric, called Flakiness Detection
Capacity (FDC), is proposed for measuring the accuracy of classifiers from the
perspective of information theory, and a proof of its effectiveness is
provided. The final FDC results also agree with the F1 score on which
classifier yields the best flakiness classification.
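To make the described pipeline concrete, below is a minimal sketch of the kind of workflow the abstract outlines: oversampling to counter category imbalance, a standard classifier, and an information-theoretic score computed from the confusion matrix. The toy features and labels, the choice of SMOTE and a random forest, and the mutual-information score used as a stand-in for FDC are all illustrative assumptions; the paper's actual FDC definition and models may differ.

```python
# Illustrative sketch only: rebalance flaky-test categories, train a classifier,
# and score it with an information-theoretic measure. SMOTE, the random forest,
# the toy data, and the mutual-information stand-in for FDC are assumptions,
# not the paper's exact choices.
import numpy as np
from imblearn.over_sampling import SMOTE                       # oversampling for imbalance
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import train_test_split

def mutual_information_bits(conf: np.ndarray) -> float:
    """I(true; predicted) in bits from a confusion matrix (stand-in for FDC)."""
    joint = conf / conf.sum()                                  # joint distribution p(true, pred)
    p_true = joint.sum(axis=1, keepdims=True)                  # marginal over true categories
    p_pred = joint.sum(axis=0, keepdims=True)                  # marginal over predictions
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (p_true * p_pred)[mask])))

# Toy stand-in for vectorized flaky tests with imbalanced category labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
y = rng.choice([0, 1, 2], size=300, p=[0.7, 0.2, 0.1])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)  # rebalance training data only

clf = RandomForestClassifier(random_state=0).fit(X_bal, y_bal)
pred = clf.predict(X_te)
print("macro F1:", f1_score(y_te, pred, average="macro"))
print("I(true; pred) in bits:", mutual_information_bits(confusion_matrix(y_te, pred)))
```

In practice the feature matrix would come from learned representations of the test code rather than random numbers, and the mutual-information score above would be replaced by the FDC formulation defined in the paper.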
Related papers
- An Analysis of LLM Fine-Tuning and Few-Shot Learning for Flaky Test Detection and Classification [1.9336815376402723]
Flaky tests exhibit non-deterministic behavior during execution.
Flaky test detection and classification are challenging due to the variability in test behavior.
arXiv Detail & Related papers (2025-02-04T20:54:51Z)
- Automatically Learning a Precise Measurement for Fault Diagnosis Capability of Test Cases [21.276670659232284]
We propose a novel result-agnostic metric RLFDC which predicts FDC values of tests through reinforcement learning.
In particular, we treat fault localization (FL) results as reward signals and train an FDC prediction model with this direct FL feedback to automatically learn a more accurate measurement.
arXiv Detail & Related papers (2025-01-04T07:16:49Z)
- A Hybrid Framework for Statistical Feature Selection and Image-Based Noise-Defect Detection [55.2480439325792]
This paper presents a hybrid framework that integrates both statistical feature selection and classification techniques to improve defect detection accuracy.
We present around 55 distinct features extracted from industrial images, which are then analyzed using statistical methods.
By integrating these methods with flexible machine learning applications, the proposed framework improves detection accuracy and reduces false positives and misclassifications.
arXiv Detail & Related papers (2024-12-11T22:12:21Z)
- FlakyFix: Using Large Language Models for Predicting Flaky Test Fix Categories and Test Code Repair [0.5749787074942512]
Flaky tests are problematic because they non-deterministically pass or fail for the same software version under test.
In this paper, we focus on predicting the type of fix that is required to remove flakiness and then repair the test code on that basis.
One key idea is to guide the repair process with additional knowledge about the test's flakiness in the form of its predicted fix category.
arXiv Detail & Related papers (2023-06-21T19:34:16Z)
- A Comprehensive Survey on Test-Time Adaptation under Distribution Shifts [117.72709110877939]
Test-time adaptation (TTA) has the potential to adapt a pre-trained model to unlabeled data during testing, before making predictions.
We categorize TTA into several distinct groups based on the form of test data, namely, test-time domain adaptation, test-time batch adaptation, and online test-time adaptation.
arXiv Detail & Related papers (2023-03-27T16:32:21Z)
- Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
arXiv Detail & Related papers (2022-11-21T18:47:11Z)
- On the use of test smells for prediction of flaky tests [0.0]
Flaky tests hamper the evaluation of test results and can increase costs.
Existing approaches based on the use of the test case vocabulary may be context-sensitive and prone to overfitting.
We investigate the use of test smells as predictors of flaky tests.
arXiv Detail & Related papers (2021-08-26T13:21:55Z)
- What is the Vocabulary of Flaky Tests? An Extended Replication [0.0]
We conduct an empirical study to assess the use of code identifiers to predict test flakiness.
We validated the performance of trained models using datasets with other flaky tests and from different projects.
arXiv Detail & Related papers (2021-03-23T16:42:22Z)
- Understanding Classifier Mistakes with Generative Models [88.20470690631372]
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle.
In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize.
Our approach is agnostic to class labels from the training set which makes it applicable to models trained in a semi-supervised way.
arXiv Detail & Related papers (2020-10-05T22:13:21Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
- Certified Robustness to Label-Flipping Attacks via Randomized Smoothing [105.91827623768724]
Machine learning algorithms are susceptible to data poisoning attacks.
We present a unifying view of randomized smoothing over arbitrary functions.
We propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.
arXiv Detail & Related papers (2020-02-07T21:28:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.