Variance of ML-based software fault predictors: are we really improving
fault prediction?
- URL: http://arxiv.org/abs/2310.17264v1
- Date: Thu, 26 Oct 2023 09:31:32 GMT
- Title: Variance of ML-based software fault predictors: are we really improving
fault prediction?
- Authors: Xhulja Shahini, Domenic Bubel, Andreas Metzger
- Abstract summary: We experimentally analyze the variance of a state-of-the-art fault prediction approach.
We observed a maximum variance of 10.10% in terms of the per-class accuracy metric.
- Score: 0.3222802562733786
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Software quality assurance activities become increasingly difficult as
software systems become more and more complex and continuously grow in size.
Moreover, testing becomes even more expensive when dealing with large-scale
systems. Thus, to effectively allocate quality assurance resources, researchers
have proposed fault prediction (FP) which utilizes machine learning (ML) to
predict fault-prone code areas. However, ML algorithms typically make use of
stochastic elements to increase the prediction models' generalizability and
efficiency of the training process. These stochastic elements, also known as
nondeterminism-introducing (NI) factors, lead to variance in the training
process and as a result, lead to variance in prediction accuracy and training
time. This variance poses a challenge for reproducibility in research. More
importantly, while fault prediction models may have shown good performance in
the lab (e.g., oftentimes involving multiple runs and averaging outcomes),
high variance of results can pose the risk that these models show low
performance when applied in practice. In this work, we experimentally analyze
the variance of a state-of-the-art fault prediction approach. Our experimental
results indicate that NI factors can indeed cause considerable variance in the
fault prediction models' accuracy. We observed a maximum variance of 10.10% in
terms of the per-class accuracy metric. We thus also discuss how to deal with
such variance.
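The abstract describes the experiment only at a high level. Below is a minimal sketch, assuming a generic scikit-learn classifier and synthetic data (not the authors' fault-prediction model or datasets), of how run-to-run variance from an NI factor such as the random seed can be quantified in terms of per-class accuracy.

```python
# Minimal sketch (placeholder data and model, not the authors' setup):
# quantify run-to-run variance caused by a nondeterminism-introducing (NI)
# factor -- here, the random seed controlling initialization and shuffling.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import recall_score

# Synthetic, imbalanced data standing in for per-module code metrics
# labeled as faulty / non-faulty.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

per_class_acc = []  # per-class recall == per-class accuracy
for seed in range(10):  # vary only the NI factor (the seed)
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                        random_state=seed).fit(X_tr, y_tr)
    per_class_acc.append(recall_score(y_te, clf.predict(X_te), average=None))

per_class_acc = np.array(per_class_acc)          # shape: (runs, classes)
spread = per_class_acc.max(axis=0) - per_class_acc.min(axis=0)
print("per-class accuracy, max - min across runs:", spread)
print("per-class accuracy, std across runs:      ", per_class_acc.std(axis=0))
```

Reporting the spread (or standard deviation) alongside the mean of several such runs is one simple way to make claimed improvements robust to this variance.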
Related papers
- Evaluation of uncertainty estimations for Gaussian process regression based machine learning interatomic potentials [0.0]
Uncertainty estimations for machine learning interatomic potentials are crucial to quantify the additional model error they introduce.
We consider GPR models with Coulomb and SOAP representations as inputs to predict potential energy surfaces and excitation energies of molecules.
We evaluate, how the GPR variance and ensemble-based uncertainties relate to the error and whether model performance improves by selecting the most uncertain samples from a fixed configuration space.
arXiv Detail & Related papers (2024-10-27T10:06:09Z)
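As a toy illustration of the comparison sketched in the entry above, and not the paper's setup (no Coulomb or SOAP representations, no molecular data), the snippet below checks how a Gaussian process's predictive standard deviation and an ensemble's spread each track the actual error on a 1-D regression task.

```python
# Toy sketch: compare GPR variance-based uncertainty with ensemble-based
# uncertainty on synthetic 1-D data (illustrative only, not molecular data).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=60)
X_test = np.linspace(-4, 4, 200).reshape(-1, 1)
y_true = np.sin(X_test).ravel()

# GPR predictive standard deviation as the uncertainty estimate.
gpr = GaussianProcessRegressor(kernel=RBF(), alpha=0.01).fit(X, y)
gpr_mean, gpr_std = gpr.predict(X_test, return_std=True)

# Ensemble disagreement (std over members) as the uncertainty estimate.
ens = BaggingRegressor(DecisionTreeRegressor(), n_estimators=20,
                       random_state=0).fit(X, y)
member_preds = np.stack([m.predict(X_test) for m in ens.estimators_])
ens_std = member_preds.std(axis=0)

print("corr(GPR std, |error|):     ",
      np.corrcoef(gpr_std, np.abs(gpr_mean - y_true))[0, 1])
print("corr(ensemble std, |error|):",
      np.corrcoef(ens_std, np.abs(member_preds.mean(axis=0) - y_true))[0, 1])
```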
- Multiclass Alignment of Confidence and Certainty for Network Calibration [10.15706847741555]
Recent studies reveal that deep neural networks (DNNs) are prone to making overconfident predictions.
We propose a new train-time calibration method, which features a simple, plug-and-play auxiliary loss known as multi-class alignment of predictive mean confidence and predictive certainty (MACC)
Our method achieves state-of-the-art calibration performance for both in-domain and out-domain predictions.
arXiv Detail & Related papers (2023-09-06T00:56:24Z)
- Learning Sample Difficulty from Pre-trained Models for Reliable Prediction [55.77136037458667]
We propose to utilize large-scale pre-trained models to guide downstream model training with sample difficulty-aware entropy regularization.
We simultaneously improve accuracy and uncertainty calibration across challenging benchmarks.
arXiv Detail & Related papers (2023-04-20T07:29:23Z)
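For the sample-difficulty entry above, the exact regularizer is not given here, so the PyTorch snippet below is only an illustrative guess at the general idea: a frozen pre-trained (teacher) model's per-sample loss acts as a difficulty score that scales an entropy term added to the downstream training loss.

```python
# Illustrative guess at difficulty-aware entropy regularization (not the
# paper's exact formulation): harder samples, as judged by a frozen
# pre-trained model, receive a stronger entropy (anti-overconfidence) term.
import torch
import torch.nn.functional as F

def difficulty_aware_loss(student_logits, teacher_logits, targets, lam=0.1):
    with torch.no_grad():
        # Difficulty score: teacher's per-sample loss, normalized to [0, 1].
        difficulty = F.cross_entropy(teacher_logits, targets, reduction="none")
        difficulty = difficulty / (difficulty.max() + 1e-8)

    ce = F.cross_entropy(student_logits, targets, reduction="none")
    probs = F.softmax(student_logits, dim=1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)

    # Encourage higher predictive entropy (lower confidence) on hard samples.
    return (ce - lam * difficulty * entropy).mean()

# Toy usage with random logits standing in for real model outputs.
student = torch.randn(8, 10, requires_grad=True)
teacher = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = difficulty_aware_loss(student, teacher, labels)
loss.backward()
print(float(loss))
```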
- Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection [58.789823426981044]
We propose a novel auxiliary loss formulation that aims to align the class confidence of bounding boxes with the accurateness of predictions.
Our results reveal that our train-time loss surpasses strong calibration baselines in reducing calibration error for both in and out-domain scenarios.
arXiv Detail & Related papers (2023-03-25T08:56:21Z)
- Predictive Multiplicity in Probabilistic Classification [25.111463701666864]
We present a framework for measuring predictive multiplicity in probabilistic classification.
We demonstrate the incidence and prevalence of predictive multiplicity in real-world tasks.
Our results emphasize the need to report predictive multiplicity more widely.
arXiv Detail & Related papers (2022-06-02T16:25:29Z)
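For the predictive multiplicity entry above, a minimal sketch (not the paper's framework or metrics): train several near-equivalent classifiers on bootstrap resamples and measure how far their predicted probabilities diverge on the same test points.

```python
# Minimal sketch: surface predictive multiplicity by training near-equivalent
# models and measuring the spread of their predicted probabilities per point.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

rng = np.random.default_rng(0)
probs = []
for _ in range(20):
    idx = rng.choice(len(X_tr), size=len(X_tr), replace=True)  # bootstrap
    model = LogisticRegression(max_iter=1000).fit(X_tr[idx], y_tr[idx])
    probs.append(model.predict_proba(X_te)[:, 1])

probs = np.array(probs)                          # shape: (models, test points)
spread = probs.max(axis=0) - probs.min(axis=0)   # per-point disagreement
print("share of test points where models disagree by > 0.2:",
      float(np.mean(spread > 0.2)))
```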
- Hessian-based toolbox for reliable and interpretable machine learning in physics [58.720142291102135]
We present a toolbox for interpretability and reliability that is agnostic to the model architecture.
It provides a notion of the influence of the input data on the prediction at a given test point, an estimation of the uncertainty of the model predictions, and an extrapolation score for the model predictions.
Our work opens the road to the systematic use of interpretability and reliability methods in ML applied to physics and, more generally, science.
arXiv Detail & Related papers (2021-08-04T16:32:59Z)
- Uncertainty Prediction for Machine Learning Models of Material Properties [0.0]
Uncertainty in AI-based predictions of material properties is of immense importance for the success and reliability of AI applications in material science.
We compare 3 different approaches to obtain such individual uncertainty, testing them on 12 ML-physical properties.
arXiv Detail & Related papers (2021-07-16T16:33:55Z)
- Improving Uncertainty Calibration via Prior Augmented Data [56.88185136509654]
Neural networks have proven successful at learning from complex data distributions by acting as universal function approximators.
They are often overconfident in their predictions, which leads to inaccurate and miscalibrated probabilistic predictions.
We propose a solution by seeking out regions of feature space where the model is unjustifiably overconfident, and conditionally raising the entropy of those predictions towards that of the prior distribution of the labels.
arXiv Detail & Related papers (2021-02-22T07:02:37Z)
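For the prior-augmented-data entry above, a rough sketch of the general idea only (not the paper's method): identify test points far from the training data and blend their predicted probabilities toward the label prior, raising the entropy of predictions that the training data cannot support.

```python
# Rough sketch (not the paper's method): soften predictions toward the label
# prior in regions of feature space far from the training data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)
prior = np.bincount(y) / len(y)                    # label prior
knn = NearestNeighbors(n_neighbors=5).fit(X)

def softened_proba(x_query, scale=2.0):
    """Blend model probabilities toward the prior as distance to data grows."""
    dist, _ = knn.kneighbors(x_query)
    # weight in [0, 1]: ~0 near the training data, -> 1 far from it
    w = 1.0 - np.exp(-dist.mean(axis=1, keepdims=True) / scale)
    return (1.0 - w) * clf.predict_proba(x_query) + w * prior

print(softened_proba(np.zeros((1, 5))))        # near data: mostly model output
print(softened_proba(np.full((1, 5), 10.0)))   # far from data: close to prior
```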
- A comprehensive study on the prediction reliability of graph neural networks for virtual screening [0.0]
We investigate the effects of model architectures, regularization methods, and loss functions on the prediction performance and reliability of classification results.
Our results highlight that the correct choice of regularization and inference methods is clearly important for achieving a high success rate.
arXiv Detail & Related papers (2020-03-17T10:13:31Z)
- Understanding and Mitigating the Tradeoff Between Robustness and Accuracy [88.51943635427709]
Adversarial training augments the training set with perturbations to improve the robust error.
We show that the standard error could increase even when the augmented perturbations have noiseless observations from the optimal linear predictor.
arXiv Detail & Related papers (2020-02-25T08:03:01Z)
- Learning to Predict Error for MRI Reconstruction [67.76632988696943]
We demonstrate that predictive uncertainty estimated by the current methods does not highly correlate with prediction error.
We propose a novel method that estimates the target labels and magnitude of the prediction error in two steps.
arXiv Detail & Related papers (2020-02-13T15:55:32Z)