Negative impact of heavy-tailed uncertainty and error distributions on the reliability of calibration statistics for machine learning regression tasks
- URL: http://arxiv.org/abs/2402.10043v5
- Date: Mon, 19 Aug 2024 08:55:28 GMT
- Title: Negative impact of heavy-tailed uncertainty and error distributions on the reliability of calibration statistics for machine learning regression tasks
- Authors: Pascal Pernot
- Abstract summary: It is shown that the estimation of MV, MSE and their confidence intervals becomes unreliable for heavy-tailed uncertainty and error distributions.
The same problem is expected to also affect conditional calibration statistics, such as the popular ENCE.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Average calibration of the (variance-based) prediction uncertainties of machine learning regression tasks can be tested in two ways: one is to estimate the calibration error (CE) as the difference between the mean squared error (MSE) and the mean variance (MV); the alternative is to compare the mean squared z-scores (ZMS) to 1. The problem is that both approaches might lead to different conclusions, as illustrated in this study for an ensemble of datasets from the recent machine learning uncertainty quantification (ML-UQ) literature. It is shown that the estimation of MV, MSE and their confidence intervals becomes unreliable for heavy-tailed uncertainty and error distributions, which seems to be a frequent feature of ML-UQ datasets. By contrast, the ZMS statistic is less sensitive and offers the most reliable approach in this context, still acknowledging that datasets with heavy-tailed z-score distributions should be considered with great care. Unfortunately, the same problem is expected to also affect conditional calibration statistics, such as the popular ENCE, and very likely post-hoc calibration methods based on similar statistics. Several solutions to circumvent the outlined problems are proposed.
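To make the two average-calibration tests concrete, the following minimal Python sketch estimates CE = MSE - MV and the ZMS statistic, together with percentile-bootstrap confidence intervals. The synthetic heavy-tailed data, variable names and generative choices are illustrative assumptions, not the paper's code or datasets.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical synthetic data (illustrative only): heavy-tailed predicted
# standard uncertainties u_i, and errors generated so that the z-scores
# e_i / u_i are standard normal, i.e. the set is calibrated on average.
uncertainties = np.abs(rng.standard_t(df=3, size=n)) + 0.1
errors = uncertainties * rng.standard_normal(n)

def ce(e, u):
    """Calibration error: mean squared error minus mean variance (target ~0)."""
    return np.mean(e**2) - np.mean(u**2)

def zms(e, u):
    """Mean of squared z-scores (target ~1)."""
    return np.mean((e / u)**2)

def bootstrap_ci(stat, e, u, n_boot=2000, alpha=0.05):
    """Percentile-bootstrap confidence interval for a statistic of (e, u)."""
    m = len(e)
    vals = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, m, m)
        vals[b] = stat(e[idx], u[idx])
    return np.quantile(vals, [alpha / 2, 1 - alpha / 2])

print("CE  =", ce(errors, uncertainties),
      "95% CI:", bootstrap_ci(ce, errors, uncertainties))
print("ZMS =", zms(errors, uncertainties),
      "95% CI:", bootstrap_ci(zms, errors, uncertainties))
```

With heavy-tailed predicted variances such as the Student-t draws above, repeated runs typically give much wider and less stable bootstrap intervals for CE (through MSE and MV) than for ZMS, which mirrors the qualitative behaviour described in the abstract.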
Related papers
- Evaluation of uncertainty estimations for Gaussian process regression based machine learning interatomic potentials [0.0]
Uncertainty estimations for machine learning interatomic potentials are crucial to quantify the additional model error they introduce.
We consider GPR models with Coulomb and SOAP representations as inputs to predict potential energy surfaces and excitation energies of molecules.
We evaluate how the GPR variance and ensemble-based uncertainties relate to the error, and whether model performance improves by selecting the most uncertain samples from a fixed configuration space.
arXiv Detail & Related papers (2024-10-27T10:06:09Z) - On the good reliability of an interval-based metric to validate prediction uncertainty for machine learning regression tasks [0.0]
This study presents an opportunistic approach to a (more) reliable validation method for prediction uncertainty average calibration.
Considering that variance-based calibration metrics are quite sensitive to the presence of heavy tails in the uncertainty and error distributions, a shift is proposed to an interval-based metric, the Prediction Interval Coverage Probability (PICP); a short illustrative sketch of this metric follows the list below.
The resulting PICPs are more quickly and reliably tested than variance-based calibration metrics.
arXiv Detail & Related papers (2024-08-23T14:16:10Z) - Validation of ML-UQ calibration statistics using simulated reference values: a sensitivity analysis [0.0]
Some popular Machine Learning Uncertainty Quantification (ML-UQ) calibration statistics do not have predefined reference values.
Simulated reference values, based on synthetic calibrated datasets derived from actual uncertainties, have been proposed to palliate this problem.
This study explores various facets of this problem, and shows that some statistics are excessively sensitive to the choice of generative distribution to be used for validation.
arXiv Detail & Related papers (2024-03-01T10:19:32Z) - Identifying Incorrect Classifications with Balanced Uncertainty [21.130311978327196]
Uncertainty estimation is critical for cost-sensitive deep-learning applications.
We propose the notion of distributional imbalance to model the imbalance in uncertainty estimation as two kinds of distribution biases.
We then propose the Balanced True Class Probability framework, which learns an uncertainty estimator with a novel Distributional Focal Loss objective.
arXiv Detail & Related papers (2021-10-15T11:52:31Z) - A Statistical Analysis of Summarization Evaluation Metrics using Resampling Methods [60.04142561088524]
We find that the confidence intervals are rather wide, demonstrating high uncertainty in how reliable automatic metrics truly are.
Although many metrics fail to show statistical improvements over ROUGE, two recent works, QAEval and BERTScore, do in some evaluation settings.
arXiv Detail & Related papers (2021-03-31T18:28:14Z) - SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z) - Distribution-free uncertainty quantification for classification under label shift [105.27463615756733]
We focus on uncertainty quantification (UQ) for classification problems via two avenues.
We first argue that label shift hurts UQ, by showing degradation in coverage and calibration.
We examine these techniques theoretically in a distribution-free framework and demonstrate their excellent practical performance.
arXiv Detail & Related papers (2021-03-04T20:51:03Z) - Uncertainty Quantification in Extreme Learning Machine: Analytical Developments, Variance Estimates and Confidence Intervals [0.0]
Uncertainty quantification is crucial to assess prediction quality of a machine learning model.
Most methods proposed in the literature make strong assumptions on the data, ignore the randomness of input weights or neglect the bias contribution in confidence interval estimations.
This paper presents novel estimations that overcome these constraints and improve the understanding of ELM variability.
arXiv Detail & Related papers (2020-11-03T13:45:59Z) - Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z) - GenDICE: Generalized Offline Estimation of Stationary Values [108.17309783125398]
We show that effective estimation can still be achieved in important applications.
Our approach is based on estimating a ratio that corrects for the discrepancy between the stationary and empirical distributions.
The resulting algorithm, GenDICE, is straightforward and effective.
arXiv Detail & Related papers (2020-02-21T00:27:52Z) - Learning to Predict Error for MRI Reconstruction [67.76632988696943]
We demonstrate that the predictive uncertainty estimated by current methods does not correlate strongly with the prediction error.
We propose a novel method that estimates the target labels and magnitude of the prediction error in two steps.
arXiv Detail & Related papers (2020-02-13T15:55:32Z)
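As a rough companion to the interval-based metric (PICP) mentioned in the related-papers list above, here is a minimal Python sketch of how a PICP can be computed, assuming Gaussian predictive distributions. The synthetic data and variable names are illustrative assumptions, not material from any of the papers listed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical synthetic data: heavy-tailed predicted standard uncertainties
# and errors that are calibrated by construction (z-scores ~ N(0, 1)).
uncertainties = np.abs(rng.standard_t(df=3, size=5000)) + 0.1
errors = uncertainties * rng.standard_normal(5000)

def picp(e, u, level=0.95):
    """Prediction Interval Coverage Probability: fraction of errors falling
    inside the central prediction interval of the requested coverage level,
    assuming Gaussian predictive distributions."""
    k = stats.norm.ppf(0.5 + level / 2)   # ~1.96 for a 95% interval
    return np.mean(np.abs(e) <= k * u)

print(f"PICP(95%) = {picp(errors, uncertainties):.3f}  (target: 0.95)")
```

Because the PICP is a simple proportion, its sampling uncertainty can be assessed with a standard binomial confidence interval, which is presumably part of why interval-based metrics are reported above to be tested more quickly and reliably than variance-based calibration metrics when the underlying distributions are heavy-tailed.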