Negative impact of heavy-tailed uncertainty and error distributions on the reliability of calibration statistics for machine learning regression tasks
- URL: http://arxiv.org/abs/2402.10043v4
- Date: Wed, 5 Jun 2024 14:25:23 GMT
- Title: Negative impact of heavy-tailed uncertainty and error distributions on the reliability of calibration statistics for machine learning regression tasks
- Authors: Pascal Pernot
- Abstract summary: It is shown that the estimation of MV, MSE and their confidence intervals becomes unreliable for heavy-tailed uncertainty and error distributions.
The same problem is expected to also affect conditional calibration statistics, such as the popular ENCE.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Average calibration of the (variance-based) prediction uncertainties of machine learning regression tasks can be tested in two ways: one is to estimate the calibration error (CE) as the difference between the mean squared error (MSE) and the mean variance (MV); the alternative is to compare the mean squared z-scores (ZMS) to 1. The problem is that both approaches might lead to different conclusions, as illustrated in this study for an ensemble of datasets from the recent machine learning uncertainty quantification (ML-UQ) literature. It is shown that the estimation of MV, MSE and their confidence intervals becomes unreliable for heavy-tailed uncertainty and error distributions, which seem to be a frequent feature of ML-UQ datasets. By contrast, the ZMS statistic is less sensitive and offers the most reliable approach in this context. Unfortunately, the same problem is expected to also affect conditional calibration statistics, such as the popular ENCE, and very likely post-hoc calibration methods based on similar statistics. Several solutions to circumvent the outlined problems are proposed.
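The two average-calibration tests compared in the abstract can be sketched on a synthetic calibrated dataset. Everything below is an assumption for illustration (the Student's-t error distribution, the sample size, the uncertainty range), not data from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical calibrated dataset: predicted uncertainties sigma and errors
# generated so that E[e^2 | sigma] = sigma^2, but with heavy-tailed errors
# (Student's t with 5 degrees of freedom, rescaled to unit variance).
sigma = rng.uniform(0.5, 2.0, n)
z = rng.standard_t(5, n) / np.sqrt(5.0 / 3.0)  # heavy-tailed, Var(z) = 1
errors = sigma * z

# Test 1: calibration error CE = MSE - MV (reference value 0 when calibrated).
mse = np.mean(errors ** 2)  # mean squared error
mv = np.mean(sigma ** 2)    # mean variance
ce = mse - mv

# Test 2: mean squared z-scores ZMS = <(e/sigma)^2> (reference value 1).
zms = np.mean((errors / sigma) ** 2)

print(f"CE  = {ce:+.3f} (reference 0)")
print(f"ZMS = {zms:.3f} (reference 1)")
```

With heavier tails, the sample estimates of MSE (and hence CE) fluctuate strongly between runs because they compound the spread of the uncertainties with the error tails, while the ZMS involves only the normalized z-scores; this is one way to see the instability the abstract describes.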
Related papers
- Validation of ML-UQ calibration statistics using simulated reference values: a sensitivity analysis [0.0]
Some popular Machine Learning Uncertainty Quantification (ML-UQ) calibration statistics do not have predefined reference values.
Simulated reference values, based on synthetic calibrated datasets derived from actual uncertainties, have been proposed to mitigate this problem.
This study explores various facets of this problem, and shows that some statistics are excessively sensitive to the choice of generative distribution to be used for validation.
arXiv Detail & Related papers (2024-03-01T10:19:32Z) - Towards Understanding Variants of Invariant Risk Minimization through the Lens of Calibration [0.6906005491572401]
We show that Information Bottleneck-based IRM achieves consistent calibration across different environments.
Our empirical evidence indicates that models exhibiting consistent calibration across environments are also well-calibrated.
arXiv Detail & Related papers (2024-01-31T02:08:43Z) - Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point.
Unlike existing methods, the proposed one accounts not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
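A minimal sketch of such an abstention rule, assuming a Gaussian upper confidence bound on the variance estimate; the function name, its inputs, and the specific bound are hypothetical illustrations of the idea, not the paper's actual procedure:

```python
from statistics import NormalDist


def abstain(var_hat, var_se, threshold, alpha=0.05):
    """Abstain from predicting when we cannot confidently conclude that the
    conditional variance at the query point is below a tolerance threshold.

    var_hat   : estimated conditional variance at the query point
    var_se    : standard error of that variance estimate
    threshold : maximum variance we are willing to predict under
    alpha     : significance level of the one-sided test
    """
    z = NormalDist().inv_cdf(1 - alpha)
    # Upper confidence bound on the variance: abstaining when the bound
    # (not just the point estimate) exceeds the threshold accounts for the
    # uncertainty of the variance predictor as well as its value.
    upper = var_hat + z * var_se
    return upper > threshold
```

A noisy variance estimate triggers abstention even when its point value is acceptable, which is the qualitative behavior the summary above describes.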
arXiv Detail & Related papers (2023-09-28T13:04:11Z) - Properties of the ENCE and other MAD-based calibration metrics [0.0]
The Expected Normalized Calibration Error (ENCE) is a popular calibration statistic used in Machine Learning; its estimate depends on the number of bins used to compute it.
A similar behavior affects the calibration error based on the variance of z-scores (ZVE), and in both cases this property is a consequence of the use of a Mean Absolute Deviation (MAD) statistic to estimate calibration errors.
A solution is proposed to infer ENCE and ZVE values that do not depend on the number of bins for datasets assumed to be calibrated.
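For reference, a commonly used binned form of the ENCE, comparing the root mean variance (RMV) to the root mean squared error (RMSE) within equal-population uncertainty bins, can be sketched as follows; the dependence on the number of bins mentioned above enters through the `n_bins` argument (this is a generic sketch, not necessarily the paper's exact estimator):

```python
import numpy as np


def ence(errors, sigma, n_bins=10):
    """Binned ENCE: sort points by predicted uncertainty, then compare the
    root mean variance (RMV) to the root mean squared error (RMSE) per bin."""
    errors = np.asarray(errors, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    order = np.argsort(sigma)
    bins = np.array_split(order, n_bins)  # equal-population bins
    terms = []
    for idx in bins:
        rmv = np.sqrt(np.mean(sigma[idx] ** 2))
        rmse = np.sqrt(np.mean(errors[idx] ** 2))
        terms.append(abs(rmv - rmse) / rmv)
    return float(np.mean(terms))
```

Even for a perfectly calibrated dataset this estimate is not zero, and its expected value varies with `n_bins`, which is why a bin-independent reference value is needed.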
arXiv Detail & Related papers (2023-05-17T08:51:42Z) - Identifying Incorrect Classifications with Balanced Uncertainty [21.130311978327196]
Uncertainty estimation is critical for cost-sensitive deep-learning applications.
We model the imbalance in uncertainty estimation as two kinds of distribution bias, which we call distributional imbalance.
We then propose the Balanced True Class Probability framework, which learns an uncertainty estimator with a novel Distributional Focal Loss objective.
arXiv Detail & Related papers (2021-10-15T11:52:31Z) - Expected Validation Performance and Estimation of a Random Variable's Maximum [48.83713377993604]
We analyze three statistical estimators for expected validation performance.
We find the unbiased estimator has the highest variance, and the estimator with the smallest variance has the largest bias.
We find that the two biased estimators lead to the fewest incorrect conclusions.
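A common plug-in estimator of expected validation performance, the expected maximum of n i.i.d. draws under the empirical distribution of k observed scores, can be sketched as follows (a generic sketch of the technique; the paper analyzes several such estimators and this need not match any of them exactly):

```python
import numpy as np


def expected_max(scores, n):
    """Plug-in estimate of E[max of n i.i.d. draws], treating the empirical
    CDF of the k observed scores as the true one. This estimator is biased
    in the sense discussed above: it can never exceed the observed maximum."""
    v = np.sort(np.asarray(scores, dtype=float))
    k = len(v)
    i = np.arange(1, k + 1)
    # P(max of n draws equals the i-th order statistic) under the empirical CDF
    weights = (i / k) ** n - ((i - 1) / k) ** n
    return float(np.sum(weights * v))
```

For two observed scores 1.0 and 2.0, a single draw has expected maximum 1.5, and the estimate increases toward the observed maximum 2.0 as n grows.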
arXiv Detail & Related papers (2021-10-01T18:48:47Z) - A Statistical Analysis of Summarization Evaluation Metrics using Resampling Methods [60.04142561088524]
We find that the confidence intervals are rather wide, demonstrating high uncertainty in how reliable automatic metrics truly are.
Although many metrics fail to show statistical improvements over ROUGE, two recent works, QAEval and BERTScore, do in some evaluation settings.
arXiv Detail & Related papers (2021-03-31T18:28:14Z) - SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z) - Uncertainty Quantification in Extreme Learning Machine: Analytical Developments, Variance Estimates and Confidence Intervals [0.0]
Uncertainty quantification is crucial to assess prediction quality of a machine learning model.
Most methods proposed in the literature make strong assumptions on the data, ignore the randomness of input weights or neglect the bias contribution in confidence interval estimations.
This paper presents novel estimations that overcome these constraints and improve the understanding of ELM variability.
arXiv Detail & Related papers (2020-11-03T13:45:59Z) - Machine learning for causal inference: on the use of cross-fit estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators of the average causal effect (ACE).
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
arXiv Detail & Related papers (2020-04-21T23:09:55Z) - Learning to Predict Error for MRI Reconstruction [67.76632988696943]
We demonstrate that predictive uncertainty estimated by current methods does not correlate strongly with prediction error.
We propose a novel method that estimates the target labels and magnitude of the prediction error in two steps.
arXiv Detail & Related papers (2020-02-13T15:55:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.