On the good reliability of an interval-based metric to validate prediction uncertainty for machine learning regression tasks
- URL: http://arxiv.org/abs/2408.13089v2
- Date: Mon, 26 Aug 2024 06:40:07 GMT
- Title: On the good reliability of an interval-based metric to validate prediction uncertainty for machine learning regression tasks
- Authors: Pascal Pernot
- Abstract summary: This study presents an opportunistic approach to a (more) reliable validation method for prediction uncertainty average calibration.
Considering that variance-based calibration metrics are quite sensitive to the presence of heavy tails in the uncertainty and error distributions, a shift is proposed to an interval-based metric, the Prediction Interval Coverage Probability (PICP).
The resulting PICPs are more quickly and reliably tested than variance-based calibration metrics.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This short study presents an opportunistic approach to a (more) reliable validation method for prediction uncertainty average calibration. Considering that variance-based calibration metrics (ZMS, NLL, RCE...) are quite sensitive to the presence of heavy tails in the uncertainty and error distributions, a shift is proposed to an interval-based metric, the Prediction Interval Coverage Probability (PICP). It is shown on a large ensemble of molecular properties datasets that (1) sets of z-scores are well represented by Student's-$t(\nu)$ distributions, $\nu$ being the number of degrees of freedom; (2) accurate estimation of 95\% prediction intervals can be obtained by the simple $2\sigma$ rule for $\nu>3$; and (3) the resulting PICPs are more quickly and reliably tested than variance-based calibration metrics. Overall, this method enables testing 20\% more datasets than ZMS testing. Conditional calibration is also assessed using the PICP approach.
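As a concrete illustration of the PICP test described in the abstract, the sketch below (not the paper's code; the variable names and synthetic data generation are assumptions) computes the fraction of observations falling inside the $2\sigma$ prediction intervals and compares it to the 0.95 target coverage with a binomial test.

```python
# Minimal sketch (not the paper's code) of the 2*sigma PICP test; the
# variable names and synthetic data generation below are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic predictions with uncertainties u_pred and errors whose z-scores
# follow a unit-variance Student's-t(nu), as reported in the abstract.
n, nu = 5000, 6
y_pred = rng.normal(size=n)
u_pred = rng.uniform(0.5, 1.5, size=n)
z = rng.standard_t(nu, size=n) * np.sqrt((nu - 2) / nu)   # unit-variance t
y_true = y_pred + u_pred * z

# 2*sigma prediction intervals and their empirical coverage (PICP)
inside = np.abs(y_true - y_pred) <= 2.0 * u_pred
picp = inside.mean()

# Binomial test of the observed coverage against the 0.95 target
test = stats.binomtest(int(inside.sum()), n=n, p=0.95)
print(f"PICP = {picp:.3f}, p-value = {test.pvalue:.3f}")
```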
Related papers
- Provably Reliable Conformal Prediction Sets in the Presence of Data Poisoning [53.42244686183879]
Conformal prediction provides model-agnostic and distribution-free uncertainty quantification.
Yet, conformal prediction is not reliable under poisoning attacks where adversaries manipulate both training and calibration data.
We propose reliable prediction sets (RPS): the first efficient method for constructing conformal prediction sets with provable reliability guarantees under poisoning.
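For context, the sketch below illustrates plain split conformal prediction sets, the baseline construction that RPS hardens against poisoning; it is not the RPS method itself, and the nonconformity score and variable names are assumptions.

```python
# Illustrative sketch of plain split conformal prediction sets (the baseline
# that RPS builds on); this is not the RPS construction itself.
import numpy as np

def conformal_prediction_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal sets using the 1 - p(true class) nonconformity score."""
    n = len(cal_labels)
    # Nonconformity scores on the calibration split
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile of the calibration scores
    q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")
    # Include every class whose score does not exceed the threshold
    return [np.where(1.0 - p <= q)[0] for p in test_probs]

# Toy usage with random softmax outputs
rng = np.random.default_rng(1)
cal_probs = rng.dirichlet(np.ones(5), size=200)
cal_labels = rng.integers(0, 5, size=200)
test_probs = rng.dirichlet(np.ones(5), size=3)
print(conformal_prediction_sets(cal_probs, cal_labels, test_probs))
```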
arXiv Detail & Related papers (2024-10-13T15:37:11Z) - A Confidence Interval for the $\ell_2$ Expected Calibration Error [35.88784957918326]
We develop confidence intervals for the $\ell_2$ Expected Calibration Error (ECE).
We consider top-1-to-$k$ calibration, which includes both the popular notion of confidence calibration as well as full calibration.
For a debiased estimator of the ECE, we show asymptotic normality, but with different convergence rates and asymptotic variances for calibrated and miscalibrated models.
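As a rough illustration of the quantity being studied, the sketch below computes a standard binned top-1 (confidence) $\ell_2$ ECE estimate together with a percentile-bootstrap interval; it is not the paper's debiased estimator or its asymptotic confidence interval, and the function names are assumptions.

```python
# Rough sketch: binned top-1 squared-ECE estimate with a percentile-bootstrap
# confidence interval. This is NOT the paper's debiased estimator or its
# asymptotic interval; it only illustrates the quantity at stake.
import numpy as np

def l2_ece(conf, correct, n_bins=15):
    """Binned estimate of the squared (l2) expected calibration error."""
    bins = np.clip((conf * n_bins).astype(int), 0, n_bins - 1)
    ece2 = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            gap = conf[mask].mean() - correct[mask].mean()
            ece2 += mask.mean() * gap**2
    return ece2

def bootstrap_ci(conf, correct, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the binned l2 ECE."""
    rng = np.random.default_rng(seed)
    n = len(conf)
    boot = [l2_ece(conf[idx], correct[idx])
            for idx in rng.integers(0, n, size=(n_boot, n))]
    return np.quantile(boot, [(1 - level) / 2, (1 + level) / 2])

# Example: lo, hi = bootstrap_ci(confidences, (preds == labels).astype(float))
```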
arXiv Detail & Related papers (2024-08-16T20:00:08Z) - Measuring Stochastic Data Complexity with Boltzmann Influence Functions [12.501336941823627]
Estimating uncertainty of a model's prediction on a test point is a crucial part of ensuring reliability and calibration under distribution shifts.
We propose IF-COMP, a scalable and efficient approximation of the pNML distribution that linearizes the model with a temperature-scaled Boltzmann influence function.
We experimentally validate IF-COMP on uncertainty calibration, mislabel detection, and OOD detection tasks, where it consistently matches or beats strong baseline methods.
arXiv Detail & Related papers (2024-06-04T20:01:39Z) - Beyond Calibration: Assessing the Probabilistic Fit of Neural Regressors via Conditional Congruence [2.2359781747539396]
Deep networks often suffer from overconfidence and misaligned predictive distributions.
We introduce a metric, Conditional Congruence Error (CCE), that uses conditional kernel mean embeddings to estimate the distance between the learned predictive distribution and the empirical, conditional distribution in a dataset.
We show that using CCE to measure congruence 1) accurately quantifies misalignment between distributions when the data generating process is known, 2) effectively scales to real-world, high dimensional image regression tasks, and 3) can be used to gauge model reliability on unseen instances.
arXiv Detail & Related papers (2024-05-20T23:30:07Z) - Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting [55.17761802332469]
Test-time adaptation (TTA) seeks to tackle potential distribution shifts between training and test data by adapting a given model w.r.t. any test sample.
Prior methods perform backpropagation for each test sample, resulting in unbearable optimization costs to many applications.
We propose an Efficient Anti-Forgetting Test-Time Adaptation (EATA) method which develops an active sample selection criterion to identify reliable and non-redundant samples.
arXiv Detail & Related papers (2024-03-18T05:49:45Z) - Negative impact of heavy-tailed uncertainty and error distributions on the reliability of calibration statistics for machine learning regression tasks [0.0]
It is shown that the estimation of MV, MSE and their confidence intervals becomes unreliable for heavy-tailed uncertainty and error distributions.
The same problem is expected to also affect conditional calibration statistics, such as the popular ENCE.
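The effect can be reproduced numerically: in the hedged sketch below (standard definitions assumed), the variance-based ZMS statistic $\langle z^2\rangle$ becomes increasingly unstable as the tails of the z-score distribution get heavier, while the interval-based PICP stays well behaved.

```python
# Numerical illustration (standard definitions assumed) of why heavy-tailed
# z-score distributions destabilize a variance-based statistic such as
# ZMS = <z^2>, while the interval-based PICP stays well behaved.
import numpy as np

rng = np.random.default_rng(0)
n, n_rep = 1000, 200

for nu in (50, 4, 2.5):                         # lighter to heavier tails
    z = rng.standard_t(nu, size=(n_rep, n))
    zms = (z**2).mean(axis=1)                   # variance-based statistic
    picp = (np.abs(z) <= 2.0).mean(axis=1)      # interval-based statistic
    print(f"nu={nu:>4}: sd(ZMS)={zms.std():7.3f}  sd(PICP)={picp.std():.4f}")
```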
arXiv Detail & Related papers (2024-02-15T16:05:35Z) - SMURF-THP: Score Matching-based UnceRtainty quantiFication for Transformer Hawkes Process [76.98721879039559]
We propose SMURF-THP, a score-based method for learning Transformer Hawkes process and quantifying prediction uncertainty.
Specifically, SMURF-THP learns the score function of events' arrival time based on a score-matching objective.
We conduct extensive experiments in both event type prediction and uncertainty quantification of arrival time.
arXiv Detail & Related papers (2023-10-25T03:33:45Z) - Conservative Prediction via Data-Driven Confidence Minimization [70.93946578046003]
In safety-critical applications of machine learning, it is often desirable for a model to be conservative.
We propose the Data-Driven Confidence Minimization framework, which minimizes confidence on an uncertainty dataset.
arXiv Detail & Related papers (2023-06-08T07:05:36Z) - Sample-dependent Adaptive Temperature Scaling for Improved Calibration [95.7477042886242]
A common post-hoc approach to compensate for neural network miscalibration is to perform temperature scaling.
We propose to predict a different temperature value for each input, allowing us to adjust the mismatch between confidence and accuracy.
We test our method on the ResNet50 and WideResNet28-10 architectures using the CIFAR10/100 and Tiny-ImageNet datasets.
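For reference, the sketch below shows standard single-temperature scaling fitted on a validation split, the post-hoc baseline that this paper extends with per-input temperatures; the PyTorch implementation and function names are assumptions.

```python
# Sketch of standard single-temperature scaling (the post-hoc baseline that
# the paper extends to per-sample temperatures); PyTorch is assumed here.
import torch

def fit_temperature(val_logits, val_labels, steps=200, lr=0.01):
    """Fit one scalar temperature by minimizing NLL on a validation split."""
    log_t = torch.zeros(1, requires_grad=True)     # optimize log(T) so T > 0
    opt = torch.optim.Adam([log_t], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.cross_entropy(val_logits / log_t.exp(),
                                                 val_labels)
        loss.backward()
        opt.step()
    return log_t.exp().item()

# Usage: scale test logits by the fitted temperature before the softmax
# T = fit_temperature(val_logits, val_labels)
# probs = torch.softmax(test_logits / T, dim=1)
```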
arXiv Detail & Related papers (2022-07-13T14:13:49Z) - SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z)