How to Evaluate Uncertainty Estimates in Machine Learning for Regression?
- URL: http://arxiv.org/abs/2106.03395v2
- Date: Thu, 3 Aug 2023 12:53:40 GMT
- Title: How to Evaluate Uncertainty Estimates in Machine Learning for Regression?
- Authors: Laurens Sluijterman, Eric Cator, Tom Heskes
- Abstract summary: We show that both approaches to evaluating the quality of uncertainty estimates have serious flaws.
Firstly, neither approach can disentangle the separate components that jointly create the predictive uncertainty.
Secondly, a better loglikelihood does not guarantee better prediction intervals.
Thirdly, the current approach to testing prediction intervals directly has additional flaws.
- Score: 1.4610038284393165
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As neural networks become more popular, the need for accompanying uncertainty
estimates increases. There are currently two main approaches to test the
quality of these estimates. Most methods output a density. They can be compared
by evaluating their loglikelihood on a test set. Other methods output a
prediction interval directly. These methods are often tested by examining the
fraction of test points that fall inside the corresponding prediction
intervals. Intuitively, both approaches seem logical. However, we demonstrate
through both theoretical arguments and simulations that both ways of evaluating
the quality of uncertainty estimates have serious flaws. Firstly, neither
approach can disentangle the separate components that jointly create the
predictive uncertainty, making it difficult to evaluate the quality of the
estimates of these components. Secondly, a better loglikelihood does not
guarantee better prediction intervals, which is what the methods are often used
for in practice. Moreover, the current approach to testing prediction intervals
directly has additional flaws. We show why it is fundamentally flawed to test a
prediction or confidence interval on a single test set. At best, marginal
coverage is measured, implicitly averaging out overconfident and underconfident
predictions. A much more desirable property is pointwise coverage, requiring
the correct coverage for each prediction. We demonstrate through practical
examples that these effects can result in favoring, on the basis of its
predictive uncertainty, a method whose confidence or prediction intervals
behave undesirably. Finally, we propose a simulation-based testing approach
that addresses these problems while still allowing easy comparison between
different methods.
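The following is a minimal simulation sketch, not the paper's actual experiments, illustrating the marginal-versus-pointwise coverage distinction from the abstract. The data-generating process, the "oracle" interval, and the constant-width interval are all invented for this illustration: both constructions reach roughly 95% marginal coverage on a single test set (and can also be compared by test-set Gaussian loglikelihood), yet resampling from the known process shows that the constant-width intervals overcover where the noise is small and undercover where it is large.

```python
# Minimal sketch (not the paper's experiments): marginal coverage measured on a
# single test set cannot separate over- and underconfident predictions, while a
# simulation from a known data-generating process can check pointwise coverage.
import numpy as np

rng = np.random.default_rng(0)
Z = 1.96  # two-sided 95% normal quantile

def f(x):          # true regression function (invented for illustration)
    return np.sin(2 * np.pi * x)

def sigma(x):      # true heteroscedastic noise level
    return 0.1 + 0.4 * x

def sample_y(x):
    return f(x) + sigma(x) * rng.normal(size=x.shape)

# Two toy 95% prediction-interval constructions around the true mean:
#  - "oracle": uses the true sigma(x)        -> correct pointwise coverage
#  - "constant": one global half-width tuned on a calibration sample so that
#    marginal coverage is ~95% by construction, wherever the noise lives
x_cal = rng.uniform(0, 1, 5_000)
HALF_WIDTH = np.quantile(np.abs(sample_y(x_cal) - f(x_cal)), 0.95)

def interval_oracle(x):
    return f(x) - Z * sigma(x), f(x) + Z * sigma(x)

def interval_constant(x):
    return f(x) - HALF_WIDTH, f(x) + HALF_WIDTH

methods = [("oracle", interval_oracle), ("constant", interval_constant)]

# 1) The usual single-test-set evaluation: marginal coverage and loglikelihood.
x_test = rng.uniform(0, 1, 20_000)
y_test = sample_y(x_test)
for name, method in methods:
    lo, hi = method(x_test)
    marginal = np.mean((y_test >= lo) & (y_test <= hi))
    s = (hi - lo) / (2 * Z)  # implied Gaussian scale of each method
    loglik = np.mean(-0.5 * np.log(2 * np.pi * s**2)
                     - 0.5 * (y_test - f(x_test))**2 / s**2)
    print(f"{name:9s} marginal coverage = {marginal:.3f}, loglik = {loglik:.2f}")

# 2) Simulation-based check of pointwise coverage at fixed inputs, possible
#    only because the data-generating process is known and can be resampled.
x_grid = np.array([0.05, 0.5, 0.95])
y_rep = f(x_grid) + sigma(x_grid) * rng.normal(size=(20_000, x_grid.size))
for name, method in methods:
    lo, hi = method(x_grid)
    pointwise = np.mean((y_rep >= lo) & (y_rep <= hi), axis=0)
    print(f"{name:9s} pointwise coverage at x={x_grid}: {np.round(pointwise, 3)}")
```

The simulation-based evaluation proposed in the paper is more elaborate than this; the sketch only shows why a single test set can, at best, measure marginal coverage.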
Related papers
- Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting [55.17761802332469]
Test-time adaptation (TTA) seeks to tackle potential distribution shifts between training and test data by adapting a given model w.r.t. any test sample.
Prior methods perform backpropagation for each test sample, resulting in unbearable optimization costs for many applications.
We propose an Efficient Anti-Forgetting Test-Time Adaptation (EATA) method which develops an active sample selection criterion to identify reliable and non-redundant samples.
arXiv Detail & Related papers (2024-03-18T05:49:45Z) - Revisiting Confidence Estimation: Towards Reliable Failure Prediction [53.79160907725975]
We find a general, widespread, but largely neglected phenomenon: most confidence estimation methods are harmful for detecting misclassification errors.
We propose to enlarge the confidence gap by finding flat minima, which yields state-of-the-art failure prediction performance.
arXiv Detail & Related papers (2024-03-05T11:44:14Z) - Efficient Normalized Conformal Prediction and Uncertainty Quantification
for Anti-Cancer Drug Sensitivity Prediction with Deep Regression Forests [0.0]
Conformal Prediction has emerged as a promising method to pair machine learning models with prediction intervals.
We propose a method to estimate the uncertainty of each sample by calculating the variance obtained from a Deep Regression Forest (see the conformal prediction sketch after this list).
arXiv Detail & Related papers (2024-02-21T19:09:53Z) - Automatically Reconciling the Trade-off between Prediction Accuracy and
Earliness in Prescriptive Business Process Monitoring [0.802904964931021]
We focus on the problem of automatically reconciling the trade-off between prediction accuracy and prediction earliness.
Different approaches were presented in the literature to reconcile the trade-off between prediction accuracy and earliness.
We perform a comparative evaluation of the main alternative approaches for reconciling the trade-off between prediction accuracy and earliness.
arXiv Detail & Related papers (2023-07-12T06:07:53Z) - Rethinking Confidence Calibration for Failure Prediction [37.43981354073841]
Modern deep neural networks are often overconfident for their incorrect predictions.
We find that most confidence calibration methods are useless or harmful for failure prediction.
We propose a simple hypothesis: flat minima are beneficial for failure prediction.
arXiv Detail & Related papers (2023-03-06T08:54:18Z) - The Implicit Delta Method [61.36121543728134]
In this paper, we propose an alternative, the implicit delta method, which works by infinitesimally regularizing the training loss of uncertainty.
We show that the change in the evaluation due to regularization is consistent for the variance of the evaluation estimator, even when the infinitesimal change is approximated by a finite difference.
arXiv Detail & Related papers (2022-11-11T19:34:17Z) - Reliability-Aware Prediction via Uncertainty Learning for Person Image
Retrieval [51.83967175585896]
UAL aims at providing reliability-aware predictions by considering data uncertainty and model uncertainty simultaneously.
Data uncertainty captures the "noise" inherent in the sample, while model uncertainty depicts the model's confidence in the sample's prediction.
arXiv Detail & Related papers (2022-10-24T17:53:20Z) - Dense Uncertainty Estimation [62.23555922631451]
In this paper, we investigate neural networks and uncertainty estimation techniques to achieve both accurate deterministic prediction and reliable uncertainty estimation.
We work on two types of uncertainty estimation solutions, namely ensemble-based methods and generative-model-based methods, and explain their pros and cons when using them in fully/semi/weakly-supervised frameworks.
arXiv Detail & Related papers (2021-10-13T01:23:48Z) - Comparing Sequential Forecasters [35.38264087676121]
Consider two forecasters, each making a single prediction for a sequence of events over time.
How might we compare these forecasters, either online or post-hoc, while avoiding unverifiable assumptions on how the forecasts and outcomes were generated?
We present novel sequential inference procedures for estimating the time-varying difference in forecast scores.
We empirically validate our approaches by comparing real-world baseball and weather forecasters.
arXiv Detail & Related papers (2021-09-30T22:54:46Z) - Quantifying Uncertainty in Deep Spatiotemporal Forecasting [67.77102283276409]
We describe two types of forecasting problems: regular grid-based and graph-based.
We analyze UQ methods from both the Bayesian and the frequentist point of view, casting them in a unified framework via statistical decision theory.
Through extensive experiments on real-world road network traffic, epidemics, and air quality forecasting tasks, we reveal the statistical computational trade-offs for different UQ methods.
arXiv Detail & Related papers (2021-05-25T14:35:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.