Metrics of calibration for probabilistic predictions
- URL: http://arxiv.org/abs/2205.09680v1
- Date: Thu, 19 May 2022 16:38:24 GMT
- Title: Metrics of calibration for probabilistic predictions
- Authors: Imanol Arrieta-Ibarra, Paman Gujral, Jonathan Tannen, Mark Tygert, and
Cherie Xu
- Abstract summary: "Reliability diagrams" help detect and diagnose statistically significant discrepancies -- so-called "miscalibration" -- between the predictions and the outcomes.
The canonical reliability diagrams histogram the observed and expected values of the predictions.
But, which widths of bins or kernels are best?
Slope is easy to perceive with quantitative precision, even when the constant offsets of the secant lines are irrelevant.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Predictions are often probabilities; e.g., a prediction could be for
precipitation tomorrow, but with only a 30% chance. Given such probabilistic
predictions together with the actual outcomes, "reliability diagrams" help
detect and diagnose statistically significant discrepancies -- so-called
"miscalibration" -- between the predictions and the outcomes. The canonical
reliability diagrams histogram the observed and expected values of the
predictions; replacing the hard histogram binning with soft kernel density
estimation is another common practice. But, which widths of bins or kernels are
best? Plots of the cumulative differences between the observed and expected
values largely avoid this question, by displaying miscalibration directly as
the slopes of secant lines for the graphs. Slope is easy to perceive with
quantitative precision, even when the constant offsets of the secant lines are
irrelevant; there is no need to bin or perform kernel density estimation.
The existing standard metrics of miscalibration each summarize a reliability
diagram as a single scalar statistic. The cumulative plots naturally lead to
scalar metrics for the deviation of the graph of cumulative differences away
from zero; good calibration corresponds to a horizontal, flat graph which
deviates little from zero. The cumulative approach is currently unconventional,
yet offers many favorable statistical properties, guaranteed via mathematical
theory backed by rigorous proofs and illustrative numerical examples. In
particular, metrics based on binning or kernel density estimation unavoidably
must trade off statistical confidence for the ability to resolve variations as
a function of the predicted probability or vice versa. Widening the bins or
kernels averages away random noise while giving up some resolving power.
Narrowing the bins or kernels enhances resolving power while not averaging away
as much noise.
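The cumulative-differences construction and the associated scalar metrics lend themselves to a very short implementation. Below is a minimal, illustrative Python sketch (not the authors' reference implementation; the function and variable names are made up for this example, and the two scalar summaries shown are natural choices for the deviation of the cumulative graph from zero, so the paper's exact definitions and normalizations may differ):

```python
import numpy as np
import matplotlib.pyplot as plt

def cumulative_differences(probs, outcomes):
    """Cumulative differences between observed outcomes and predicted
    probabilities, with the data sorted by predicted probability."""
    order = np.argsort(probs)
    p = np.asarray(probs, dtype=float)[order]     # predictions, ascending
    y = np.asarray(outcomes, dtype=float)[order]  # observed 0/1 outcomes
    # Over any range of predicted probabilities, the slope of the secant line
    # of this graph is the average of (observed - expected) over that range,
    # so miscalibration shows up directly as slope.
    return np.cumsum(y - p) / len(p)

# Synthetic, perfectly calibrated example data.
rng = np.random.default_rng(0)
probs = rng.uniform(size=1000)
outcomes = (rng.uniform(size=1000) < probs).astype(float)

c = cumulative_differences(probs, outcomes)
k = np.arange(1, len(c) + 1) / len(c)  # abscissa: cumulative fraction of data

# Scalar summaries of how far the graph strays from a flat line at zero
# (Kolmogorov-Smirnov-like and Kuiper-like statistics of the cumulative graph).
print("max |deviation|:", np.abs(c).max())
print("range of graph:", c.max() - c.min())

plt.plot(k, c)
plt.axhline(0.0, linestyle=":")
plt.xlabel("cumulative fraction of data (sorted by predicted probability)")
plt.ylabel("cumulative difference (observed minus expected)")
plt.show()
```

Good calibration corresponds to a graph that stays close to zero, so both printed summaries should be near zero for the synthetic data above, up to random fluctuations.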
Related papers
- Relaxed Quantile Regression: Prediction Intervals for Asymmetric Noise [51.87307904567702]
Quantile regression is a leading approach for obtaining prediction intervals via the empirical estimation of quantiles in the distribution of outputs.
We propose Relaxed Quantile Regression (RQR), a direct alternative to quantile-regression-based interval construction that removes this arbitrary constraint.
We demonstrate that this added flexibility yields intervals with improved desirable qualities (a pinball-loss sketch appears after this list).
arXiv Detail & Related papers (2024-06-05T13:36:38Z) - TIC-TAC: A Framework for Improved Covariance Estimation in Deep Heteroscedastic Regression [109.69084997173196]
Deep heteroscedastic regression involves jointly optimizing the mean and covariance of the predicted distribution using the negative log-likelihood (a Gaussian negative-log-likelihood sketch appears after this list).
Recent works show that this may result in sub-optimal convergence due to the challenges associated with covariance estimation.
We study, among other questions, whether the predicted covariance truly captures the randomness of the predicted mean.
Our results show that not only does TIC accurately learn the covariance, it additionally facilitates an improved convergence of the negative log-likelihood.
arXiv Detail & Related papers (2023-10-29T09:54:03Z) - Statistical Estimation Under Distribution Shift: Wasserstein
Perturbations and Minimax Theory [24.540342159350015]
We focus on Wasserstein distribution shifts, where every data point may undergo a slight perturbation.
We consider perturbations that are either independent or coordinated joint shifts across data points.
We analyze several important statistical problems, including location estimation, linear regression, and non-parametric density estimation.
arXiv Detail & Related papers (2023-08-03T16:19:40Z) - Confidence and Dispersity Speak: Characterising Prediction Matrix for
Unsupervised Accuracy Estimation [51.809741427975105]
This work aims to assess how well a model performs under distribution shifts without using labels.
We use the nuclear norm that has been shown to be effective in characterizing both properties.
We show that the nuclear norm is more accurate and robust for accuracy estimation than existing methods (a nuclear-norm sketch appears after this list).
arXiv Detail & Related papers (2023-02-02T13:30:48Z) - Evaluating Probabilistic Classifiers: The Triptych [62.997667081978825]
We propose and study a triptych of diagnostic graphics that focus on distinct and complementary aspects of forecast performance.
The reliability diagram addresses calibration, the receiver operating characteristic (ROC) curve diagnoses discrimination ability, and the Murphy diagram visualizes overall predictive performance and value.
arXiv Detail & Related papers (2023-01-25T19:35:23Z) - A Consistent and Differentiable Lp Canonical Calibration Error Estimator [21.67616079217758]
Deep neural networks are poorly calibrated and tend to output overconfident predictions.
We propose a low-bias, trainable calibration error estimator based on Dirichlet kernel density estimates.
Our method has a natural choice of kernel, and can be used to generate consistent estimates of other quantities.
arXiv Detail & Related papers (2022-10-13T15:11:11Z) - Statistical Efficiency of Score Matching: The View from Isoperimetry [96.65637602827942]
We show a tight connection between statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated.
We formalize these results both in the infinite-sample and the finite-sample regimes.
arXiv Detail & Related papers (2022-10-03T06:09:01Z) - Evaluating probabilistic classifiers: Reliability diagrams and score
decompositions revisited [68.8204255655161]
We introduce the CORP approach, which generates provably statistically Consistent, Optimally binned, and Reproducible reliability diagrams in an automated way.
CORP is based on non-parametric isotonic regression and is implemented via the pool-adjacent-violators (PAV) algorithm (a PAV sketch appears after this list).
arXiv Detail & Related papers (2020-08-07T08:22:26Z) - Matrix Completion with Quantified Uncertainty through Low Rank Gaussian
Copula [30.84155327760468]
This paper proposes a framework for missing value imputation with quantified uncertainty.
The time required to fit the model scales linearly with the number of rows and the number of columns in the dataset.
Empirical results show the method yields state-of-the-art imputation accuracy across a wide range of data types.
arXiv Detail & Related papers (2020-06-18T19:51:42Z) - Plots of the cumulative differences between observed and expected values
of ordered Bernoulli variates [0.0]
"Reliability diagrams" (also known as "calibration plots") help detect and diagnose significant discrepancies between predictions and outcomes.
The canonical reliability diagrams are based on histogramming the observed and expected values of the predictions.
Several variants of the standard reliability diagrams propose to replace the hard histogram binning with soft kernel density estimation.
arXiv Detail & Related papers (2020-06-03T20:15:43Z) - Estimation of Accurate and Calibrated Uncertainties in Deterministic
models [0.8702432681310401]
We devise a method to transform a deterministic prediction into a probabilistic one.
We show that, in doing so, one must compromise between the accuracy and the reliability (calibration) of such a model.
We show several examples both with synthetic data, where the underlying hidden noise can accurately be recovered, and with large real-world datasets.
arXiv Detail & Related papers (2020-03-11T04:02:56Z)
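For the Relaxed Quantile Regression entry above, the underlying quantile-regression machinery is the pinball (quantile) loss; a minimal sketch follows. It illustrates only the standard loss that RQR relaxes, not RQR itself, and the names are illustrative:

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Pinball (quantile) loss: minimized, in expectation, when y_pred is
    the tau-quantile of the distribution of y_true."""
    diff = y_true - y_pred
    return np.mean(np.maximum(tau * diff, (tau - 1.0) * diff))

# A central 80% prediction interval can be built from regressors trained
# with the pinball loss at tau = 0.1 and tau = 0.9.
rng = np.random.default_rng(0)
y = rng.normal(size=1000)
print(pinball_loss(y, np.quantile(y, 0.9), tau=0.9))  # small: correct quantile
print(pinball_loss(y, 0.0, tau=0.9))                  # larger: wrong quantile
```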
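For the TIC-TAC entry on deep heteroscedastic regression, the objective being discussed is the Gaussian negative log-likelihood with a predicted, input-dependent variance. The sketch below uses the univariate case for brevity (the paper concerns full covariances) and is not the paper's method:

```python
import numpy as np

def gaussian_nll(y, mean, var):
    """Average negative log-likelihood of y under per-sample Gaussian
    predictions with predicted mean `mean` and variance `var`."""
    return 0.5 * np.mean(np.log(2.0 * np.pi * var) + (y - mean) ** 2 / var)

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=500)
noise_scale = 0.1 + 0.2 * np.abs(x)                # input-dependent noise
y = np.sin(3.0 * x) + rng.normal(size=500) * noise_scale

mean_pred = np.sin(3.0 * x)   # stand-in for a network's mean head
var_pred = noise_scale ** 2   # stand-in for a network's variance head
print(gaussian_nll(y, mean_pred, var_pred))
# Because the predicted variance divides the squared error, a poor variance
# estimate distorts the gradients of the mean; that interplay is the
# convergence issue studied in the related paper.
```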
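For the unsupervised accuracy-estimation entry, the quantity of interest is the nuclear norm (sum of singular values) of the matrix of softmax outputs, which jointly reflects prediction confidence and dispersity. A minimal sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 10))  # N samples, K classes
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax

# Nuclear norm of the N x K prediction matrix; higher values indicate
# predictions that are both confident and well spread across classes.
print(np.linalg.norm(probs, ord="nuc"))
```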
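For the CORP entry, the pool-adjacent-violators (PAV) algorithm computes the isotonic (monotone) regression of outcomes on predicted probabilities, yielding a reliability curve without any choice of bin width. A self-contained sketch of PAV and its CORP-style use (illustrative only, not the CORP reference implementation):

```python
import numpy as np

def pav(y):
    """Pool-adjacent-violators: least-squares non-decreasing fit to the
    sequence y, returned as one fitted value per element."""
    means, counts = [], []
    for value in np.asarray(y, dtype=float):
        means.append(value)
        counts.append(1)
        # Pool adjacent blocks while they violate the non-decreasing order.
        while len(means) > 1 and means[-2] > means[-1]:
            total = counts[-2] + counts[-1]
            pooled = (means[-2] * counts[-2] + means[-1] * counts[-1]) / total
            means[-2:] = [pooled]
            counts[-2:] = [total]
    return np.repeat(means, counts)

# Sort outcomes by predicted probability, then fit isotonically; the fitted
# values estimate the conditional event probability at each prediction.
rng = np.random.default_rng(0)
probs = rng.uniform(size=2000)
outcomes = (rng.uniform(size=2000) < probs ** 1.3).astype(float)  # miscalibrated
order = np.argsort(probs)
reliability_curve = pav(outcomes[order])  # plot against probs[order]
```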