Properties of the ENCE and other MAD-based calibration metrics
- URL: http://arxiv.org/abs/2305.11905v1
- Date: Wed, 17 May 2023 08:51:42 GMT
- Title: Properties of the ENCE and other MAD-based calibration metrics
- Authors: Pascal Pernot
- Abstract summary: The Expected Normalized Calibration Error (ENCE) is a popular calibration statistic used in Machine Learning.
For well-calibrated or nearly calibrated datasets, the ENCE is proportional to the square root of the number of bins. A similar behavior affects the calibration error based on the variance of z-scores (ZVE), and in both cases this property is a consequence of the use of a Mean Absolute Deviation (MAD) statistic to estimate calibration errors.
A solution is proposed to infer ENCE and ZVE values that do not depend on the number of bins for datasets assumed to be calibrated.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Expected Normalized Calibration Error (ENCE) is a popular calibration
statistic used in Machine Learning to assess the quality of prediction
uncertainties for regression problems. Estimation of the ENCE is based on the
binning of calibration data. In this short note, I illustrate an annoying
property of the ENCE, i.e. its proportionality to the square root of the number
of bins for well calibrated or nearly calibrated datasets. A similar behavior
affects the calibration error based on the variance of z-scores (ZVE), and in
both cases this property is a consequence of the use of a Mean Absolute
Deviation (MAD) statistic to estimate calibration errors. Hence, the question
arises of which number of bins to choose for a reliable estimation of
calibration error statistics. A solution is proposed to infer ENCE and ZVE
values that do not depend on the number of bins for datasets assumed to be
calibrated, providing simultaneously a statistical calibration test. It is also
shown that the ZVE is less sensitive than the ENCE to outstanding errors or
uncertainties.
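The bin-count dependence described in the abstract is easy to check numerically. The sketch below is my own illustration, not code from the paper; it assumes the usual ENCE definition, i.e. the mean over M equal-population bins (formed by sorting on predicted uncertainty) of |RMV_i - RMSE_i| / RMV_i. On a synthetic, perfectly calibrated dataset, ENCE / sqrt(M) stays roughly constant, which is the proportionality to the square root of the number of bins discussed above.

```python
# Minimal sketch (not the paper's code): ENCE vs. number of bins on a
# synthetic, perfectly calibrated regression dataset.
import numpy as np

rng = np.random.default_rng(0)

def ence(uncertainties, errors, n_bins):
    """ENCE = (1/M) * sum_i |RMV_i - RMSE_i| / RMV_i over M equal-size bins,
    with bins formed by sorting on the predicted uncertainty."""
    order = np.argsort(uncertainties)
    u_bins = np.array_split(uncertainties[order], n_bins)
    e_bins = np.array_split(errors[order], n_bins)
    terms = []
    for u, e in zip(u_bins, e_bins):
        rmv = np.sqrt(np.mean(u ** 2))   # root mean variance (predicted)
        rmse = np.sqrt(np.mean(e ** 2))  # root mean squared error (observed)
        terms.append(abs(rmv - rmse) / rmv)
    return np.mean(terms)

# Perfectly calibrated synthetic data: each error is drawn from a normal
# distribution whose standard deviation equals the predicted uncertainty.
n = 10_000
sigma = rng.uniform(0.1, 1.0, size=n)  # predicted uncertainties
err = rng.normal(0.0, sigma)           # errors consistent with sigma

for m in (10, 40, 160, 640):
    val = ence(sigma, err, m)
    print(f"M = {m:4d}  ENCE = {val:.4f}  ENCE/sqrt(M) = {val / np.sqrt(m):.5f}")
# The last column stays roughly constant: for calibrated data the ENCE grows
# like sqrt(M), so its value depends on the chosen number of bins.
```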
Related papers
- Optimizing Estimators of Squared Calibration Errors in Classification [2.3020018305241337]
We propose a mean-squared error-based risk that enables the comparison and optimization of estimators of squared calibration errors.
Our approach advocates for a training-validation-testing pipeline when estimating a calibration error.
arXiv Detail & Related papers (2024-10-09T15:58:06Z) - Orthogonal Causal Calibration [55.28164682911196]
We prove generic upper bounds on the calibration error of any causal parameter estimate $\theta$ with respect to any loss $\ell$.
We use our bound to analyze the convergence of two sample splitting algorithms for causal calibration.
arXiv Detail & Related papers (2024-06-04T03:35:25Z) - Towards Certification of Uncertainty Calibration under Adversarial Attacks [96.48317453951418]
We show that attacks can significantly harm calibration, and thus propose certified calibration as worst-case bounds on calibration under adversarial perturbations.
We propose novel calibration attacks and demonstrate how they can improve model calibration through adversarial calibration training.
arXiv Detail & Related papers (2024-05-22T18:52:09Z) - Validation of ML-UQ calibration statistics using simulated reference values: a sensitivity analysis [0.0]
Some popular Machine Learning Uncertainty Quantification (ML-UQ) calibration statistics do not have predefined reference values.
Simulated reference values, based on synthetic calibrated datasets derived from actual uncertainties, have been proposed to palliate this problem.
This study explores various facets of this problem, and shows that some statistics are excessively sensitive to the choice of generative distribution to be used for validation.
arXiv Detail & Related papers (2024-03-01T10:19:32Z) - Negative impact of heavy-tailed uncertainty and error distributions on the reliability of calibration statistics for machine learning regression tasks [0.0]
It is shown that the estimation of MV, MSE and their confidence intervals becomes unreliable for heavy-tailed uncertainty and error distributions.
The same problem is expected to also affect conditional calibration statistics, such as the popular ENCE.
arXiv Detail & Related papers (2024-02-15T16:05:35Z) - Consistent and Asymptotically Unbiased Estimation of Proper Calibration
Errors [23.819464242327257]
We propose a method that allows consistent estimation of all proper calibration errors and refinement terms.
We prove the relation between refinement and f-divergences, which implies information monotonicity in neural networks.
Our experiments validate the claimed properties of the proposed estimator and suggest that the selection of a post-hoc calibration method should be determined by the particular calibration error of interest.
arXiv Detail & Related papers (2023-12-14T01:20:08Z) - Calibration by Distribution Matching: Trainable Kernel Calibration
Metrics [56.629245030893685]
We introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression.
These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization.
We provide intuitive mechanisms to tailor calibration metrics to a decision task, and enforce accurate loss estimation and no regret decisions.
arXiv Detail & Related papers (2023-10-31T06:19:40Z) - Stratification of uncertainties recalibrated by isotonic regression and
its impact on calibration error statistics [0.0]
Recalibration of prediction uncertainties by isotonic regression might present a problem for bin-based calibration error statistics.
I show on an example how this might significantly affect the calibration diagnostics.
arXiv Detail & Related papers (2023-06-08T13:24:39Z) - T-Cal: An optimal test for the calibration of predictive models [49.11538724574202]
We consider detecting mis-calibration of predictive models using a finite validation dataset as a hypothesis testing problem.
Detecting mis-calibration is only possible when the conditional probabilities of the classes are sufficiently smooth functions of the predictions.
We propose T-Cal, a minimax test for calibration based on a de-biased plug-in estimator of the $\ell$-Expected Calibration Error (ECE).
arXiv Detail & Related papers (2022-03-03T16:58:54Z) - Localized Calibration: Metrics and Recalibration [133.07044916594361]
We propose a fine-grained calibration metric that spans the gap between fully global and fully individualized calibration.
We then introduce a localized recalibration method, LoRe, that improves the local calibration error (LCE) more than existing recalibration methods.
arXiv Detail & Related papers (2021-02-22T07:22:12Z) - Calibration of Neural Networks using Splines [51.42640515410253]
Measuring calibration error amounts to comparing two empirical distributions.
We introduce a binning-free calibration measure inspired by the classical Kolmogorov-Smirnov (KS) statistical test.
Our method consistently outperforms existing methods on KS error as well as other commonly used calibration measures.
arXiv Detail & Related papers (2020-06-23T07:18:05Z)