Stratification of uncertainties recalibrated by isotonic regression and
its impact on calibration error statistics
- URL: http://arxiv.org/abs/2306.05180v1
- Date: Thu, 8 Jun 2023 13:24:39 GMT
- Title: Stratification of uncertainties recalibrated by isotonic regression and
its impact on calibration error statistics
- Authors: Pascal Pernot
- Abstract summary: Recalibration of prediction uncertainties by isotonic regression might present a problem for bin-based calibration error statistics.
I show on an example how this might significantly affect the calibration diagnostics.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Post hoc recalibration of prediction uncertainties of machine
learning regression problems by isotonic regression might present a problem for
bin-based calibration error statistics (e.g. ENCE). Isotonic regression often
produces stratified uncertainties, i.e. subsets of uncertainties with identical
numerical values. Partitioning of the resulting data into equal-sized bins
introduces an aleatoric component to the estimation of bin-based calibration
statistics. The partitioning of stratified data into bins depends on the order
of the data, which is typically an uncontrolled property of calibration
test/validation sets. The tie-braking method of the ordering algorithm used for
binning might also introduce an aleatoric component. I show on an example how
this might significantly affect the calibration diagnostics.
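A minimal sketch of the effect described in the abstract, on synthetic data and with the usual equal-population ENCE (an illustration, not the paper's code): isotonic recalibration of the uncertainties produces many tied values, and the resulting ENCE then fluctuates with the (arbitrary) ordering of the test set.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)

# Synthetic heteroscedastic data: predicted uncertainties u and errors e,
# with the errors deliberately mis-scaled so that recalibration is needed.
n = 2000
u = rng.uniform(0.1, 1.0, n)                # predicted uncertainties
e = rng.normal(0.0, 1.5 * u)                # actual errors (uncertainties under-estimated)

# Post hoc recalibration: isotonic fit of squared errors vs predicted uncertainty,
# taking the square root as the recalibrated uncertainty. The fitted step function
# yields many identical values, i.e. stratified uncertainties.
iso = IsotonicRegression(out_of_bounds="clip")
u_recal = np.sqrt(iso.fit_transform(u, e ** 2))
print("distinct recalibrated uncertainties:", np.unique(u_recal).size, "out of", n)

def ence(uncert, err, n_bins=20):
    """Expected Normalized Calibration Error with equal-sized bins sorted by uncertainty."""
    order = np.argsort(uncert, kind="stable")   # ties keep their input order
    vals = []
    for b in np.array_split(order, n_bins):
        rmv = np.sqrt(np.mean(uncert[b] ** 2))  # root mean predicted variance
        rmse = np.sqrt(np.mean(err[b] ** 2))    # root mean squared error
        vals.append(abs(rmv - rmse) / rmv)
    return np.mean(vals)

# Shuffling the (identical) dataset changes which tied values fall into which bin,
# and therefore the ENCE estimate itself: an aleatoric component due to data order.
ences = []
for _ in range(100):
    perm = rng.permutation(n)
    ences.append(ence(u_recal[perm], e[perm]))
print(f"ENCE over random orderings: {np.mean(ences):.3f} +/- {np.std(ences):.3f}")
```

The spread of ENCE values over the random orderings is the order dependence the abstract warns about; it is absent when the uncertainties contain no ties.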
Related papers
- Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We characterize the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations.
We further extend our analysis to the case where the test point has non-trivial correlations with the training set, a setting often encountered in time series forecasting.
We validate our theory across a variety of high-dimensional datasets.
arXiv Detail & Related papers (2024-08-08T17:27:29Z) - Validation of ML-UQ calibration statistics using simulated reference values: a sensitivity analysis [0.0]
Some popular Machine Learning Uncertainty Quantification (ML-UQ) calibration statistics do not have predefined reference values.
Simulated reference values, based on synthetic calibrated datasets derived from actual uncertainties, have been proposed to palliate this problem.
This study explores various facets of this problem, and shows that some statistics are excessively sensitive to the choice of generative distribution to be used for validation.
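A hedged sketch of the simulated-reference-value idea summarized above (not the paper's code): synthetic errors that are calibrated by construction are generated from the actual uncertainties, and the distribution of a statistic over such synthetic datasets serves as its reference.

```python
import numpy as np

rng = np.random.default_rng(1)

def zms(uncert, err):
    """Example statistic: mean squared z-score, expected to be ~1 for calibrated data."""
    return np.mean((err / uncert) ** 2)

# Actual uncertainties from some model (placeholder values here) and miscalibrated errors.
u = rng.uniform(0.2, 2.0, 500)
e_actual = rng.normal(0.0, 1.3 * u)

# Reference distribution: errors drawn from an assumed generative distribution
# (normal here; the study above shows that this choice can matter for some statistics).
ref = np.array([zms(u, rng.normal(0.0, u)) for _ in range(2000)])
stat = zms(u, e_actual)
q_lo, q_hi = np.quantile(ref, [0.025, 0.975])
print(f"ZMS = {stat:.2f}, 95% simulated reference interval = [{q_lo:.2f}, {q_hi:.2f}]")
```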
arXiv Detail & Related papers (2024-03-01T10:19:32Z) - Calibration by Distribution Matching: Trainable Kernel Calibration
Metrics [56.629245030893685]
We introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression.
These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization.
We provide intuitive mechanisms to tailor calibration metrics to a decision task, and enforce accurate loss estimation and no regret decisions.
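A much-simplified, unconditional sketch of the distribution-matching idea (the paper's metrics operate on joint input-target samples and are differentiable for training, which is not reproduced here): if the predictive distributions are calibrated, samples drawn from them should be indistinguishable from the observed targets, which a kernel two-sample statistic such as the MMD can quantify.

```python
import numpy as np

rng = np.random.default_rng(2)

def rbf(a, b, ls=1.0):
    """RBF kernel matrix between 1-D samples a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def mmd2(a, b, ls=1.0):
    """Biased estimate of the squared Maximum Mean Discrepancy between samples a and b."""
    return rbf(a, a, ls).mean() + rbf(b, b, ls).mean() - 2.0 * rbf(a, b, ls).mean()

# Toy regression model: predictive means and observed targets with noise std 1.0.
mu = rng.normal(0.0, 1.0, 400)
y_obs = rng.normal(mu, 1.0)

y_under = rng.normal(mu, 0.5)                # samples from under-dispersed predictions
y_cal = rng.normal(mu, 1.0)                  # samples from correctly dispersed predictions
print("MMD^2, under-dispersed predictions:", mmd2(y_obs, y_under))
print("MMD^2, calibrated predictions:     ", mmd2(y_obs, y_cal))
```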
arXiv Detail & Related papers (2023-10-31T06:19:40Z) - Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point.
Unlike existing methods, the proposed one accounts not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
arXiv Detail & Related papers (2023-09-28T13:04:11Z) - Properties of the ENCE and other MAD-based calibration metrics [0.0]
The Expected Normalized Calibration Error (ENCE) is a popular calibration statistic used in Machine Learning; estimated on binned data, it depends on the number of bins, even for calibrated datasets.
A similar behavior affects the calibration error based on the variance of z-scores (ZVE), and in both cases this property is a consequence of the use of a Mean Absolute Deviation (MAD) statistic to estimate calibration errors.
A solution is proposed to infer ENCE and ZVE values that do not depend on the number of bins for datasets assumed to be calibrated.
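A minimal sketch, not from the paper, of the bin-count dependence mentioned above: for data that are calibrated by construction, the equal-population ENCE does not vanish but grows with the number of bins, as expected for a MAD-based estimate computed from ever smaller bins.

```python
import numpy as np

rng = np.random.default_rng(3)

def ence(uncert, err, n_bins):
    """ENCE with equal-population bins sorted by predicted uncertainty."""
    order = np.argsort(uncert)
    vals = []
    for b in np.array_split(order, n_bins):
        rmv = np.sqrt(np.mean(uncert[b] ** 2))
        rmse = np.sqrt(np.mean(err[b] ** 2))
        vals.append(abs(rmv - rmse) / rmv)
    return np.mean(vals)

n = 5000
u = rng.uniform(0.1, 1.0, n)
e = rng.normal(0.0, u)                       # perfectly calibrated by construction
for n_bins in (10, 50, 100, 500):
    print(f"{n_bins:4d} bins: ENCE = {ence(u, e, n_bins):.3f}")
```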
arXiv Detail & Related papers (2023-05-17T08:51:42Z) - Causal isotonic calibration for heterogeneous treatment effects [0.5249805590164901]
We propose causal isotonic calibration, a novel nonparametric method for calibrating predictors of heterogeneous treatment effects.
We also introduce cross-calibration, a data-efficient variant of calibration that eliminates the need for hold-out calibration sets.
arXiv Detail & Related papers (2023-02-27T18:07:49Z) - Parametric and Multivariate Uncertainty Calibration for Regression and
Object Detection [4.630093015127541]
We show that common detection models overestimate the spatial uncertainty in comparison to the observed error.
Our experiments show that the simple Isotonic Regression recalibration method is sufficient to achieve well-calibrated uncertainties.
In contrast, if normal distributions are required for subsequent processes, our GP-Normal recalibration method yields the best results.
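A hedged sketch contrasting the two recalibration styles mentioned above (this is not the paper's GP-Normal method): a single maximum-likelihood scale factor keeps the recalibrated predictions normal, while a nonparametric isotonic fit does not, since it returns a step function of the predicted uncertainty.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(4)

sigma = rng.uniform(0.2, 1.0, 1000)          # predicted standard deviations
err = rng.normal(0.0, 0.6 * sigma)           # model overestimates its uncertainty

# Parametric recalibration: one global scale factor, the maximum-likelihood estimate
# under a normal error model; the recalibrated predictions remain normal distributions.
scale = np.sqrt(np.mean((err / sigma) ** 2))
sigma_param = scale * sigma

# Nonparametric recalibration: isotonic fit of squared errors vs predicted std.
sigma_iso = np.sqrt(
    IsotonicRegression(out_of_bounds="clip").fit_transform(sigma, err ** 2)
)

# Both bring the in-sample mean squared z-score close to 1, but only the parametric
# variant preserves a normal predictive distribution for downstream processing.
print("mean z^2, parametric scaling:", np.mean((err / sigma_param) ** 2))
print("mean z^2, isotonic fit:      ", np.mean((err / sigma_iso) ** 2))
```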
arXiv Detail & Related papers (2022-07-04T08:00:20Z) - Unsupervised Calibration under Covariate Shift [92.02278658443166]
We introduce the problem of calibration under domain shift and propose an importance sampling based approach to address it.
We evaluate and discuss the efficacy of our method on both real-world datasets and synthetic datasets.
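A hedged, regression-flavored sketch of the general idea (the paper addresses classifier calibration and estimates the weights from data, neither of which is reproduced here): when the test distribution differs from the calibration distribution, calibration samples are reweighted by an importance ratio before estimating a calibration statistic.

```python
import numpy as np

rng = np.random.default_rng(6)

def weighted_zms(uncert, err, w):
    """Importance-weighted mean squared z-score (~1 if calibrated on the target domain)."""
    w = w / np.sum(w)
    return np.sum(w * (err / uncert) ** 2)

# Source (calibration) inputs x ~ N(0, 1); target inputs x ~ N(1, 1).
x = rng.normal(0.0, 1.0, 3000)
u = 0.5 + 0.2 * np.abs(x)                    # input-dependent predicted uncertainty
e = rng.normal(0.0, 0.5 + 0.4 * np.abs(x))   # miscalibration grows with |x|

# Density-ratio weights for the known shift (in practice the ratio must be estimated).
w = np.exp(-0.5 * (x - 1.0) ** 2 + 0.5 * x ** 2)

print("unweighted ZMS (source domain):  ", np.mean((e / u) ** 2))
print("importance-weighted ZMS (target):", weighted_zms(u, e, w))
```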
arXiv Detail & Related papers (2020-06-29T21:50:07Z) - Calibration of Neural Networks using Splines [51.42640515410253]
Measuring calibration error amounts to comparing two empirical distributions.
We introduce a binning-free calibration measure inspired by the classical Kolmogorov-Smirnov (KS) statistical test.
Our method consistently outperforms existing methods on KS error as well as other commonly used calibration measures.
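A minimal sketch (assuming binary classification scores and labels, not the paper's spline recalibration) of a binning-free, KS-style calibration error: after sorting by score, compare the cumulative predicted probability with the cumulative observed frequency of positives.

```python
import numpy as np

rng = np.random.default_rng(5)

def ks_calibration_error(scores, labels):
    """Max gap between cumulative predicted probability and cumulative positive rate."""
    order = np.argsort(scores)
    s, y = scores[order], labels[order]
    return np.max(np.abs(np.cumsum(s - y))) / len(s)

# Toy example: scores systematically inflated relative to the true positive rate.
true_p = rng.uniform(0.0, 1.0, 5000)
labels = (rng.uniform(size=5000) < true_p).astype(float)
scores = np.clip(true_p + 0.15, 0.0, 1.0)

print("KS error, over-confident scores:", ks_calibration_error(scores, labels))
print("KS error, calibrated scores:    ", ks_calibration_error(true_p, labels))
```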
arXiv Detail & Related papers (2020-06-23T07:18:05Z) - Individual Calibration with Randomized Forecasting [116.2086707626651]
We show that calibration for individual samples is possible in the regression setup if the predictions are randomized.
We design a training objective to enforce individual calibration and use it to train randomized regression functions.
arXiv Detail & Related papers (2020-06-18T05:53:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.