A Unifying Theory of Distance from Calibration
- URL: http://arxiv.org/abs/2211.16886v2
- Date: Fri, 31 Mar 2023 17:48:47 GMT
- Title: A Unifying Theory of Distance from Calibration
- Authors: Jarosław Błasiok, Parikshit Gopalan, Lunjia Hu, Preetum Nakkiran
- Abstract summary: There is no consensus on how to quantify the distance from perfect calibration.
We propose a ground-truth notion of distance from calibration, inspired by the literature on property testing.
Applying our framework, we identify three calibration measures that are consistent and can be estimated efficiently.
- Score: 9.959025631339982
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the fundamental question of how to define and measure the distance
from calibration for probabilistic predictors. While the notion of perfect
calibration is well-understood, there is no consensus on how to quantify the
distance from perfect calibration. Numerous calibration measures have been
proposed in the literature, but it is unclear how they compare to each other,
and many popular measures such as Expected Calibration Error (ECE) fail to
satisfy basic properties like continuity.
We present a rigorous framework for analyzing calibration measures, inspired
by the literature on property testing. We propose a ground-truth notion of
distance from calibration: the $\ell_1$ distance to the nearest perfectly
calibrated predictor. We define a consistent calibration measure as one that is
polynomially related to this distance. Applying our framework, we identify
three calibration measures that are consistent and can be estimated
efficiently: smooth calibration, interval calibration, and Laplace kernel
calibration. The former two give quadratic approximations to the ground truth
distance, which we show is information-theoretically optimal in a natural model
for measuring calibration which we term the prediction-only access model. Our
work thus establishes fundamental lower and upper bounds on measuring the
distance to calibration, and also provides theoretical justification for
preferring certain metrics (like Laplace kernel calibration) in practice.
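To make the recommended measures concrete, here is a minimal numpy sketch (not the authors' code) of two plug-in estimates computed from predictions f_i in [0,1] and binary outcomes y_i: a width-eps interval (binned) calibration error, and a Laplace-kernel calibration statistic built from the kernel k(u,v) = exp(-|u-v|). The paper's formal definitions include additional terms and normalizations, so treat this only as an illustration of the estimators' shape.

```python
import numpy as np

def interval_calibration_error(f, y, eps=0.1):
    """Plug-in interval (binned) calibration estimate: bin predictions into
    width-eps intervals and sum |mean(y) - mean(f)| per bin, weighted by the
    bin's empirical mass."""
    f, y = np.asarray(f, float), np.asarray(y, float)
    n_bins = int(round(1.0 / eps))
    idx = np.minimum((f * n_bins).astype(int), n_bins - 1)
    err = 0.0
    for b in range(n_bins):
        m = idx == b
        if m.any():
            err += m.mean() * abs(y[m].mean() - f[m].mean())
    return err

def laplace_kernel_calibration(f, y):
    """V-statistic estimate of E[(y - f)(y' - f') * exp(-|f - f'|)]
    over pairs of samples, using the Laplace kernel on the predictions."""
    f, y = np.asarray(f, float), np.asarray(y, float)
    r = y - f                                     # residuals
    K = np.exp(-np.abs(f[:, None] - f[None, :]))  # Laplace kernel matrix
    return float(r @ K @ r) / len(f) ** 2

rng = np.random.default_rng(0)
f = rng.uniform(size=2000)
y = rng.binomial(1, f)        # perfectly calibrated by construction
print(interval_calibration_error(f, y))
print(laplace_kernel_calibration(f, y))
```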
Related papers
- A Confidence Interval for the $\ell_2$ Expected Calibration Error [35.88784957918326]
We develop confidence intervals for the $\ell_2$ Expected Calibration Error (ECE).
We consider top-1-to-$k$ calibration, which includes both the popular notion of confidence calibration and full calibration.
For a debiased estimator of the ECE, we show asymptotic normality, but with different convergence rates and variances for calibrated and miscalibrated models.
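As a rough stand-in for the quantity being interval-estimated, the sketch below computes a plug-in binned $\ell_2$-ECE (bin-mass-weighted squared gaps, then a square root) and a simple percentile-bootstrap interval; the paper's construction instead uses a debiased estimator and asymptotic normality, so this is only an illustrative approximation.

```python
import numpy as np

def l2_ece(f, y, n_bins=10):
    """Plug-in binned l2-ECE: sqrt of the bin-mass-weighted squared gaps."""
    f, y = np.asarray(f, float), np.asarray(y, float)
    idx = np.minimum((f * n_bins).astype(int), n_bins - 1)
    total = 0.0
    for b in range(n_bins):
        m = idx == b
        if m.any():
            total += m.mean() * (y[m].mean() - f[m].mean()) ** 2
    return float(np.sqrt(total))

def bootstrap_ci(f, y, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the plug-in l2-ECE."""
    rng = np.random.default_rng(seed)
    f, y = np.asarray(f, float), np.asarray(y, float)
    n = len(f)
    stats = []
    for _ in range(n_boot):
        i = rng.integers(0, n, n)          # resample indices with replacement
        stats.append(l2_ece(f[i], y[i]))
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])

rng = np.random.default_rng(1)
f = rng.uniform(size=5000)
y = rng.binomial(1, np.clip(f + 0.05, 0, 1))   # mildly miscalibrated
print(l2_ece(f, y), bootstrap_ci(f, y))
```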
arXiv Detail & Related papers (2024-08-16T20:00:08Z) - Truthfulness of Calibration Measures [18.21682539787221]
A calibration measure is said to be truthful if the forecaster minimizes expected penalty by predicting the conditional expectation of the next outcome.
This makes truthfulness an essential desideratum for calibration measures, alongside typical requirements such as soundness and completeness.
We introduce a new calibration measure termed the Subsampled Smooth Calibration Error (SSCE), under which truthful prediction is optimal up to a constant multiplicative factor.
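To make the truthfulness criterion concrete, the sketch below estimates the expected penalty of a truthful forecaster (one that reports the conditional expectation) against a forecaster that shrinks its reports toward 1/2, using an ordinary binned ECE as the penalty. The forecasters, the penalty, and the outcome distribution are illustrative assumptions here, not the SSCE or the paper's constructions.

```python
import numpy as np

def binned_ece(f, y, n_bins=10):
    """Ordinary plug-in binned ECE, used here only as an example penalty."""
    idx = np.minimum((f * n_bins).astype(int), n_bins - 1)
    err = 0.0
    for b in range(n_bins):
        m = idx == b
        if m.any():
            err += m.mean() * abs(y[m].mean() - f[m].mean())
    return err

def expected_penalty(predict, n_rounds=2000, n_trials=200, seed=0):
    """Monte Carlo estimate of E[penalty] when outcomes are Bernoulli(p_t)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_trials):
        p = rng.uniform(size=n_rounds)   # true conditional expectations
        y = rng.binomial(1, p)
        total += binned_ece(predict(p), y)
    return total / n_trials

def truthful(p):
    return p                             # report the conditional expectation

def shrunk(p):
    return 0.5 + 0.5 * (p - 0.5)         # pull every report toward 1/2

print(expected_penalty(truthful), expected_penalty(shrunk))
```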
arXiv Detail & Related papers (2024-07-19T02:07:55Z) - On the Distance from Calibration in Sequential Prediction [4.14360329494344]
We study a sequential binary prediction setting where the forecaster is evaluated in terms of the calibration distance.
The calibration distance is a natural and intuitive measure of deviation from perfect calibration.
We prove that there is a forecasting algorithm that achieves an $O(\sqrt{T})$ calibration distance in expectation on an adversarially chosen sequence of $T$ binary outcomes.
arXiv Detail & Related papers (2024-02-12T07:37:19Z) - Calibration by Distribution Matching: Trainable Kernel Calibration
Metrics [56.629245030893685]
We introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression.
These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization.
We provide intuitive mechanisms to tailor calibration metrics to a decision task, and enforce accurate loss estimation and no regret decisions.
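The abstract does not fix a single formula, but kernel calibration errors of the kind such frameworks unify can be sketched as follows: form residuals one-hot(y) - p and average kappa(p_i, p_j) <r_i, r_j> over pairs under a kernel kappa on probability vectors. The Laplace kernel and bandwidth below are assumptions; because the estimate is a smooth function of the predicted probabilities, an autodiff reimplementation could serve as a training penalty, in the spirit of the paper.

```python
import numpy as np

def kernel_calibration_error(probs, labels, bandwidth=1.0):
    """Biased (V-statistic) sample estimate of a kernel calibration error.

    probs:  (n, k) predicted class probabilities
    labels: (n,)   integer class labels
    Residuals are one-hot(label) - probs; the statistic averages
    kappa(p_i, p_j) * <r_i, r_j> over all pairs, with a Laplace (L1) kernel
    on the probability vectors.
    """
    probs = np.asarray(probs, float)
    labels = np.asarray(labels, int)
    n, k = probs.shape
    r = np.eye(k)[labels] - probs                               # residuals
    d = np.abs(probs[:, None, :] - probs[None, :, :]).sum(-1)   # pairwise L1
    kappa = np.exp(-d / bandwidth)
    return float(np.einsum("ij,ik,jk->", kappa, r, r)) / n ** 2

rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 3))
probs = np.exp(logits) / np.exp(logits).sum(1, keepdims=True)
labels = np.array([rng.choice(3, p=p) for p in probs])  # calibrated by construction
print(kernel_calibration_error(probs, labels))
```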
arXiv Detail & Related papers (2023-10-31T06:19:40Z) - Adaptive Calibrator Ensemble for Model Calibration under Distribution
Shift [23.794897699193875]
Adaptive calibrator ensemble (ACE) calibrates OOD datasets whose difficulty is usually higher than that of the calibration set.
ACE generally improves the performance of a few state-of-the-art calibration schemes on a series of OOD benchmarks.
arXiv Detail & Related papers (2023-03-09T15:22:02Z) - On Calibrating Semantic Segmentation Models: Analyses and An Algorithm [51.85289816613351]
We study the problem of semantic segmentation calibration.
Model capacity, crop size, multi-scale testing, and prediction correctness all have an impact on calibration.
We propose a simple, unifying, and effective approach, namely selective scaling.
arXiv Detail & Related papers (2022-12-22T22:05:16Z) - T-Cal: An optimal test for the calibration of predictive models [49.11538724574202]
We consider detecting mis-calibration of predictive models using a finite validation dataset as a hypothesis testing problem.
Detecting mis-calibration is only possible when the conditional probabilities of the classes are sufficiently smooth functions of the predictions.
We propose T-Cal, a minimax test for calibration based on a debiased plug-in estimator of the $\ell_2$-Expected Calibration Error (ECE).
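The debiasing idea can be sketched with the standard correction for the squared per-bin gap: the naive plug-in (mean residual)^2 is biased upward by roughly its own sampling variance, which is estimated and subtracted. This follows the generic recipe and is not necessarily T-Cal's exact estimator or test statistic.

```python
import numpy as np

def debiased_l2_ece_sq(f, y, n_bins=15):
    """Debiased plug-in estimate of the squared binned l2-ECE.

    For each bin, the squared gap (mean residual)^2 is corrected by
    subtracting an estimate of its own sampling variance,
    Var(y - f within the bin) / n_bin. The result can be slightly negative.
    """
    f, y = np.asarray(f, float), np.asarray(y, float)
    idx = np.minimum((f * n_bins).astype(int), n_bins - 1)
    total = 0.0
    for b in range(n_bins):
        m = idx == b
        nb = m.sum()
        if nb < 2:
            continue
        resid = y[m] - f[m]
        total += m.mean() * (resid.mean() ** 2 - resid.var(ddof=1) / nb)
    return total

rng = np.random.default_rng(0)
f = rng.uniform(size=3000)
y = rng.binomial(1, f)         # calibrated: estimate should be near zero
print(debiased_l2_ece_sq(f, y))
```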
arXiv Detail & Related papers (2022-03-03T16:58:54Z) - Localized Calibration: Metrics and Recalibration [133.07044916594361]
We propose a fine-grained calibration metric that spans the gap between fully global and fully individualized calibration.
We then introduce a localized recalibration method, LoRe, that improves the local calibration error (LCE) more than existing recalibration methods.
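The abstract does not spell the metric out, so the following is only a guess at the flavor of a localized calibration error: average the residuals y_j - f_j of each point's neighbors under a kernel on some feature representation, and report the mean absolute local gap. The Gaussian kernel, the feature space, and the bandwidth are assumptions, and LoRe itself is not sketched.

```python
import numpy as np

def local_calibration_error(features, f, y, bandwidth=1.0):
    """Kernel-weighted local calibration gap, averaged over the sample.

    For each point x_i, the local gap is the kernel-weighted mean of the
    residuals (y_j - f_j) among points near x_i in feature space; the
    returned value is the mean absolute local gap. A very large bandwidth
    recovers a single global gap; a tiny bandwidth approaches per-point error.
    """
    X = np.asarray(features, float)
    f, y = np.asarray(f, float), np.asarray(y, float)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared dists
    w = np.exp(-d2 / (2 * bandwidth ** 2))               # Gaussian kernel weights
    local_gap = (w @ (y - f)) / w.sum(1)
    return float(np.abs(local_gap).mean())

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
f = 1 / (1 + np.exp(-X[:, 0]))               # predictions depend on features
y = rng.binomial(1, np.clip(0.9 * f, 0, 1))  # outcomes drawn from deflated probabilities
print(local_calibration_error(X, f, y))
```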
arXiv Detail & Related papers (2021-02-22T07:22:12Z) - Unsupervised Calibration under Covariate Shift [92.02278658443166]
We introduce the problem of calibration under domain shift and propose an importance sampling based approach to address it.
We evaluate and discuss the efficacy of our method on both real-world datasets and synthetic datasets.
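One way to make the importance-sampling idea concrete (a hedged sketch, not the paper's exact procedure): reweight each labeled source example by an estimate of the density ratio w(x) = p_target(x) / p_source(x) when computing calibration statistics, so the binned gaps reflect the target distribution. The density-ratio estimate below is a synthetic stand-in.

```python
import numpy as np

def weighted_ece(f, y, weights, n_bins=10):
    """Binned ECE where each example carries an importance weight.

    weights approximate p_target(x) / p_source(x); with all-ones weights
    this reduces to the ordinary plug-in binned ECE.
    """
    f, y, w = (np.asarray(a, float) for a in (f, y, weights))
    idx = np.minimum((f * n_bins).astype(int), n_bins - 1)
    err, wsum = 0.0, w.sum()
    for b in range(n_bins):
        m = idx == b
        if not m.any():
            continue
        gap = np.average(y[m] - f[m], weights=w[m])
        err += (w[m].sum() / wsum) * abs(gap)
    return err

rng = np.random.default_rng(0)
x = rng.normal(size=4000)                        # source covariate
f = 1 / (1 + np.exp(-x))                         # model confidence
y = rng.binomial(1, 1 / (1 + np.exp(-1.3 * x)))  # true outcome probability differs
w = np.exp(0.5 * x)                              # stand-in density-ratio estimate
print(weighted_ece(f, y, np.ones_like(f)), weighted_ece(f, y, w))
```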
arXiv Detail & Related papers (2020-06-29T21:50:07Z) - Calibration of Pre-trained Transformers [55.57083429195445]
We focus on BERT and RoBERTa in this work, and analyze their calibration across three tasks: natural language inference, paraphrase detection, and commonsense reasoning.
We show that: (1) when used out-of-the-box, pre-trained models are calibrated in-domain, and compared to baselines, their calibration error out-of-domain can be as much as 3.5x lower; (2) temperature scaling is effective at further reducing calibration error in-domain, and using label smoothing to deliberately increase empirical uncertainty helps calibrate posteriors out-of-domain.
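Temperature scaling itself is simple to sketch (a generic recipe, not the paper's code): divide the logits by a scalar T chosen to minimize negative log-likelihood on held-out data, which softens or sharpens confidences without changing the argmax prediction.

```python
import numpy as np

def nll(logits, labels, T):
    """Mean negative log-likelihood of labels under softmax(logits / T)."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)   # stabilize the softmax
    logprobs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logprobs[np.arange(len(labels)), labels].mean()

def fit_temperature(dev_logits, dev_labels, grid=np.linspace(0.5, 5.0, 200)):
    """Pick the temperature that minimizes dev-set NLL (simple grid search)."""
    losses = [nll(dev_logits, dev_labels, T) for T in grid]
    return float(grid[int(np.argmin(losses))])

rng = np.random.default_rng(0)
dev_logits = 3.0 * rng.normal(size=(2000, 4))       # overconfident logits
dev_labels = np.array([rng.choice(4, p=np.exp(l) / np.exp(l).sum())
                       for l in dev_logits / 3.0])  # labels follow T=3 probabilities
print("fitted temperature:", fit_temperature(dev_logits, dev_labels))  # near 3
```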
arXiv Detail & Related papers (2020-03-17T18:58:44Z)