Truthfulness of Calibration Measures
- URL: http://arxiv.org/abs/2407.13979v2
- Date: Wed, 20 Nov 2024 22:41:47 GMT
- Title: Truthfulness of Calibration Measures
- Authors: Nika Haghtalab, Mingda Qiao, Kunhe Yang, Eric Zhao
- Abstract summary: A calibration measure is said to be truthful if the forecaster minimizes expected penalty by predicting the conditional expectation of the next outcome.
This makes truthfulness an essential desideratum for calibration measures, alongside typical requirements such as soundness and completeness.
We introduce a new calibration measure termed the Subsampled Smooth Calibration Error (SSCE), under which truthful prediction is optimal up to a constant multiplicative factor.
- Score: 18.21682539787221
- Abstract: We initiate the study of the truthfulness of calibration measures in sequential prediction. A calibration measure is said to be truthful if the forecaster (approximately) minimizes the expected penalty by predicting the conditional expectation of the next outcome, given the prior distribution of outcomes. Truthfulness is an important property of calibration measures, ensuring that the forecaster is not incentivized to exploit the system with deliberate poor forecasts. This makes it an essential desideratum for calibration measures, alongside typical requirements, such as soundness and completeness. We conduct a taxonomy of existing calibration measures and their truthfulness. Perhaps surprisingly, we find that all of them are far from being truthful. That is, under existing calibration measures, there are simple distributions on which a polylogarithmic (or even zero) penalty is achievable, while truthful prediction leads to a polynomial penalty. Our main contribution is the introduction of a new calibration measure termed the Subsampled Smooth Calibration Error (SSCE) under which truthful prediction is optimal up to a constant multiplicative factor.
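The SSCE builds on the smooth calibration error, which replaces hard binning with a supremum, over bounded 1-Lipschitz test functions, of the average correlation between the test function and the forecast residuals. The Python sketch below is an illustrative assumption rather than the authors' code: it computes this supremum as a small linear program, and it leaves out the random subsampling of rounds (and its rate) that the SSCE adds on top to recover truthfulness.

```python
# Minimal sketch (not the paper's code) of the smooth calibration error (smCE),
# the building block that the SSCE evaluates on a random subsample of rounds.
# Assumed definition: smCE = sup over 1-Lipschitz f: [0,1] -> [-1,1] of
# (1/T) * sum_t f(p_t) * (y_t - p_t), computed here as a small linear program.
import numpy as np
from scipy.optimize import linprog


def smooth_calibration_error(p, y):
    """Smooth calibration error of forecasts p against binary outcomes y."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    T = len(p)
    # Decision variables f_t = f(p_t); linprog minimizes, so negate the objective.
    c = -(y - p)
    # Lipschitz constraints |f_i - f_j| <= |p_i - p_j| for every pair (i, j).
    rows, rhs = [], []
    for i in range(T):
        for j in range(i + 1, T):
            row = np.zeros(T)
            row[i], row[j] = 1.0, -1.0
            gap = abs(p[i] - p[j])
            rows.append(row)
            rhs.append(gap)       # f_i - f_j <= |p_i - p_j|
            rows.append(-row)
            rhs.append(gap)       # f_j - f_i <= |p_i - p_j|
    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[(-1.0, 1.0)] * T, method="highs")
    return -res.fun / T


# Truthful forecasts on i.i.d. fair coins should incur only a small penalty.
rng = np.random.default_rng(0)
outcomes = rng.integers(0, 2, size=60)
print(smooth_calibration_error(np.full(60, 0.5), outcomes))
```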
Related papers
- Towards Certification of Uncertainty Calibration under Adversarial Attacks [96.48317453951418]
We show that attacks can significantly harm calibration, and thus propose certified calibration as worst-case bounds on calibration under adversarial perturbations.
We propose novel calibration attacks and demonstrate how they can improve model calibration through adversarial calibration training.
arXiv Detail & Related papers (2024-05-22T18:52:09Z)
- Calibration by Distribution Matching: Trainable Kernel Calibration Metrics [56.629245030893685]
We introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression.
These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization.
We provide intuitive mechanisms to tailor calibration metrics to a decision task, and enforce accurate loss estimation and no regret decisions.
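For reference, the sketch below implements a generic kernel calibration statistic in the spirit of the Maximum Mean Calibration Error (MMCE); it is an assumed stand-in, not necessarily one of the metrics proposed in this paper, but it shows why such statistics are differentiable in the predicted confidences and hence usable inside empirical risk minimization.

```python
# Generic kernel calibration statistic in the style of MMCE (Kumar et al., 2018);
# a placeholder illustration, not necessarily a metric proposed in this paper.
import numpy as np


def kernel_calibration_error(conf, correct, bandwidth=0.4):
    """Squared kernel calibration statistic for confidences and 0/1 correctness."""
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=float)
    resid = correct - conf                       # per-sample calibration residual
    # Laplacian kernel over confidences: samples with similar confidence interact.
    K = np.exp(-np.abs(conf[:, None] - conf[None, :]) / bandwidth)
    m = len(conf)
    return float(resid @ K @ resid) / (m * m)


# A well-calibrated predictor drives the statistic toward zero.
rng = np.random.default_rng(1)
conf = rng.uniform(0.5, 1.0, size=500)
correct = (rng.uniform(size=500) < conf).astype(float)   # outcomes match confidences
print(kernel_calibration_error(conf, correct))
```

Written in an automatic-differentiation framework instead of NumPy, the same expression can be added directly to a training loss, which is the sense in which such metrics admit differentiable sample estimates.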
arXiv Detail & Related papers (2023-10-31T06:19:40Z)
- Boldness-Recalibration for Binary Event Predictions [0.0]
Ideally, probability predictions are (i) well calibrated, (ii) accurate, and (iii) bold, i.e., spread out enough to be informative for decision making.
There is a fundamental tension between calibration and boldness, since a forecaster can score well on calibration while being overly cautious, i.e., non-bold.
The purpose of this work is to develop a Bayesian model selection-based approach to assess calibration, and a strategy for boldness-recalibration.
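As a toy numerical illustration of this tension (not taken from the paper), the sketch below uses the Murphy decomposition of the Brier score: the reliability term measures miscalibration, while the resolution term rewards bold, informative forecasts. A base-rate forecaster and a bold forecaster can both be essentially perfectly calibrated, yet only the bold one has nonzero resolution.

```python
# Toy illustration (not from the paper) of the calibration/boldness tension via
# the Murphy decomposition of the Brier score: Brier = REL - RES + UNC, where
# REL measures (mis)calibration and RES rewards bold, informative forecasts.
import numpy as np


def murphy_decomposition(p, y):
    """Reliability and resolution for forecasts taking finitely many values."""
    p, y = np.asarray(p, float), np.asarray(y, float)
    base = y.mean()
    rel = res = 0.0
    for v in np.unique(p):
        mask = p == v
        obs = y[mask].mean()
        rel += mask.mean() * (v - obs) ** 2       # calibration penalty
        res += mask.mean() * (obs - base) ** 2    # boldness / informativeness reward
    return rel, res


rng = np.random.default_rng(2)
true_prob = rng.choice([0.1, 0.9], size=4000)             # events are predictable
y = (rng.uniform(size=4000) < true_prob).astype(float)

cautious = np.full(4000, 0.5)                             # non-bold: base rate everywhere
bold = true_prob                                          # bold and informative

print(murphy_decomposition(cautious, y))   # ~ (0, 0): calibrated but uninformative
print(murphy_decomposition(bold, y))       # ~ (0, 0.16): calibrated and bold
```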
arXiv Detail & Related papers (2023-05-05T18:14:47Z)
- On Calibrating Semantic Segmentation Models: Analyses and An Algorithm [51.85289816613351]
We study the problem of semantic segmentation calibration.
Model capacity, crop size, multi-scale testing, and prediction correctness all have an impact on calibration.
We propose a simple, unifying, and effective approach, namely selective scaling.
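Selective scaling itself is not reproduced here; as background only, the sketch below shows plain temperature scaling, the standard confidence-scaling baseline that approaches of this kind typically start from. In the segmentation setting the `logits` array would hold per-pixel class scores from a held-out validation set; the example data are synthetic.

```python
# Background sketch only: plain temperature scaling (Guo et al., 2017), the
# standard confidence-scaling baseline that methods such as selective scaling
# build on.  This is NOT the paper's selective-scaling algorithm.
import numpy as np
from scipy.optimize import minimize_scalar


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def fit_temperature(logits, labels):
    """Find T > 0 minimizing the negative log-likelihood of softmax(logits / T)."""
    def nll(t):
        probs = softmax(logits / t)
        return -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()
    return minimize_scalar(nll, bounds=(0.05, 10.0), method="bounded").x


# Synthetic demo: labels are drawn from softmax(logits / true_temp), so the fitted
# temperature should land near true_temp.  For semantic segmentation, `logits`
# would be per-pixel class scores flattened to shape (num_pixels, num_classes).
rng = np.random.default_rng(3)
true_temp = 2.0
logits = rng.normal(size=(1000, 5)) * 4.0
labels = np.array([rng.choice(5, p=row) for row in softmax(logits / true_temp)])
print(fit_temperature(logits, labels))
```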
arXiv Detail & Related papers (2022-12-22T22:05:16Z)
- A Unifying Theory of Distance from Calibration [9.959025631339982]
There is no consensus on how to quantify the distance from perfect calibration.
We propose a ground-truth notion of distance from calibration, inspired by the literature on property testing.
Applying our framework, we identify three calibration measures that are consistent and can be estimated efficiently.
arXiv Detail & Related papers (2022-11-30T10:38:24Z)
- Forecast Hedging and Calibration [8.858351266850544]
We develop the concept of forecast hedging, which consists of choosing the forecasts so as to guarantee that the expected track record can only improve.
This yields all the calibration results by the same simple argument, while differentiating between them by the forecast-hedging tools used.
Additional contributions are an improved definition of continuous calibration, ensuing game dynamics that yield Nash equilibria in the long run, and a new forecasting procedure for binary events that is simpler than all known such procedures.
arXiv Detail & Related papers (2022-10-13T16:48:25Z)
- Smooth Calibration, Leaky Forecasts, Finite Recall, and Nash Dynamics [8.858351266850544]
We propose to smooth out the calibration score, which measures how good a forecaster is, by combining nearby forecasts.
We show that smooth calibration can be guaranteed by deterministic procedures.
arXiv Detail & Related papers (2022-10-13T16:34:55Z)
- T-Cal: An optimal test for the calibration of predictive models [49.11538724574202]
We consider detecting mis-calibration of predictive models using a finite validation dataset as a hypothesis testing problem.
We find that detecting mis-calibration is only possible when the conditional probabilities of the classes are sufficiently smooth functions of the predictions.
We propose T-Cal, a minimax test for calibration based on a de-biased plug-in estimator of the $\ell_2$-Expected Calibration Error (ECE).
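For context, the sketch below computes the plain plug-in estimate of the binned $\ell_2$-ECE (the bin-weighted squared calibration gap). The positive value it returns even for a perfectly calibrated model illustrates the plug-in bias; the de-biasing correction that T-Cal applies (and any adaptive choice of binning) is not reproduced here.

```python
# Plug-in binned l2-ECE sketch; T-Cal's de-biasing correction is omitted.
import numpy as np


def plugin_l2_ece(conf, correct, n_bins=15):
    """Plug-in estimate of the bin-weighted squared calibration gap."""
    conf, correct = np.asarray(conf, float), np.asarray(correct, float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for k in range(n_bins):
        last = (k == n_bins - 1)
        mask = (conf >= edges[k]) & ((conf <= edges[k + 1]) if last else (conf < edges[k + 1]))
        if mask.any():
            gap = correct[mask].mean() - conf[mask].mean()
            total += mask.mean() * gap ** 2            # bin weight times squared gap
    return total


rng = np.random.default_rng(4)
conf = rng.uniform(0.5, 1.0, size=5000)
correct = (rng.uniform(size=5000) < conf).astype(float)   # perfectly calibrated model
print(plugin_l2_ece(conf, correct))                        # small but positive: plug-in bias
```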
arXiv Detail & Related papers (2022-03-03T16:58:54Z)
- Localized Calibration: Metrics and Recalibration [133.07044916594361]
We propose a fine-grained calibration metric, the localized calibration error (LCE), that spans the gap between fully global and fully individualized calibration.
We then introduce a localized recalibration method, LoRe, that improves the LCE more than existing recalibration methods.
arXiv Detail & Related papers (2021-02-22T07:22:12Z)
- Unsupervised Calibration under Covariate Shift [92.02278658443166]
We introduce the problem of calibration under domain shift and propose an importance sampling based approach to address it.
We evaluate and discuss the efficacy of our method on both real-world datasets and synthetic datasets.
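A generic sketch of the importance-sampling idea follows, under the assumption that density-ratio weights w(x) ≈ p_target(x)/p_source(x) are available (in practice they would be estimated, e.g., with a domain classifier); this is an illustration, not the paper's exact estimator.

```python
# Generic importance-weighting sketch for calibration under covariate shift;
# the density-ratio weights w(x) ~ p_target(x) / p_source(x) are assumed given.
import numpy as np


def weighted_binned_ece(conf, correct, weights, n_bins=10):
    """Binned ECE in which each labeled source sample is importance-weighted."""
    conf, correct = np.asarray(conf, float), np.asarray(correct, float)
    w = np.asarray(weights, float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece, w_total = 0.0, w.sum()
    for k in range(n_bins):
        last = (k == n_bins - 1)
        mask = (conf >= edges[k]) & ((conf <= edges[k + 1]) if last else (conf < edges[k + 1]))
        wk = w[mask].sum()
        if wk > 0:
            acc = (w[mask] * correct[mask]).sum() / wk      # weighted accuracy in bin
            avg_conf = (w[mask] * conf[mask]).sum() / wk    # weighted confidence in bin
            ece += (wk / w_total) * abs(acc - avg_conf)
    return ece


rng = np.random.default_rng(5)
conf = rng.uniform(0.5, 1.0, size=3000)
correct = (rng.uniform(size=3000) < conf - 0.1).astype(float)   # miscalibrated source model
weights = rng.lognormal(mean=0.0, sigma=0.5, size=3000)         # stand-in density ratios
print(weighted_binned_ece(conf, correct, weights))
```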
arXiv Detail & Related papers (2020-06-29T21:50:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.