Truthfulness of Decision-Theoretic Calibration Measures
- URL: http://arxiv.org/abs/2503.02384v1
- Date: Tue, 04 Mar 2025 08:20:10 GMT
- Title: Truthfulness of Decision-Theoretic Calibration Measures
- Authors: Mingda Qiao, Eric Zhao
- Abstract summary: We introduce a new calibration measure termed subsampled step calibration, $\mathsf{StepCE}^{\textsf{sub}}$, that is both decision-theoretic and truthful. In particular, on any product distribution, $\mathsf{StepCE}^{\textsf{sub}}$ is truthful up to an $O(1)$ factor, whereas prior decision-theoretic calibration measures suffer from an $e^{-\Omega(T)}$-$\Omega(\sqrt{T})$ truthfulness gap.
- Score: 5.414308305392762
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Calibration measures quantify how much a forecaster's predictions violate calibration, which requires that forecasts are unbiased conditional on the forecasted probabilities. Two important desiderata for a calibration measure are its decision-theoretic implications (i.e., downstream decision-makers that best-respond to the forecasts are always no-regret) and its truthfulness (i.e., a forecaster approximately minimizes error by always reporting the true probabilities). Existing measures satisfy at most one of these two properties. We introduce a new calibration measure termed subsampled step calibration, $\mathsf{StepCE}^{\textsf{sub}}$, that is both decision-theoretic and truthful. In particular, on any product distribution, $\mathsf{StepCE}^{\textsf{sub}}$ is truthful up to an $O(1)$ factor whereas prior decision-theoretic calibration measures suffer from an $e^{-\Omega(T)}$-$\Omega(\sqrt{T})$ truthfulness gap. Moreover, in any smoothed setting where the conditional probability of each event is perturbed by noise of magnitude $c > 0$, $\mathsf{StepCE}^{\textsf{sub}}$ is truthful up to an $O(\sqrt{\log(1/c)})$ factor, while prior decision-theoretic measures have an $e^{-\Omega(T)}$-$\Omega(T^{1/3})$ truthfulness gap. We also prove a general impossibility result for truthful decision-theoretic forecasting: any complete and decision-theoretic calibration measure must be discontinuous and non-truthful in the non-smoothed setting.
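To make the flavor of the measure concrete, here is a minimal sketch of a step calibration error together with a subsampled variant. The exact definition of $\mathsf{StepCE}^{\textsf{sub}}$ is in the paper; the choices below (worst-case bias over threshold events, each round kept independently with probability 1/2, Monte Carlo averaging) are illustrative assumptions, not the paper's formal construction.

```python
import random

def step_ce(preds, outcomes):
    """Sketch of a step calibration error: the worst-case absolute bias over
    threshold ("step") events, i.e. max over tau of
    |sum_{t : preds[t] >= tau} (outcomes[t] - preds[t])| (and the complement).
    This is an illustrative guess; the paper's formal definition may differ."""
    best = 0.0
    for tau in sorted(set(preds)):
        above = sum(y - p for p, y in zip(preds, outcomes) if p >= tau)
        below = sum(y - p for p, y in zip(preds, outcomes) if p < tau)
        best = max(best, abs(above), abs(below))
    return best

def subsampled_step_ce(preds, outcomes, n_samples=1000, keep_prob=0.5, seed=0):
    """Average the step calibration error over random subsamples of rounds,
    keeping each round independently with probability keep_prob (an assumption)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        kept = [(p, y) for p, y in zip(preds, outcomes) if rng.random() < keep_prob]
        if kept:
            ps, ys = zip(*kept)
            total += step_ce(ps, ys)
    return total / n_samples
```

Intuitively, averaging over random subsets of rounds removes the forecaster's ability to exploit knife-edge cancellations, which is the kind of gaming behavior the truthfulness guarantees rule out.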
Related papers
- Computational-Statistical Tradeoffs at the Next-Token Prediction Barrier: Autoregressive and Imitation Learning under Misspecification [50.717692060500696]
Next-token prediction with the logarithmic loss is a cornerstone of autoregressive sequence modeling. Next-token prediction can be made robust so as to achieve $C=\tilde{O}(H)$, representing moderate error amplification. No computationally efficient algorithm can achieve sub-polynomial approximation factor $C=e^{(\log H)^{1-\Omega(1)}}$.
arXiv Detail & Related papers (2025-02-18T02:52:00Z)
- Orthogonal Causal Calibration [55.28164682911196]
We prove generic upper bounds on the calibration error of any causal parameter estimate $\theta$ with respect to any loss $\ell$.
We use our bound to analyze the convergence of two sample splitting algorithms for causal calibration.
arXiv Detail & Related papers (2024-06-04T03:35:25Z)
- Mind the Gap: A Causal Perspective on Bias Amplification in Prediction & Decision-Making [58.06306331390586]
We introduce the notion of a margin complement, which measures how much a prediction score $S$ changes due to a thresholding operation.
We show that under suitable causal assumptions, the influences of $X$ on the prediction score $S$ are equal to the influences of $X$ on the true outcome $Y$.
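One hedged way to formalize this notion: writing the thresholded decision as $\mathbb{1}[S \ge \theta]$, the margin complement is the shift the score undergoes under thresholding. The formula below is a guessed formalization for illustration only; the paper's formal definition may differ.

```latex
% Hypothetical formalization (illustration only): the margin complement M
% of a prediction score S under a decision threshold theta.
M \;=\; \mathbb{1}[\,S \ge \theta\,] \;-\; S
```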
arXiv Detail & Related papers (2024-05-24T11:22:19Z)
- An Elementary Predictor Obtaining $2\sqrt{T}+1$ Distance to Calibration [4.628072661683411]
We show that an online predictor can obtain $O(\sqrt{T})$ distance to calibration in the adversarial setting.
We give an extremely simple, efficient, deterministic algorithm that obtains distance to calibration at most $2\sqrt{T}+1$.
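The abstract does not spell out the algorithm. The sketch below shows one natural deterministic rule in this spirit, predicting 0 or 1 according to the sign of the accumulated bias; it is offered only to illustrate how a simple deterministic forecaster can chase calibration, not as the paper's actual algorithm or its guarantee.

```python
def sign_bias_predictor(outcomes):
    """Illustrative deterministic forecaster (NOT necessarily the paper's
    algorithm): predict 1 exactly when past outcomes have run ahead of past
    predictions. The running bias sum_{s<t} (y_s - p_s) then stays in (-1, 1],
    the kind of invariant that distance-to-calibration analyses exploit."""
    preds, bias = [], 0.0
    for y in outcomes:
        p = 1.0 if bias > 0 else 0.0  # greedy choice pushing the bias toward 0
        preds.append(p)
        bias += y - p
    return preds

# Example: the forecaster tracks the outcome sequence one step behind.
print(sign_bias_predictor([1, 0, 1, 1, 0, 0]))  # [0.0, 1.0, 0.0, 1.0, 1.0, 0.0]
```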
arXiv Detail & Related papers (2024-02-18T00:53:05Z)
- On the Distance from Calibration in Sequential Prediction [4.14360329494344]
We study a sequential binary prediction setting where the forecaster is evaluated in terms of the calibration distance.
The calibration distance is a natural and intuitive measure of deviation from perfect calibration.
We prove that there is a forecasting algorithm that achieves an $O(\sqrt{T})$ calibration distance in expectation on an adversarially chosen sequence of $T$ binary outcomes.
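Concretely, the calibration distance is the $\ell_1$ distance from the forecasts to the nearest perfectly calibrated predictions. A sketch of the definition, with the calibrated set written informally (the paper gives the precise formalization):

```latex
% Calibration distance of forecasts p = (p_1, ..., p_T) on outcomes
% y = (y_1, ..., y_T): the l1 distance to the nearest perfectly
% calibrated forecast sequence q.
\mathrm{CalDist}(p, y) \;=\; \min_{q \in \mathcal{C}(y)} \sum_{t=1}^{T} \lvert p_t - q_t \rvert,
\qquad
\mathcal{C}(y) \;=\; \bigl\{ q \in [0,1]^T : q \text{ is perfectly calibrated on } y \bigr\}.
```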
arXiv Detail & Related papers (2024-02-12T07:37:19Z)
- A Consistent and Differentiable Lp Canonical Calibration Error Estimator [21.67616079217758]
Deep neural networks are poorly calibrated and tend to output overconfident predictions.
We propose a low-bias, trainable calibration error estimator based on Dirichlet kernel density estimates.
Our method has a natural choice of kernel, and can be used to generate consistent estimates of other quantities.
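For contrast with the paper's kernel-based estimator, here is the standard histogram-binned ECE estimator that such estimators aim to improve on; the equal-width binning and bin count are conventional choices, not the paper's method.

```python
import numpy as np

def binned_ece(confidences, correct, n_bins=15):
    """Standard histogram ECE: bin-mass-weighted average of
    |accuracy - mean confidence| over equal-width confidence bins.
    Shown as the common baseline, not the paper's Dirichlet-KDE estimator."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        in_bin = (confidences >= lo) & (confidences < hi)
        if i == n_bins - 1:  # close the last bin so confidence 1.0 is counted
            in_bin |= confidences == 1.0
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return float(ece)
```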
arXiv Detail & Related papers (2022-10-13T15:11:11Z)
- Faster online calibration without randomization: interval forecasts and the power of two choices [43.17917448937131]
We study the problem of making calibrated probabilistic forecasts for a binary sequence generated by an adversarial nature.
Inspired by the works on the "power of two choices" and imprecise probability theory, we study a small variant of the standard online calibration problem.
arXiv Detail & Related papers (2022-04-27T17:33:23Z)
- T-Cal: An optimal test for the calibration of predictive models [49.11538724574202]
We consider detecting mis-calibration of predictive models using a finite validation dataset as a hypothesis testing problem.
Detecting mis-calibration is only possible when the conditional probabilities of the classes are sufficiently smooth functions of the predictions.
We propose T-Cal, a minimax test for calibration based on a de-biased plug-in estimator of the $\ell_2$-Expected Calibration Error (ECE).
arXiv Detail & Related papers (2022-03-03T16:58:54Z)
- Localized Calibration: Metrics and Recalibration [133.07044916594361]
We propose a fine-grained calibration metric, the local calibration error (LCE), that spans the gap between fully global and fully individualized calibration.
We then introduce a localized recalibration method, LoRe, that improves the LCE more than existing recalibration methods do.
arXiv Detail & Related papers (2021-02-22T07:22:12Z)
- Stronger Calibration Lower Bounds via Sidestepping [18.383889039268222]
We consider an online binary prediction setting where a forecaster observes a sequence of $T$ bits one by one.
The forecaster is called well-calibrated if, for each $p \in [0, 1]$, among the $n_p$ bits for which the forecaster predicts probability $p$, the number of ones, $m_p$, is indeed equal to $p \cdot n_p$.
The calibration error, defined as $\sum_p |m_p - p \cdot n_p|$, quantifies the extent to which the forecaster deviates from being well-calibrated.
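As a worked example of this definition, using the formula exactly as stated above:

```python
from collections import defaultdict

def calibration_error(preds, outcomes):
    """Calibration error sum_p |m_p - p * n_p|, where n_p counts the rounds
    with prediction p and m_p counts the ones among those rounds."""
    n = defaultdict(int)  # n_p: rounds with prediction p
    m = defaultdict(int)  # m_p: ones among those rounds
    for p, y in zip(preds, outcomes):
        n[p] += 1
        m[p] += y
    return sum(abs(m[p] - p * n[p]) for p in n)

# Predicting 0.5 on four bits, three of which are ones: |3 - 0.5 * 4| = 1.
assert calibration_error([0.5, 0.5, 0.5, 0.5], [1, 1, 1, 0]) == 1.0
```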
arXiv Detail & Related papers (2020-12-07T05:29:28Z)