Truthfulness of Decision-Theoretic Calibration Measures
- URL: http://arxiv.org/abs/2503.02384v1
- Date: Tue, 04 Mar 2025 08:20:10 GMT
- Title: Truthfulness of Decision-Theoretic Calibration Measures
- Authors: Mingda Qiao, Eric Zhao
- Abstract summary: We introduce a new calibration measure termed subsampled step calibration, $\mathsf{StepCE}^{\textsf{sub}}$, that is both decision-theoretic and truthful. In particular, on any product distribution, $\mathsf{StepCE}^{\textsf{sub}}$ is truthful up to an $O(1)$ factor, whereas prior decision-theoretic calibration measures suffer from an $e^{-\Omega(T)}$-$\Omega(\sqrt{T})$ truthfulness gap.
- Score: 5.414308305392762
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Calibration measures quantify how much a forecaster's predictions violate calibration, which requires that forecasts are unbiased conditional on the forecasted probabilities. Two important desiderata for a calibration measure are its decision-theoretic implications (i.e., downstream decision-makers that best-respond to the forecasts are always no-regret) and its truthfulness (i.e., a forecaster approximately minimizes error by always reporting the true probabilities). Existing measures satisfy at most one of these two properties. We introduce a new calibration measure termed subsampled step calibration, $\mathsf{StepCE}^{\textsf{sub}}$, that is both decision-theoretic and truthful. In particular, on any product distribution, $\mathsf{StepCE}^{\textsf{sub}}$ is truthful up to an $O(1)$ factor whereas prior decision-theoretic calibration measures suffer from an $e^{-\Omega(T)}$-$\Omega(\sqrt{T})$ truthfulness gap. Moreover, in any smoothed setting where the conditional probability of each event is perturbed by noise of magnitude $c > 0$, $\mathsf{StepCE}^{\textsf{sub}}$ is truthful up to an $O(\sqrt{\log(1/c)})$ factor, while prior decision-theoretic measures have an $e^{-\Omega(T)}$-$\Omega(T^{1/3})$ truthfulness gap. We also prove a general impossibility result for truthful decision-theoretic forecasting: any complete and decision-theoretic calibration measure must be discontinuous and non-truthful in the non-smoothed setting.
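To make the flavor of the measure concrete, here is a minimal sketch of a step calibration error together with a subsampled variant. The exact definition of $\mathsf{StepCE}^{\textsf{sub}}$ is in the paper; the choices below (worst-case bias over threshold events, each round kept independently with probability 1/2, Monte Carlo averaging) are illustrative assumptions, not the paper's formal construction.

```python
import random

def step_ce(preds, outcomes):
    """Sketch of a step calibration error: the worst-case absolute bias over
    threshold ("step") events, i.e. max over tau of
    |sum_{t : preds[t] >= tau} (outcomes[t] - preds[t])| (and the complement).
    This is an illustrative guess; the paper's formal definition may differ."""
    best = 0.0
    for tau in sorted(set(preds)):
        above = sum(y - p for p, y in zip(preds, outcomes) if p >= tau)
        below = sum(y - p for p, y in zip(preds, outcomes) if p < tau)
        best = max(best, abs(above), abs(below))
    return best

def subsampled_step_ce(preds, outcomes, n_samples=1000, keep_prob=0.5, seed=0):
    """Average the step calibration error over random subsamples of rounds,
    keeping each round independently with probability keep_prob (an assumption)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        kept = [(p, y) for p, y in zip(preds, outcomes) if rng.random() < keep_prob]
        if kept:
            ps, ys = zip(*kept)
            total += step_ce(ps, ys)
    return total / n_samples
```

Intuitively, averaging over random subsets of rounds removes the forecaster's ability to exploit knife-edge cancellations, which is the kind of gaming behavior the truthfulness guarantees rule out.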
Related papers
- Computational-Statistical Tradeoffs at the Next-Token Prediction Barrier: Autoregressive and Imitation Learning under Misspecification [50.717692060500696]
Next-token prediction with the logarithmic loss is a cornerstone of autoregressive sequence modeling. Next-token prediction can be made robust so as to achieve $C=\tilde{O}(H)$, representing moderate error amplification. No computationally efficient algorithm can achieve sub-polynomial approximation factor $C=e^{(\log H)^{1-\Omega(1)}}$.
arXiv Detail & Related papers (2025-02-18T02:52:00Z)
- Orthogonal Causal Calibration [55.28164682911196]
We prove generic upper bounds on the calibration error of any causal parameter estimate $\theta$ with respect to any loss $\ell$.
We use our bound to analyze the convergence of two sample splitting algorithms for causal calibration.
arXiv Detail & Related papers (2024-06-04T03:35:25Z)
- Mind the Gap: A Causal Perspective on Bias Amplification in Prediction & Decision-Making [58.06306331390586]
We introduce the notion of a margin complement, which measures how much a prediction score $S$ changes due to a thresholding operation.
We show that under suitable causal assumptions, the influences of $X$ on the prediction score $S$ are equal to the influences of $X$ on the true outcome $Y$.
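One hedged way to formalize this notion: writing the thresholded decision as $\mathbb{1}[S \ge \theta]$, the margin complement is the shift the score undergoes under thresholding. The formula below is a guessed formalization for illustration only; the paper's formal definition may differ.

```latex
% Hypothetical formalization (illustration only): the margin complement M
% of a prediction score S under a decision threshold theta.
M \;=\; \mathbb{1}[\,S \ge \theta\,] \;-\; S
```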
arXiv Detail & Related papers (2024-05-24T11:22:19Z)
- An Elementary Predictor Obtaining $2\sqrt{T}+1$ Distance to Calibration [4.628072661683411]
We show that an online predictor can obtain $O(\sqrt{T})$ distance to calibration in the adversarial setting.
We give an extremely simple, efficient, deterministic algorithm that obtains distance to calibration at most $2\sqrt{T}+1$.
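The abstract does not spell out the algorithm. The sketch below shows one natural deterministic rule in this spirit, predicting 0 or 1 according to the sign of the accumulated bias; it is offered only to illustrate how a simple deterministic forecaster can chase calibration, not as the paper's actual algorithm or its guarantee.

```python
def sign_bias_predictor(outcomes):
    """Illustrative deterministic forecaster (NOT necessarily the paper's
    algorithm): predict 1 exactly when past outcomes have run ahead of past
    predictions. The running bias sum_{s<t} (y_s - p_s) then stays in (-1, 1],
    the kind of invariant that distance-to-calibration analyses exploit."""
    preds, bias = [], 0.0
    for y in outcomes:
        p = 1.0 if bias > 0 else 0.0  # greedy choice pushing the bias toward 0
        preds.append(p)
        bias += y - p
    return preds

# Example: the forecaster tracks the outcome sequence one step behind.
print(sign_bias_predictor([1, 0, 1, 1, 0, 0]))  # [0.0, 1.0, 0.0, 1.0, 1.0, 0.0]
```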
arXiv Detail & Related papers (2024-02-18T00:53:05Z)
- On the Distance from Calibration in Sequential Prediction [4.14360329494344]
We study a sequential binary prediction setting where the forecaster is evaluated in terms of the calibration distance.
The calibration distance is a natural and intuitive measure of deviation from perfect calibration.
We prove that there is a forecasting algorithm that achieves an $O(\sqrt{T})$ calibration distance in expectation on an adversarially chosen sequence of $T$ binary outcomes.
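Concretely, the calibration distance is the $\ell_1$ distance from the forecasts to the nearest perfectly calibrated predictions. A sketch of the definition, with the calibrated set written informally (the paper gives the precise formalization):

```latex
% Calibration distance of forecasts p = (p_1, ..., p_T) on outcomes
% y = (y_1, ..., y_T): the l1 distance to the nearest perfectly
% calibrated forecast sequence q.
\mathrm{CalDist}(p, y) \;=\; \min_{q \in \mathcal{C}(y)} \sum_{t=1}^{T} \lvert p_t - q_t \rvert,
\qquad
\mathcal{C}(y) \;=\; \bigl\{ q \in [0,1]^T : q \text{ is perfectly calibrated on } y \bigr\}.
```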
arXiv Detail & Related papers (2024-02-12T07:37:19Z)
- A Consistent and Differentiable Lp Canonical Calibration Error Estimator [21.67616079217758]
Deep neural networks are poorly calibrated and tend to output overconfident predictions.
We propose a low-bias, trainable calibration error estimator based on Dirichlet kernel density estimates.
Our method has a natural choice of kernel, and can be used to generate consistent estimates of other quantities.
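For contrast with the paper's kernel-based estimator, here is the standard histogram-binned ECE estimator that such estimators aim to improve on; the equal-width binning and bin count are conventional choices, not the paper's method.

```python
import numpy as np

def binned_ece(confidences, correct, n_bins=15):
    """Standard histogram ECE: bin-mass-weighted average of
    |accuracy - mean confidence| over equal-width confidence bins.
    Shown as the common baseline, not the paper's Dirichlet-KDE estimator."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        in_bin = (confidences >= lo) & (confidences < hi)
        if i == n_bins - 1:  # close the last bin so confidence 1.0 is counted
            in_bin |= confidences == 1.0
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return float(ece)
```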
arXiv Detail & Related papers (2022-10-13T15:11:11Z)
- Faster online calibration without randomization: interval forecasts and the power of two choices [43.17917448937131]
We study the problem of making calibrated probabilistic forecasts for a binary sequence generated by an adversarial nature.
Inspired by the works on the "power of two choices" and imprecise probability theory, we study a small variant of the standard online calibration problem.
arXiv Detail & Related papers (2022-04-27T17:33:23Z)
- T-Cal: An optimal test for the calibration of predictive models [49.11538724574202]
We consider detecting mis-calibration of predictive models using a finite validation dataset as a hypothesis testing problem.
Detecting mis-calibration is only possible when the conditional probabilities of the classes are sufficiently smooth functions of the predictions.
We propose T-Cal, a minimax test for calibration based on a de-biased plug-in estimator of the $\ell_2$-Expected Calibration Error (ECE).
arXiv Detail & Related papers (2022-03-03T16:58:54Z)
- Localized Calibration: Metrics and Recalibration [133.07044916594361]
We propose a fine-grained calibration metric, the local calibration error (LCE), that spans the gap between fully global and fully individualized calibration.
We then introduce a localized recalibration method, LoRe, that improves the LCE more than existing recalibration methods do.
arXiv Detail & Related papers (2021-02-22T07:22:12Z)
- Stronger Calibration Lower Bounds via Sidestepping [18.383889039268222]
We consider an online binary prediction setting where a forecaster observes a sequence of $T$ bits one by one.
The forecaster is called well-calibrated if, for each $p \in [0, 1]$, among the $n_p$ bits for which the forecaster predicts probability $p$, the number of ones, $m_p$, is indeed equal to $p \cdot n_p$.
The calibration error, defined as $\sum_p |m_p - p \cdot n_p|$, quantifies the extent to which the forecaster deviates from being well-calibrated.
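As a worked example of this definition, using the formula exactly as stated above:

```python
from collections import defaultdict

def calibration_error(preds, outcomes):
    """Calibration error sum_p |m_p - p * n_p|, where n_p counts the rounds
    with prediction p and m_p counts the ones among those rounds."""
    n = defaultdict(int)  # n_p: rounds with prediction p
    m = defaultdict(int)  # m_p: ones among those rounds
    for p, y in zip(preds, outcomes):
        n[p] += 1
        m[p] += y
    return sum(abs(m[p] - p * n[p]) for p in n)

# Predicting 0.5 on four bits, three of which are ones: |3 - 0.5 * 4| = 1.
assert calibration_error([0.5, 0.5, 0.5, 0.5], [1, 1, 1, 0]) == 1.0
```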
arXiv Detail & Related papers (2020-12-07T05:29:28Z)