On the Distance from Calibration in Sequential Prediction
- URL: http://arxiv.org/abs/2402.07458v2
- Date: Mon, 27 May 2024 05:25:05 GMT
- Title: On the Distance from Calibration in Sequential Prediction
- Authors: Mingda Qiao, Letian Zheng
- Abstract summary: We study a sequential binary prediction setting where the forecaster is evaluated in terms of the calibration distance.
The calibration distance is a natural and intuitive measure of deviation from perfect calibration.
We prove that there is a forecasting algorithm that achieves an $O(\sqrt{T})$ calibration distance in expectation on an adversarially chosen sequence of $T$ binary outcomes.
- Score: 4.14360329494344
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a sequential binary prediction setting where the forecaster is evaluated in terms of the calibration distance, which is defined as the $L_1$ distance between the predicted values and the set of predictions that are perfectly calibrated in hindsight. This is analogous to a calibration measure recently proposed by B{\l}asiok, Gopalan, Hu and Nakkiran (STOC 2023) for the offline setting. The calibration distance is a natural and intuitive measure of deviation from perfect calibration, and satisfies a Lipschitz continuity property which does not hold for many popular calibration measures, such as the $L_1$ calibration error and its variants. We prove that there is a forecasting algorithm that achieves an $O(\sqrt{T})$ calibration distance in expectation on an adversarially chosen sequence of $T$ binary outcomes. At the core of this upper bound is a structural result showing that the calibration distance is accurately approximated by the lower calibration distance, which is a continuous relaxation of the former. We then show that an $O(\sqrt{T})$ lower calibration distance can be achieved via a simple minimax argument and a reduction to online learning on a Lipschitz class. On the lower bound side, an $\Omega(T^{1/3})$ calibration distance is shown to be unavoidable, even when the adversary outputs a sequence of independent random bits, and has an additional ability to early stop (i.e., to stop producing random bits and output the same bit in the remaining steps). Interestingly, without this early stopping, the forecaster can achieve a much smaller calibration distance of $\mathrm{polylog}(T)$.
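To make the definition above concrete, here is a minimal brute-force sketch (not the paper's $O(\sqrt{T})$ algorithm) of the calibration distance for a tiny horizon $T$. The key observation, under the definition in the abstract, is that every sequence that is perfectly calibrated in hindsight can be obtained by partitioning the time steps and predicting, on each block, the empirical mean of the outcomes in that block; the calibration distance is then the minimum $L_1$ gap over all set partitions. The function names are illustrative and the enumeration is exponential in $T$, so this only illustrates the definition.

```python
def partitions(items):
    """Yield every set partition of `items` (Bell-number many; tiny inputs only)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # place `first` into each existing block in turn ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or start a new block containing `first` alone
        yield [[first]] + part


def calibration_distance(preds, outcomes):
    """Brute-force L1 distance from `preds` to the set of predictions that are
    perfectly calibrated in hindsight against the binary `outcomes`.

    Every perfectly calibrated sequence q assigns, on each block of some
    partition of the time steps, the empirical mean of the outcomes in that
    block, so minimizing over all partitions recovers the calibration distance.
    """
    T = len(preds)
    best = float("inf")
    for part in partitions(list(range(T))):
        q = [0.0] * T
        for block in part:
            mean = sum(outcomes[t] for t in block) / len(block)
            for t in block:
                q[t] = mean
        best = min(best, sum(abs(preds[t] - q[t]) for t in range(T)))
    return best


# Example: constant predictions of 0.8 against the outcome sequence 1, 0, 0, 0.
print(calibration_distance([0.8, 0.8, 0.8, 0.8], [1, 0, 0, 0]))
```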
Related papers
- Breaking the $T^{2/3}$ Barrier for Sequential Calibration [26.563792462828726]
We study the problem of online calibrated forecasting of binary sequences.
Foster & Vohra (1998) derived an algorithm with $O(T^{2/3})$ calibration error after $T$ time steps, and showed a lower bound of $\Omega(T^{1/2})$.
Qiao & Valiant (2021) improved the lower bound to $\Omega(T^{0.528})$ by introducing a game called sign preservation and showing that lower bounds for this game imply lower bounds for calibration.
arXiv Detail & Related papers (2024-06-19T16:19:39Z)
- Orthogonal Causal Calibration [55.28164682911196]
We prove generic upper bounds on the calibration error of any causal parameter estimate $\theta$ with respect to any loss $\ell$.
We use our bound to analyze the convergence of two sample splitting algorithms for causal calibration.
arXiv Detail & Related papers (2024-06-04T03:35:25Z)
- Testing Calibration in Nearly-Linear Time [14.099477870728595]
We focus on the algorithmic study of calibration through the lens of property testing.
We make the simple observation that the empirical smooth calibration linear program can be reformulated as an instance of minimum-cost flow on a highly-structured graph.
We present experiments showing the testing problem we define faithfully captures standard notions of calibration, and that our algorithms scale efficiently to accommodate large sample sizes.
arXiv Detail & Related papers (2024-02-20T17:53:24Z)
- An Elementary Predictor Obtaining $2\sqrt{T}+1$ Distance to Calibration [4.628072661683411]
We show that an online predictor can obtain $O(\sqrt{T})$ distance to calibration in the adversarial setting.
We give an extremely simple, efficient, deterministic algorithm that obtains a distance to calibration of at most $2\sqrt{T}+1$.
arXiv Detail & Related papers (2024-02-18T00:53:05Z)
- Conformal Nucleus Sampling [67.5232384936661]
We assess whether a top-$p$ set is indeed aligned with its probabilistic meaning in various linguistic contexts.
We find that OPT models are overconfident, and that calibration shows a moderate inverse scaling with model size.
arXiv Detail & Related papers (2023-05-04T08:11:57Z)
- A Unifying Theory of Distance from Calibration [9.959025631339982]
There is no consensus on how to quantify the distance from perfect calibration.
We propose a ground-truth notion of distance from calibration, inspired by the literature on property testing.
Applying our framework, we identify three calibration measures that are consistent and can be estimated efficiently.
arXiv Detail & Related papers (2022-11-30T10:38:24Z)
- A Close Look into the Calibration of Pre-trained Language Models [56.998539510508515]
Pre-trained language models (PLMs) may fail in giving reliable estimates of their predictive uncertainty.
We study the dynamic change in PLMs' calibration performance in training.
We extend two recently proposed learnable methods that directly collect data to train models to have reasonable confidence estimations.
arXiv Detail & Related papers (2022-10-31T21:31:07Z)
- Sample-dependent Adaptive Temperature Scaling for Improved Calibration [95.7477042886242]
A common post-hoc approach to compensating for neural networks that are wrong with high confidence is temperature scaling.
We propose to predict a different temperature value for each input, allowing us to adjust the mismatch between confidence and accuracy.
We test our method on the ResNet50 and WideResNet28-10 architectures using the CIFAR10/100 and Tiny-ImageNet datasets.
arXiv Detail & Related papers (2022-07-13T14:13:49Z)
- T-Cal: An optimal test for the calibration of predictive models [49.11538724574202]
We consider detecting mis-calibration of predictive models using a finite validation dataset as a hypothesis testing problem.
Detecting mis-calibration is only possible when the conditional probabilities of the classes are sufficiently smooth functions of the predictions.
We propose T-Cal, a minimax test for calibration based on a de-biased plug-in estimator of the $\ell$-Expected Calibration Error (ECE).
arXiv Detail & Related papers (2022-03-03T16:58:54Z)
- Localized Calibration: Metrics and Recalibration [133.07044916594361]
We propose a fine-grained calibration metric that spans the gap between fully global and fully individualized calibration.
We then introduce a localized recalibration method, LoRe, that improves the LCE more than existing recalibration methods.
arXiv Detail & Related papers (2021-02-22T07:22:12Z)
- Stronger Calibration Lower Bounds via Sidestepping [18.383889039268222]
We consider an online binary prediction setting where a forecaster observes a sequence of $T$ bits one by one.
The forecaster is called well-calibrated if, for each $p \in [0, 1]$, among the $n_p$ bits for which the forecaster predicts probability $p$, the number of ones $m_p$ is indeed equal to $p \cdot n_p$.
The calibration error, defined as $\sum_p |m_p - p \cdot n_p|$, quantifies the extent to which the forecaster deviates from being well-calibrated; a short computational sketch of this quantity follows after this list.
arXiv Detail & Related papers (2020-12-07T05:29:28Z)
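For contrast with the calibration distance above, here is a minimal sketch (function name illustrative) of the classical $L_1$ calibration error $\sum_p |m_p - p \cdot n_p|$ defined in the last summary. As the main abstract notes, this measure lacks the Lipschitz continuity property enjoyed by the calibration distance.

```python
from collections import defaultdict


def l1_calibration_error(preds, outcomes):
    """Classical L1 calibration error: sum over predicted values p of
    |m_p - p * n_p|, where n_p counts the steps with prediction p and
    m_p counts how many of those steps had outcome 1."""
    n = defaultdict(int)  # n_p: number of steps with prediction p
    m = defaultdict(int)  # m_p: number of ones among those steps
    for p, y in zip(preds, outcomes):
        n[p] += 1
        m[p] += y
    return sum(abs(m[p] - p * n[p]) for p in n)


# Predicting 0.5 on four steps with a single 1 gives error |1 - 0.5 * 4| = 1.
print(l1_calibration_error([0.5, 0.5, 0.5, 0.5], [1, 0, 0, 0]))
```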