An Elementary Predictor Obtaining $2\sqrt{T}$ Distance to Calibration
- URL: http://arxiv.org/abs/2402.11410v1
- Date: Sun, 18 Feb 2024 00:53:05 GMT
- Title: An Elementary Predictor Obtaining $2\sqrt{T}$ Distance to Calibration
- Authors: Eshwar Ram Arunachaleswaran, Natalie Collina, Aaron Roth, Mirah Shi
- Abstract summary: We show that an online predictor can obtain $O(\sqrt{T})$ distance to calibration in the adversarial setting.
We give an extremely simple, efficient, deterministic algorithm that obtains distance to calibration error at most $2\sqrt{T}$.
- Score: 5.055836904416026
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Blasiok et al. [2023] proposed distance to calibration as a natural measure
of calibration error that unlike expected calibration error (ECE) is
continuous. Recently, Qiao and Zheng [2024] gave a non-constructive argument
establishing the existence of an online predictor that can obtain $O(\sqrt{T})$
distance to calibration in the adversarial setting, which is known to be
impossible for ECE. They leave as an open problem finding an explicit,
efficient algorithm. We resolve this problem and give an extremely simple,
efficient, deterministic algorithm that obtains distance to calibration error
at most $2\sqrt{T}$.
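The discontinuity of ECE that motivates distance to calibration can be seen in a small numerical example. Below is an illustrative sketch (not code from the paper; `binned_ece` is a hypothetical helper implementing the standard equal-width binned ECE): predicting exactly 1/2 on an alternating binary outcome sequence is perfectly calibrated, but perturbing each prediction by just 0.01 sends the binned ECE from 0 to roughly 1/2, while the predictions themselves barely move.

```python
def binned_ece(preds, outcomes, n_bins=10):
    """Standard expected calibration error with equal-width bins.

    Illustrative only: this is the textbook binned-ECE definition,
    not an estimator from the paper above.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(preds, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p = 1.0 into last bin
        bins[idx].append((p, y))
    total = len(preds)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_p = sum(p for p, _ in b) / len(b)  # mean confidence in bin
        avg_y = sum(y for _, y in b) / len(b)  # empirical frequency in bin
        ece += (len(b) / total) * abs(avg_p - avg_y)
    return ece

outcomes = [1, 0, 1, 0]
ece_a = binned_ece([0.50, 0.50, 0.50, 0.50], outcomes)  # 0.0: perfectly calibrated
ece_b = binned_ece([0.49, 0.51, 0.49, 0.51], outcomes)  # 0.51: tiny perturbation, large jump
```

A 0.01 perturbation in every prediction moves the ECE from 0 to 0.51, because the perturbed predictions land in different bins whose empirical frequencies are 1 and 0. A distance-based calibration measure changes only proportionally to the perturbation, which is the continuity property Blasiok et al. [2023] emphasize.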
Related papers
- Orthogonal Causal Calibration [55.28164682911196]
We prove generic upper bounds on the calibration error of any causal parameter estimate $\theta$ with respect to any loss $\ell$.
We use our bound to analyze the convergence of two sample splitting algorithms for causal calibration.
arXiv Detail & Related papers (2024-06-04T03:35:25Z) - Unbiased Estimator for Distorted Conics in Camera Calibration [17.310876803936782]
We present a novel formulation for conic-based calibration using moments.
Our derivation is based on the mathematical finding that the first moment can be estimated without bias even under distortion.
This allows us to track moment changes during projection and distortion, ensuring the preservation of the first moment of the distorted conic.
arXiv Detail & Related papers (2024-03-07T15:29:11Z) - Testing Calibration in Nearly-Linear Time [14.099477870728595]
We focus on the algorithmic study of calibration through the lens of property testing.
We make the simple observation that the empirical smooth calibration linear program can be reformulated as an instance of minimum-cost flow on a highly-structured graph.
We present experiments showing the testing problem we define faithfully captures standard notions of calibration, and that our algorithms scale efficiently to accommodate large sample sizes.
arXiv Detail & Related papers (2024-02-20T17:53:24Z) - On the Distance from Calibration in Sequential Prediction [4.14360329494344]
We study a sequential binary prediction setting where the forecaster is evaluated in terms of the calibration distance.
The calibration distance is a natural and intuitive measure of deviation from perfect calibration.
We prove that there is a forecasting algorithm that achieves an $O(\sqrt{T})$ calibration distance in expectation on an adversarially chosen sequence of $T$ binary outcomes.
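The calibration distance in this sequential setting is the minimum $\ell_1$ distance from the forecaster's predictions to any perfectly calibrated prediction sequence for the realized outcomes. The brute-force sketch below (illustrative, not from the paper; restricting candidates to a finite grid makes it only an upper bound on the true distance) searches a small grid of prediction sequences for the closest perfectly calibrated one.

```python
from itertools import product

def is_calibrated(q, y):
    """A sequence q is perfectly calibrated for outcomes y if, for every
    value v it predicts, the empirical frequency of y on the rounds
    where q predicts v equals v."""
    for v in set(q):
        rounds = [t for t in range(len(q)) if q[t] == v]
        if abs(sum(y[t] for t in rounds) / len(rounds) - v) > 1e-9:
            return False
    return True

def calib_distance_ub(p, y, grid):
    """Upper bound on the calibration distance of predictions p for
    outcomes y, by brute force over grid-valued candidate sequences."""
    best = float("inf")
    for q in product(grid, repeat=len(p)):
        if is_calibrated(q, y):
            best = min(best, sum(abs(a - b) for a, b in zip(p, q)))
    return best

y = [1, 0, 1, 0]
p = [0.49, 0.51, 0.49, 0.51]
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
d = calib_distance_ub(p, y, grid)  # 0.04: nearest calibrated sequence is all 0.5
```

Here the predictions are within total $\ell_1$ distance 0.04 of the perfectly calibrated sequence $(0.5, 0.5, 0.5, 0.5)$, even though their binned ECE is large, which is exactly the gap between the two measures.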
arXiv Detail & Related papers (2024-02-12T07:37:19Z) - A Unifying Theory of Distance from Calibration [9.959025631339982]
There is no consensus on how to quantify the distance from perfect calibration.
We propose a ground-truth notion of distance from calibration, inspired by the literature on property testing.
Applying our framework, we identify three calibration measures that are consistent and can be estimated efficiently.
arXiv Detail & Related papers (2022-11-30T10:38:24Z) - A Consistent and Differentiable Lp Canonical Calibration Error Estimator [21.67616079217758]
Deep neural networks are poorly calibrated and tend to output overconfident predictions.
We propose a low-bias, trainable calibration error estimator based on Dirichlet kernel density estimates.
Our method has a natural choice of kernel, and can be used to generate consistent estimates of other quantities.
arXiv Detail & Related papers (2022-10-13T15:11:11Z) - T-Cal: An optimal test for the calibration of predictive models [49.11538724574202]
We consider detecting mis-calibration of predictive models using a finite validation dataset as a hypothesis testing problem.
Detecting mis-calibration is only possible when the conditional probabilities of the classes are sufficiently smooth functions of the predictions.
We propose T-Cal, a minimax test for calibration based on a de-biased plug-in estimator of the $\ell_2$-Expected Calibration Error (ECE).
arXiv Detail & Related papers (2022-03-03T16:58:54Z) - Localized Calibration: Metrics and Recalibration [133.07044916594361]
We propose a fine-grained calibration metric that spans the gap between fully global and fully individualized calibration.
We then introduce a localized recalibration method, LoRe, that improves the localized calibration error (LCE) more than existing recalibration methods.
arXiv Detail & Related papers (2021-02-22T07:22:12Z) - Uncertainty Quantification and Deep Ensembles [79.4957965474334]
We show that deep-ensembles do not necessarily lead to improved calibration properties.
We show that standard ensembling methods, when used in conjunction with modern techniques such as mixup regularization, can lead to less calibrated models.
This text examines the interplay between three of the simplest and most commonly used approaches to leveraging deep learning when data is scarce.
arXiv Detail & Related papers (2020-07-17T07:32:24Z) - Transferable Calibration with Lower Bias and Variance in Domain Adaptation [139.4332115349543]
Domain Adaptation (DA) enables transferring a learning machine from a labeled source domain to an unlabeled target one.
How to estimate the predictive uncertainty of DA models is vital for decision-making in safety-critical scenarios.
The proposed method, TransCal, can be easily applied to recalibrate existing DA methods.
arXiv Detail & Related papers (2020-07-16T11:09:36Z) - Calibration of Pre-trained Transformers [55.57083429195445]
We focus on BERT and RoBERTa in this work, and analyze their calibration across three tasks: natural language inference, paraphrase detection, and commonsense reasoning.
We show that: (1) when used out-of-the-box, pre-trained models are calibrated in-domain, and compared to baselines, their calibration error out-of-domain can be as much as 3.5x lower; (2) temperature scaling is effective at further reducing calibration error in-domain, and using label smoothing to deliberately increase empirical uncertainty helps calibrate posteriors out-of-domain.
arXiv Detail & Related papers (2020-03-17T18:58:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.