Can a calibration metric be both testable and actionable?
- URL: http://arxiv.org/abs/2502.19851v1
- Date: Thu, 27 Feb 2025 07:50:24 GMT
- Title: Can a calibration metric be both testable and actionable?
- Authors: Raphael Rossellini, Jake A. Soloff, Rina Foygel Barber, Zhimei Ren, Rebecca Willett
- Abstract summary: We introduce Cutoff Calibration Error, a calibration measure that bridges the gap between testability and actionability by assessing calibration over intervals of forecasted probabilities. We show that Cutoff Calibration Error is both testable and actionable and examine its implications for popular post-hoc calibration methods.
- Score: 11.056629967114272
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Forecast probabilities often serve as critical inputs for binary decision making. In such settings, calibration (ensuring that forecasted probabilities match empirical frequencies) is essential. Although the common notion of Expected Calibration Error (ECE) provides actionable insights for decision making, it is not testable: it cannot be empirically estimated in many practical cases. Conversely, the recently proposed Distance from Calibration (dCE) is testable but is not actionable since it lacks decision-theoretic guarantees needed for high-stakes applications. We introduce Cutoff Calibration Error, a calibration measure that bridges this gap by assessing calibration over intervals of forecasted probabilities. We show that Cutoff Calibration Error is both testable and actionable and examine its implications for popular post-hoc calibration methods, such as isotonic regression and Platt scaling.
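The abstract does not spell out the formal definition of Cutoff Calibration Error, but the idea of "assessing calibration over intervals of forecasted probabilities" can be illustrated with a small sketch: for every cutoff interval $[a, b]$, compare the average forecast with the empirical frequency of the positive label among samples whose forecast lands in the interval, and report the worst probability-weighted gap. The function names, cutoff grid, and weighting below are illustrative assumptions, not the paper's estimator; the two post-hoc methods named in the abstract (isotonic regression and Platt scaling) are shown via their standard scikit-learn implementations.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression

def cutoff_calibration_error(probs, labels, grid=None):
    """Worst probability-weighted calibration gap over cutoff intervals [a, b].

    Illustrative sketch only: the paper's formal definition of Cutoff
    Calibration Error may weight or aggregate intervals differently.
    """
    probs, labels = np.asarray(probs, float), np.asarray(labels, float)
    if grid is None:
        grid = np.linspace(0.0, 1.0, 21)  # candidate cutoffs
    worst = 0.0
    for i, a in enumerate(grid):
        for b in grid[i + 1:]:
            mask = (probs >= a) & (probs <= b)
            if mask.any():
                gap = abs(probs[mask].mean() - labels[mask].mean())
                worst = max(worst, mask.mean() * gap)  # weight by P(f(X) in [a, b])
    return worst

def platt_scale(scores, labels):
    """Platt scaling: fit a logistic model on the raw scores."""
    scores = np.asarray(scores, float).reshape(-1, 1)
    lr = LogisticRegression().fit(scores, labels)
    return lr.predict_proba(scores)[:, 1]

def isotonic_calibrate(scores, labels):
    """Isotonic regression: monotone, piecewise-constant recalibration map."""
    return IsotonicRegression(out_of_bounds="clip").fit_transform(scores, labels)
```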
Related papers
- Smooth Calibration and Decision Making [11.51844809748468]
We show that post-processing an online predictor with distance $\epsilon$ to calibration achieves $O(\sqrt{\epsilon})$ ECE and CDL.
This bound, however, is non-optimal compared with existing online calibration algorithms.
arXiv Detail & Related papers (2025-04-22T04:55:41Z)
- Orthogonal Causal Calibration [55.28164682911196]
We prove generic upper bounds on the calibration error of any causal parameter estimate $theta$ with respect to any loss $ell$.
We use our bound to analyze the convergence of two sample splitting algorithms for causal calibration.
arXiv Detail & Related papers (2024-06-04T03:35:25Z)
- Calibration by Distribution Matching: Trainable Kernel Calibration Metrics [56.629245030893685]
We introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression.
These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization.
We provide intuitive mechanisms to tailor calibration metrics to a decision task, and enforce accurate loss estimation and no-regret decisions (a minimal kernel-penalty sketch follows this entry).
arXiv Detail & Related papers (2023-10-31T06:19:40Z)
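The entry above says kernel-based calibration metrics admit differentiable sample estimates that can be folded into empirical risk minimization. The sketch below is a generic MMCE-style kernel calibration penalty in PyTorch; its population counterpart vanishes under calibration, but it is not necessarily the exact estimator of the cited paper, and the Laplacian kernel and bandwidth are illustrative choices.

```python
import torch

def kernel_calibration_penalty(probs, labels, bandwidth=0.1):
    """Generic kernel-based (MMCE-style) calibration penalty.

    probs:  predicted probabilities for the positive class, shape (n,); may require grad
    labels: binary outcomes in {0, 1}, shape (n,)
    Computes (1/n^2) * sum_{i,j} (y_i - p_i)(y_j - p_j) k(p_i, p_j) with a Laplacian kernel.
    """
    resid = (labels.float() - probs).unsqueeze(1)      # (n, 1) residuals y - p
    diff = probs.unsqueeze(1) - probs.unsqueeze(0)     # (n, n) pairwise score differences
    kernel = torch.exp(-diff.abs() / bandwidth)        # Laplacian kernel k(p_i, p_j)
    return (resid @ resid.T * kernel).mean()           # biased V-statistic estimate

# Example use inside a training loop (lam is a tunable trade-off weight):
#   loss = torch.nn.functional.binary_cross_entropy(probs, labels.float()) \
#          + lam * kernel_calibration_penalty(probs, labels)
```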
- Calibration Error Estimation Using Fuzzy Binning [0.0]
We propose a Fuzzy Calibration Error (FCE) metric that utilizes a fuzzy binning approach to calculate calibration error.
Our results show that FCE offers better calibration error estimation, especially in multi-class settings (a soft-binning sketch follows this entry).
arXiv Detail & Related papers (2023-04-30T18:06:14Z)
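The fuzzy-binning entry above motivates replacing hard bin assignments with soft memberships. The sketch below uses triangular membership weights around equally spaced bin centers; it is a generic soft-binning construction and is not claimed to be the exact FCE formula of the cited paper.

```python
import numpy as np

def fuzzy_binned_error(probs, labels, n_bins=10):
    """Calibration error with soft (fuzzy) bin membership.

    Each prediction contributes to nearby bin centers with triangular
    membership weights; per-bin gaps between weighted mean forecast and
    weighted outcome frequency are aggregated by total membership.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    centers = (np.arange(n_bins) + 0.5) / n_bins          # bin centers
    width = 1.0 / n_bins
    # membership[i, b] in [0, 1]: triangular kernel around each center
    membership = np.maximum(0.0, 1.0 - np.abs(probs[:, None] - centers[None, :]) / width)
    total = membership.sum()
    error = 0.0
    for b in range(n_bins):
        w = membership[:, b]
        if w.sum() == 0:
            continue
        gap = np.average(probs, weights=w) - np.average(labels, weights=w)
        error += (w.sum() / total) * abs(gap)
    return error
```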
- Posterior Probability Matters: Doubly-Adaptive Calibration for Neural Predictions in Online Advertising [29.80454356173723]
Field-level calibration is fine-grained and more practical.
AdaCalib learns an isotonic function family to calibrate model predictions.
Experiments verify that AdaCalib achieves significant improvement on calibration performance.
arXiv Detail & Related papers (2022-05-15T14:27:19Z)
- T-Cal: An optimal test for the calibration of predictive models [49.11538724574202]
We consider detecting mis-calibration of predictive models using a finite validation dataset as a hypothesis testing problem.
Detecting mis-calibration is only possible when the conditional probabilities of the classes are sufficiently smooth functions of the predictions.
We propose T-Cal, a minimax test for calibration based on a de-biased plug-in estimator of the $\ell_2$-Expected Calibration Error (ECE); a plain binned plug-in ECE sketch follows this entry.
arXiv Detail & Related papers (2022-03-03T16:58:54Z)
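T-Cal is built around a de-biased plug-in estimator of the $\ell_2$-ECE. As a reference point, the sketch below computes only the plain (biased) binned plug-in quantity with equal-width bins; the de-biasing and the minimax test are the paper's contribution and are not reproduced here, and the bin count is an illustrative choice.

```python
import numpy as np

def binned_l2_ece(probs, labels, n_bins=15):
    """Plain binned plug-in estimate of the squared l2-ECE.

    Partitions [0, 1] into equal-width bins and, within each bin, compares
    the mean forecast to the empirical frequency of the positive label,
    weighting each squared gap by the fraction of samples in the bin.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.clip(np.digitize(probs, edges[1:-1]), 0, n_bins - 1)
    ece_sq = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue
        gap = probs[mask].mean() - labels[mask].mean()
        ece_sq += mask.mean() * gap ** 2   # probability-weighted squared gap
    return ece_sq
```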
- Hidden Heterogeneity: When to Choose Similarity-Based Calibration [12.788224825185633]
Black-box calibration methods are unable to detect subpopulations where calibration could improve prediction accuracy.
The paper proposes a quantitative measure for hidden heterogeneity (HH).
Experiments show that the improvements in calibration achieved by similarity-based calibration methods correlate with the amount of HH present and, given sufficient calibration data, generally exceed calibration achieved by global methods.
arXiv Detail & Related papers (2022-02-03T20:43:25Z)
- Localized Calibration: Metrics and Recalibration [133.07044916594361]
We propose a fine-grained metric, the local calibration error (LCE), that spans the gap between fully global and fully individualized calibration.
We then introduce a localized recalibration method, LoRe, that improves the LCE more than existing recalibration methods.
arXiv Detail & Related papers (2021-02-22T07:22:12Z)
- Unsupervised Calibration under Covariate Shift [92.02278658443166]
We introduce the problem of calibration under domain shift and propose an importance sampling based approach to address it.
We evaluate and discuss the efficacy of our method on both real-world and synthetic datasets (a sketch of an importance-weighted calibration error follows this entry).
arXiv Detail & Related papers (2020-06-29T21:50:07Z)
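The covariate-shift entry above describes an importance-sampling approach to calibration under domain shift. The sketch below shows only the generic ingredient of reweighting a binned calibration estimate by density ratios $w(x) \approx p_{\text{target}}(x)/p_{\text{source}}(x)$, which are assumed to be supplied by some external estimator; how those ratios are obtained and how the forecasts are then recalibrated is the subject of the cited paper.

```python
import numpy as np

def importance_weighted_ece(probs, labels, weights, n_bins=15):
    """Binned ECE on labeled source data, reweighted toward the target domain.

    weights[i] is an (assumed given) estimate of p_target(x_i) / p_source(x_i).
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()                     # normalize importance weights
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.clip(np.digitize(probs, edges[1:-1]), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        w = weights[mask]
        if w.sum() == 0:
            continue
        avg_prob = np.average(probs[mask], weights=w)
        avg_label = np.average(labels[mask], weights=w)
        ece += w.sum() * abs(avg_prob - avg_label)        # weight by target bin mass
    return ece
```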
- Calibration of Neural Networks using Splines [51.42640515410253]
Measuring calibration error amounts to comparing two empirical distributions.
We introduce a binning-free calibration measure inspired by the classical Kolmogorov-Smirnov (KS) statistical test.
Our method consistently outperforms existing methods on KS error as well as other commonly used calibration measures (a minimal KS-style sketch follows this entry).
arXiv Detail & Related papers (2020-06-23T07:18:05Z)
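The splines entry above introduces a binning-free calibration measure inspired by the Kolmogorov-Smirnov test. A common way to compute such a KS-style calibration error is the largest gap between the cumulative averages of the sorted forecasts and of the corresponding outcomes; the sketch below implements that standard construction, which may differ in detail from the paper's spline-based measure.

```python
import numpy as np

def ks_calibration_error(probs, labels):
    """Binning-free, KS-style calibration error.

    Sorts samples by forecasted probability and reports the largest
    absolute gap between the running average of forecasts and the
    running average of outcomes.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    order = np.argsort(probs)
    n = len(probs)
    cum_probs = np.cumsum(probs[order]) / n
    cum_labels = np.cumsum(labels[order]) / n
    return np.max(np.abs(cum_probs - cum_labels))
```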