Smooth Calibration and Decision Making
- URL: http://arxiv.org/abs/2504.15582v1
- Date: Tue, 22 Apr 2025 04:55:41 GMT
- Title: Smooth Calibration and Decision Making
- Authors: Jason Hartline, Yifan Wu, Yunran Yang,
- Abstract summary: We show that post-processing an online predictor with $eps$ to calibration achieves $O(sqrtepsilon)$ ECE and CDL.<n>The optimal bound is non-optimal compared with existing online calibration algorithms.
- Score: 11.51844809748468
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Calibration requires predictor outputs to be consistent with their Bayesian posteriors. For machine learning predictors that do not distinguish between small perturbations, calibration errors are continuous in predictions, e.g., smooth calibration error (Foster and Hart, 2018), Distance to Calibration (Blasiok et al., 2023a). On the contrary, decision-makers who use predictions make optimal decisions discontinuously in probabilistic space, experiencing loss from miscalibration discontinuously. Calibration errors for decision-making are thus discontinuous, e.g., Expected Calibration Error (Foster and Vohra, 1997), and Calibration Decision Loss (Hu and Wu, 2024). Thus, predictors with a low calibration error for machine learning may suffer a high calibration error for decision-making, i.e., they may not be trustworthy for decision-makers optimizing assuming their predictions are correct. It is natural to ask if post-processing a predictor with a low calibration error for machine learning is without loss to achieve a low calibration error for decision-making. In our paper, we show that post-processing an online predictor with $\epsilon$ distance to calibration achieves $O(\sqrt{\epsilon})$ ECE and CDL, which is asymptotically optimal. The post-processing algorithm adds noise to make predictions differentially private. The optimal bound from low distance to calibration predictors from post-processing is non-optimal compared with existing online calibration algorithms that directly optimize for ECE and CDL.
Related papers
- Orthogonal Causal Calibration [55.28164682911196]
We prove generic upper bounds on the calibration error of any causal parameter estimate $theta$ with respect to any loss $ell$.
We use our bound to analyze the convergence of two sample splitting algorithms for causal calibration.
arXiv Detail & Related papers (2024-06-04T03:35:25Z) - Calibration Error for Decision Making [15.793486463552144]
We propose a decision-theoretic calibration error, the Decision Loss (CDL), defined as the maximum improvement in decision payoff obtained by the predictions.
We show separations between CDL and existing calibration error metrics, including the most well-studied metric Expected Error (ECE)
Our main technical contribution is a new efficient algorithm for online calibration that achieves near-optimal $O(fraclog TsqrtT)$ expected CDL.
arXiv Detail & Related papers (2024-04-21T01:53:20Z) - Calibration by Distribution Matching: Trainable Kernel Calibration
Metrics [56.629245030893685]
We introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression.
These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization.
We provide intuitive mechanisms to tailor calibration metrics to a decision task, and enforce accurate loss estimation and no regret decisions.
arXiv Detail & Related papers (2023-10-31T06:19:40Z) - When Does Optimizing a Proper Loss Yield Calibration? [12.684962113589515]
We show that any predictor with a local optimality satisfies smooth calibration.
We also show that the connection between local optimality and calibration error goes both ways.
arXiv Detail & Related papers (2023-05-30T05:53:34Z) - Calibration Error Estimation Using Fuzzy Binning [0.0]
We propose a Fuzzy Error metric (FCE) that utilizes a fuzzy binning approach to calculate calibration error.
Our results show that FCE offers better calibration error estimation, especially in multi-class settings.
arXiv Detail & Related papers (2023-04-30T18:06:14Z) - Revisiting Calibration for Question Answering [16.54743762235555]
We argue that the traditional evaluation of calibration does not reflect usefulness of the model confidence.
We propose a new calibration metric, MacroCE, that better captures whether the model assigns low confidence to wrong predictions and high confidence to correct predictions.
arXiv Detail & Related papers (2022-05-25T05:49:56Z) - T-Cal: An optimal test for the calibration of predictive models [49.11538724574202]
We consider detecting mis-calibration of predictive models using a finite validation dataset as a hypothesis testing problem.
detecting mis-calibration is only possible when the conditional probabilities of the classes are sufficiently smooth functions of the predictions.
We propose T-Cal, a minimax test for calibration based on a de-biased plug-in estimator of the $ell$-Expected Error (ECE)
arXiv Detail & Related papers (2022-03-03T16:58:54Z) - Calibrating Predictions to Decisions: A Novel Approach to Multi-Class
Calibration [118.26862029820447]
We introduce a new notion -- emphdecision calibration -- that requires the predicted distribution and true distribution to be indistinguishable'' to a set of downstream decision-makers.
Decision calibration improves decision-making on skin lesions and ImageNet classification with modern neural network.
arXiv Detail & Related papers (2021-07-12T20:17:28Z) - Localized Calibration: Metrics and Recalibration [133.07044916594361]
We propose a fine-grained calibration metric that spans the gap between fully global and fully individualized calibration.
We then introduce a localized recalibration method, LoRe, that improves the LCE better than existing recalibration methods.
arXiv Detail & Related papers (2021-02-22T07:22:12Z) - Unsupervised Calibration under Covariate Shift [92.02278658443166]
We introduce the problem of calibration under domain shift and propose an importance sampling based approach to address it.
We evaluate and discuss the efficacy of our method on both real-world datasets and synthetic datasets.
arXiv Detail & Related papers (2020-06-29T21:50:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.