Boldness-Recalibration for Binary Event Predictions
- URL: http://arxiv.org/abs/2305.03780v3
- Date: Wed, 31 Jan 2024 19:50:48 GMT
- Title: Boldness-Recalibration for Binary Event Predictions
- Authors: Adeline P. Guthrie and Christopher T. Franck
- Abstract summary: Ideally, probability predictions are (i) well calibrated, (ii) accurate, and (iii) bold, i.e., spread out enough to be informative for decision making.
There is a fundamental tension between calibration and boldness, since calibration metrics can be high when predictions are overly cautious, i.e., non-bold.
The purpose of this work is to develop a Bayesian model selection-based approach to assess calibration, and a strategy for boldness-recalibration.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Probability predictions are essential to inform decision making across many
fields. Ideally, probability predictions are (i) well calibrated, (ii)
accurate, and (iii) bold, i.e., spread out enough to be informative for
decision making. However, there is a fundamental tension between calibration
and boldness, since calibration metrics can be high when predictions are overly
cautious, i.e., non-bold. The purpose of this work is to develop a Bayesian
model selection-based approach to assess calibration, and a strategy for
boldness-recalibration that enables practitioners to responsibly embolden
predictions subject to their required level of calibration. Specifically, we
allow the user to pre-specify their desired posterior probability of
calibration, then maximally embolden predictions subject to this constraint. We
demonstrate the method with a case study on hockey home team win probabilities
and then verify the performance of our procedures via simulation. We find that
very slight relaxation of calibration probability (e.g., from 0.99 to 0.95) can
often substantially embolden predictions when they are well calibrated and
accurate (e.g., widening hockey predictions range from .26-.78 to .10-.91).
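For intuition, the emboldening step can be read as a small constrained optimization: map the forecasts through a two-parameter recalibration function, approximate the posterior probability that the recalibrated forecasts are calibrated, and choose the parameters that maximize spread while that probability stays above the user's threshold. The Python sketch below illustrates this idea under stated assumptions; the linear-in-log-odds (LLO) map, the BIC-based Bayes-factor approximation, and the grid search are illustrative choices of this sketch, not the authors' implementation or software.

```python
# Hypothetical sketch of boldness-recalibration for binary predictions.
# Assumptions (not taken from the abstract): an LLO recalibration map, a
# BIC-style approximation to the posterior probability of calibration,
# and a plain grid search over the map's parameters.
import numpy as np

def llo(p, delta, gamma):
    """Linear-in-log-odds map: rescale predictions on the log-odds scale."""
    p = np.clip(p, 1e-6, 1 - 1e-6)
    num = delta * p**gamma
    return num / (num + (1 - p)**gamma)

def log_lik(p, y):
    """Bernoulli log-likelihood of outcomes y under predicted probabilities p."""
    p = np.clip(p, 1e-6, 1 - 1e-6)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def posterior_prob_calibrated(q, y, ll_mle):
    """BIC-style approximation to P(calibrated | y): compare the candidate
    predictions q (0 free parameters) against the best LLO recalibration
    (2 free parameters), assuming equal prior model odds."""
    n = len(y)
    log_bf = log_lik(q, y) - (ll_mle - np.log(n))   # (2/2)*log(n) penalty on 2 params
    log_bf = np.clip(log_bf, -30.0, 30.0)           # avoid overflow; sigmoid saturates
    return 1.0 / (1.0 + np.exp(-log_bf))

def boldness_recalibrate(p, y, target=0.95):
    """Grid-search (delta, gamma) to maximize the spread (std. dev.) of the
    recalibrated predictions subject to posterior calibration >= target."""
    deltas = np.exp(np.linspace(-2.0, 2.0, 41))
    gammas = np.linspace(0.2, 5.0, 41)
    grid = [(d, g) for d in deltas for g in gammas]
    # The LLO family is closed under composition, so the maximized likelihood
    # over the family is (essentially) the same for any LLO-recalibrated
    # version of p; compute it once on the original forecasts.
    ll_mle = max(log_lik(llo(p, d, g), y) for d, g in grid)
    best_params, best_sd = (1.0, 1.0), -np.inf
    for d, g in grid:
        q = llo(p, d, g)
        if posterior_prob_calibrated(q, y, ll_mle) >= target and np.std(q) > best_sd:
            best_params, best_sd = (d, g), np.std(q)
    return llo(p, *best_params), best_params

# Toy usage: overly cautious forecasts of binary events.
rng = np.random.default_rng(0)
truth = rng.uniform(0.05, 0.95, size=1000)
y = rng.binomial(1, truth)
p = 0.5 + 0.5 * (truth - 0.5)        # shrink event probabilities toward 0.5 (non-bold)
q, (delta, gamma) = boldness_recalibrate(p, y, target=0.95)
print(f"original range:   [{p.min():.2f}, {p.max():.2f}]")
print(f"emboldened range: [{q.min():.2f}, {q.max():.2f}]  delta={delta:.2f}, gamma={gamma:.2f}")
```

Lowering `target` (say from 0.99 to 0.95) enlarges the set of admissible (delta, gamma) pairs and therefore the achievable spread, which is the mechanism behind the widening effect reported in the abstract.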
Related papers
- Truthfulness of Calibration Measures [18.21682539787221]
A calibration measure is said to be truthful if the forecaster minimizes expected penalty by predicting the conditional expectation of the next outcome.
Truthfulness is thus an essential desideratum for calibration measures, alongside typical requirements such as soundness and completeness.
We introduce a new calibration measure termed the Subsampled Smooth Calibration Error (SSCE), under which truthful prediction is optimal up to a constant multiplicative factor.
arXiv Detail & Related papers (2024-07-19T02:07:55Z)
- Towards Certification of Uncertainty Calibration under Adversarial Attacks [96.48317453951418]
We show that attacks can significantly harm calibration, and thus propose certified calibration as worst-case bounds on calibration under adversarial perturbations.
We propose novel calibration attacks and demonstrate how they can improve model calibration through adversarial calibration training.
arXiv Detail & Related papers (2024-05-22T18:52:09Z)
- Optimizing Calibration by Gaining Aware of Prediction Correctness [30.619608580138802]
Cross-Entropy (CE) loss is widely used for calibrator training; it pushes the model to increase confidence in the ground-truth class.
We propose a new post-hoc calibration objective derived from the aim of calibration.
arXiv Detail & Related papers (2024-04-19T17:25:43Z)
- Calibrated Uncertainty Quantification for Operator Learning via Conformal Prediction [95.75771195913046]
We propose a risk-controlling quantile neural operator, a distribution-free, finite-sample functional calibration conformal prediction method.
We provide a theoretical calibration guarantee on the coverage rate, defined as the expected percentage of points on the function domain covered by the predicted uncertainty sets.
Empirical results on a 2D Darcy flow and a 3D car surface pressure prediction task validate our theoretical results.
arXiv Detail & Related papers (2024-02-02T23:43:28Z)
- Calibration by Distribution Matching: Trainable Kernel Calibration Metrics [56.629245030893685]
We introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression.
These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization.
We provide intuitive mechanisms to tailor calibration metrics to a decision task, and enforce accurate loss estimation and no-regret decisions.
arXiv Detail & Related papers (2023-10-31T06:19:40Z)
- Calibrated Selective Classification [34.08454890436067]
We develop a new approach to selective classification in which we propose a method for rejecting examples with "uncertain" uncertainties.
We present a framework for learning selectively calibrated models, where a separate selector network is trained to improve the selective calibration error of a given base model.
We demonstrate the empirical effectiveness of our approach on multiple image classification and lung cancer risk assessment tasks.
arXiv Detail & Related papers (2022-08-25T13:31:09Z)
- Posterior Probability Matters: Doubly-Adaptive Calibration for Neural Predictions in Online Advertising [29.80454356173723]
Field-level calibration is fine-grained and more practical than calibrating predictions over the whole dataset.
AdaCalib learns an isotonic function family to calibrate model predictions.
Experiments verify that AdaCalib achieves significant improvement on calibration performance.
arXiv Detail & Related papers (2022-05-15T14:27:19Z)
- T-Cal: An optimal test for the calibration of predictive models [49.11538724574202]
We consider detecting mis-calibration of predictive models using a finite validation dataset as a hypothesis testing problem.
Detecting mis-calibration is only possible when the conditional probabilities of the classes are sufficiently smooth functions of the predictions.
We propose T-Cal, a minimax test for calibration based on a debiased plug-in estimator of the $\ell_2$-Expected Calibration Error (ECE); a minimal binned ECE estimator is sketched after this list.
arXiv Detail & Related papers (2022-03-03T16:58:54Z)
- Unsupervised Calibration under Covariate Shift [92.02278658443166]
We introduce the problem of calibration under domain shift and propose an importance sampling based approach to address it.
We evaluate and discuss the efficacy of our method on both real-world datasets and synthetic datasets.
arXiv Detail & Related papers (2020-06-29T21:50:07Z)
- Individual Calibration with Randomized Forecasting [116.2086707626651]
We show that calibration for individual samples is possible in the regression setup if the predictions are randomized.
We design a training objective to enforce individual calibration and use it to train randomized regression functions.
arXiv Detail & Related papers (2020-06-18T05:53:10Z)
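As a side note for the T-Cal entry above, the Expected Calibration Error it estimates can be illustrated with a minimal plug-in binned estimator for binary predictions. The equal-width binning and the simple l1/l2 variants below are standard textbook choices assumed for this sketch, not T-Cal's debiased estimator.

```python
import numpy as np

def binned_ece(p, y, n_bins=15, power=2):
    """Plug-in binned Expected Calibration Error for binary outcomes.
    power=1 gives the usual l1-ECE; power=2 the l2 variant (root of the
    weighted mean squared calibration gap). Equal-width bins for simplicity."""
    p, y = np.asarray(p, float), np.asarray(y, float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.digitize(p, edges[1:-1])                 # bin index in 0..n_bins-1
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            gap = abs(y[mask].mean() - p[mask].mean())  # |empirical freq - avg confidence|
            ece += mask.mean() * gap**power             # weight by the bin's share of data
    return ece if power == 1 else ece**(1.0 / power)

# Toy check: calibrated predictions score low, distorted ones score higher.
rng = np.random.default_rng(1)
p = rng.uniform(0, 1, 5000)
y = rng.binomial(1, p)                                  # outcomes drawn at the stated rates
print(binned_ece(p, y))                                 # small: predictions are calibrated
print(binned_ece(np.clip(2 * p - 0.5, 0, 1), y))        # larger: systematically distorted
```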
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the information above and is not responsible for any consequences of its use.