Calibration through the Lens of Interpretability
- URL: http://arxiv.org/abs/2412.00943v1
- Date: Sun, 01 Dec 2024 19:28:16 GMT
- Title: Calibration through the Lens of Interpretability
- Authors: Alireza Torabian, Ruth Urner,
- Abstract summary: calibration is a frequently invoked concept when useful label probability estimates are required on top of classification accuracy.
In this work, we initiate an axiomatic study of the notion of calibration.
We catalogue desirable properties of calibrated models as well as corresponding evaluation metrics and analyze their feasibility and correspondences.
- Score: 3.9962751777898955
- License:
- Abstract: Calibration is a frequently invoked concept when useful label probability estimates are required on top of classification accuracy. A calibrated model is a function whose values correctly reflect underlying label probabilities. Calibration in itself however does not imply classification accuracy, nor human interpretable estimates, nor is it straightforward to verify calibration from finite data. There is a plethora of evaluation metrics (and loss functions) that each assess a specific aspect of a calibration model. In this work, we initiate an axiomatic study of the notion of calibration. We catalogue desirable properties of calibrated models as well as corresponding evaluation metrics and analyze their feasibility and correspondences. We complement this analysis with an empirical evaluation, comparing common calibration methods to employing a simple, interpretable decision tree.
Related papers
- Understanding Model Calibration -- A gentle introduction and visual exploration of calibration and the expected calibration error (ECE) [0.0]
In this blogpost we'll take a look at the most commonly used definition for calibration.
We'll then cover some of the drawbacks of this measure and how these surfaced the need for additional notions of calibration.
arXiv Detail & Related papers (2025-01-31T11:18:45Z) - Consistent and Asymptotically Unbiased Estimation of Proper Calibration
Errors [23.819464242327257]
We propose a method that allows consistent estimation of all proper calibration errors and refinement terms.
We prove the relation between refinement and f-divergences, which implies information monotonicity in neural networks.
Our experiments validate the claimed properties of the proposed estimator and suggest that the selection of a post-hoc calibration method should be determined by the particular calibration error of interest.
arXiv Detail & Related papers (2023-12-14T01:20:08Z) - Calibration by Distribution Matching: Trainable Kernel Calibration
Metrics [56.629245030893685]
We introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression.
These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization.
We provide intuitive mechanisms to tailor calibration metrics to a decision task, and enforce accurate loss estimation and no regret decisions.
arXiv Detail & Related papers (2023-10-31T06:19:40Z) - Calibration of Neural Networks [77.34726150561087]
This paper presents a survey of confidence calibration problems in the context of neural networks.
We analyze problem statement, calibration definitions, and different approaches to evaluation.
Empirical experiments cover various datasets and models, comparing calibration methods according to different criteria.
arXiv Detail & Related papers (2023-03-19T20:27:51Z) - Better Uncertainty Calibration via Proper Scores for Classification and
Beyond [15.981380319863527]
We introduce the framework of proper calibration errors, which relates every calibration error to a proper score.
This relationship can be used to reliably quantify the model calibration improvement.
arXiv Detail & Related papers (2022-03-15T12:46:08Z) - T-Cal: An optimal test for the calibration of predictive models [49.11538724574202]
We consider detecting mis-calibration of predictive models using a finite validation dataset as a hypothesis testing problem.
detecting mis-calibration is only possible when the conditional probabilities of the classes are sufficiently smooth functions of the predictions.
We propose T-Cal, a minimax test for calibration based on a de-biased plug-in estimator of the $ell$-Expected Error (ECE)
arXiv Detail & Related papers (2022-03-03T16:58:54Z) - Estimating Expected Calibration Errors [1.52292571922932]
Uncertainty in probabilistics predictions is a key concern when models are used to support human decision making.
Most models are not intrinsically well calibrated, meaning that their decision scores are not consistent with posterior probabilities.
We build an empirical procedure to quantify the quality of $ECE$ estimators, and use it to decide which estimator should be used in practice for different settings.
arXiv Detail & Related papers (2021-09-08T08:00:23Z) - Localized Calibration: Metrics and Recalibration [133.07044916594361]
We propose a fine-grained calibration metric that spans the gap between fully global and fully individualized calibration.
We then introduce a localized recalibration method, LoRe, that improves the LCE better than existing recalibration methods.
arXiv Detail & Related papers (2021-02-22T07:22:12Z) - Unsupervised Calibration under Covariate Shift [92.02278658443166]
We introduce the problem of calibration under domain shift and propose an importance sampling based approach to address it.
We evaluate and discuss the efficacy of our method on both real-world datasets and synthetic datasets.
arXiv Detail & Related papers (2020-06-29T21:50:07Z) - Calibration of Neural Networks using Splines [51.42640515410253]
Measuring calibration error amounts to comparing two empirical distributions.
We introduce a binning-free calibration measure inspired by the classical Kolmogorov-Smirnov (KS) statistical test.
Our method consistently outperforms existing methods on KS error as well as other commonly used calibration measures.
arXiv Detail & Related papers (2020-06-23T07:18:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.