Estimating Expected Calibration Errors
- URL: http://arxiv.org/abs/2109.03480v1
- Date: Wed, 8 Sep 2021 08:00:23 GMT
- Title: Estimating Expected Calibration Errors
- Authors: Nicolas Posocco, Antoine Bonnefoy
- Abstract summary: Uncertainty in probabilistic predictions is a key concern when models are used to support human decision making.
Most models are not intrinsically well calibrated, meaning that their decision scores are not consistent with posterior probabilities.
We build an empirical procedure to quantify the quality of $ECE$ estimators, and use it to decide which estimator should be used in practice for different settings.
- Score: 1.52292571922932
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Uncertainty in probabilistic classifiers' predictions is a key concern when
models are used to support human decision making, in broader probabilistic
pipelines or when sensitive automatic decisions have to be taken. Studies have
shown that most models are not intrinsically well calibrated, meaning that
their decision scores are not consistent with posterior probabilities. Hence,
being able to calibrate these models, or to enforce calibration while learning
them, has regained interest in the recent literature. In this context, properly
assessing calibration is paramount to quantify new contributions tackling
calibration. However, there is room for improvement for commonly used metrics
and evaluation of calibration could benefit from deeper analyses. Thus this
paper focuses on the empirical evaluation of calibration metrics in the context
of classification. More specifically it evaluates different estimators of the
Expected Calibration Error ($ECE$), amongst which legacy estimators and some
novel ones, proposed in this paper. We build an empirical procedure to quantify
the quality of these $ECE$ estimators, and use it to decide which estimator
should be used in practice for different settings.
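For context, a hedged sketch of the quantity being estimated (the standard top-label definition, not the specific novel estimators proposed in the paper): writing $C$ for a classifier's confidence (e.g., its maximum predicted probability) and $\hat{Y}$, $Y$ for the predicted and true labels,

$$ECE = \mathbb{E}\bigl[\,\bigl|\,\mathbb{P}(\hat{Y}=Y \mid C) - C\,\bigr|\,\bigr],$$

and the most common legacy estimator bins the confidences into $B$ bins and compares per-bin accuracy with per-bin mean confidence,

$$\widehat{ECE} = \sum_{b=1}^{B} \frac{n_b}{n}\,\bigl|\mathrm{acc}(b) - \mathrm{conf}(b)\bigr|,$$

where $n_b$ is the number of samples falling in bin $b$. The Python sketch below implements this equal-width-binning estimator as a reference point; the function name, inputs and default bin count are illustrative and not taken from the paper.

```python
import numpy as np

def binned_ece(confidences, correct, n_bins=15):
    """Equal-width-binning estimator of the top-label ECE (illustrative sketch).

    confidences : array of top-label predicted probabilities, shape (n,)
    correct     : array of 0/1 indicators of whether the prediction was correct
    n_bins      : number of equal-width bins over [0, 1]
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    n = confidences.shape[0]

    # Assign each sample to a bin; clip so that confidence == 1.0 falls in the last bin.
    bin_ids = np.clip((confidences * n_bins).astype(int), 0, n_bins - 1)

    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_ids == b
        n_b = in_bin.sum()
        if n_b == 0:
            continue  # empty bins contribute nothing
        acc = correct[in_bin].mean()         # empirical accuracy in the bin
        conf = confidences[in_bin].mean()    # mean confidence in the bin
        ece += (n_b / n) * abs(acc - conf)
    return ece

# Toy usage (values are illustrative):
# binned_ece(np.array([0.9, 0.8, 0.65, 0.95, 0.7]), np.array([1, 1, 0, 1, 0]))
```

Such plug-in estimators are sensitive to the binning scheme, which is one reason an empirical procedure for comparing $ECE$ estimators, as the abstract describes, is useful.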
Related papers
- Optimizing Estimators of Squared Calibration Errors in Classification [2.3020018305241337]
We propose a mean-squared error-based risk that enables the comparison and optimization of estimators of squared calibration errors.
Our approach advocates for a training-validation-testing pipeline when estimating a calibration error.
arXiv Detail & Related papers (2024-10-09T15:58:06Z)
- Reassessing How to Compare and Improve the Calibration of Machine Learning Models [7.183341902583164]
A machine learning model is calibrated if its predicted probability for an outcome matches the observed frequency for that outcome conditional on the model prediction.
We show that there exist trivial recalibration approaches that can appear state-of-the-art unless calibration and prediction metrics are accompanied by additional generalization metrics.
arXiv Detail & Related papers (2024-06-06T13:33:45Z)
- Towards Certification of Uncertainty Calibration under Adversarial Attacks [96.48317453951418]
We show that attacks can significantly harm calibration, and thus propose certified calibration as worst-case bounds on calibration under adversarial perturbations.
We propose novel calibration attacks and demonstrate how they can improve model calibration through adversarial calibration training.
arXiv Detail & Related papers (2024-05-22T18:52:09Z)
- From Uncertainty to Precision: Enhancing Binary Classifier Performance through Calibration [0.3495246564946556]
Given that model-predicted scores are commonly seen as event probabilities, calibration is crucial for accurate interpretation.
We analyze the sensitivity of various calibration measures to score distortions and introduce a refined metric, the Local Score.
We apply these findings in a real-world scenario, using a Random Forest classifier and regressor to predict credit default while simultaneously measuring calibration.
arXiv Detail & Related papers (2024-02-12T16:55:19Z)
- Calibration by Distribution Matching: Trainable Kernel Calibration Metrics [56.629245030893685]
We introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression.
These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization.
We provide intuitive mechanisms to tailor calibration metrics to a decision task, and to enforce accurate loss estimation and no-regret decisions (a minimal kernel-based sketch in this spirit follows this list).
arXiv Detail & Related papers (2023-10-31T06:19:40Z)
- Calibration of Neural Networks [77.34726150561087]
This paper presents a survey of confidence calibration problems in the context of neural networks.
We analyze the problem statement, calibration definitions, and different approaches to evaluation.
Empirical experiments cover various datasets and models, comparing calibration methods according to different criteria.
arXiv Detail & Related papers (2023-03-19T20:27:51Z)
- Better Uncertainty Calibration via Proper Scores for Classification and Beyond [15.981380319863527]
We introduce the framework of proper calibration errors, which relates every calibration error to a proper score.
This relationship can be used to reliably quantify the model calibration improvement.
arXiv Detail & Related papers (2022-03-15T12:46:08Z)
- T-Cal: An optimal test for the calibration of predictive models [49.11538724574202]
We consider detecting mis-calibration of predictive models using a finite validation dataset as a hypothesis testing problem.
Detecting mis-calibration is only possible when the conditional probabilities of the classes are sufficiently smooth functions of the predictions.
We propose T-Cal, a minimax test for calibration based on a de-biased plug-in estimator of the $\ell_2$-Expected Calibration Error ($ECE$).
arXiv Detail & Related papers (2022-03-03T16:58:54Z)
- Localized Calibration: Metrics and Recalibration [133.07044916594361]
We propose a fine-grained calibration metric that spans the gap between fully global and fully individualized calibration.
We then introduce a localized recalibration method, LoRe, that improves the local calibration error (LCE) more than existing recalibration methods.
arXiv Detail & Related papers (2021-02-22T07:22:12Z)
- Unsupervised Calibration under Covariate Shift [92.02278658443166]
We introduce the problem of calibration under domain shift and propose an importance sampling based approach to address it.
We evaluate and discuss the efficacy of our method on both real-world datasets and synthetic datasets.
arXiv Detail & Related papers (2020-06-29T21:50:07Z)
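Relating to the "Calibration by Distribution Matching: Trainable Kernel Calibration Metrics" entry above, the following is a minimal, hedged sketch of a kernel-based sample estimate of calibration error (an MMCE-style quantity); it is not the estimator from that paper, and the kernel choice, bandwidth and function name are illustrative.

```python
import numpy as np

def kernel_calibration_error(confidences, correct, bandwidth=0.2):
    """Kernel-based sample estimate of a (squared) calibration error (sketch).

    Computes sum_{i,j} (c_i - r_i) (c_j - r_j) k(r_i, r_j) / n^2 with a
    Laplacian kernel k(r, r') = exp(-|r - r'| / bandwidth), where r_i is the
    top-label confidence and c_i the 0/1 correctness indicator.
    """
    r = np.asarray(confidences, dtype=float)
    c = np.asarray(correct, dtype=float)
    n = r.shape[0]

    gap = c - r                                               # per-sample calibration residual
    K = np.exp(-np.abs(r[:, None] - r[None, :]) / bandwidth)  # Laplacian kernel matrix
    return float(gap @ K @ gap) / (n ** 2)
```

Because this estimate is a smooth function of the confidences, rewriting it in an autodiff framework (e.g., PyTorch or JAX) would let it act as a differentiable calibration penalty added to the empirical risk, which is the general idea behind trainable calibration metrics.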
This list is automatically generated from the titles and abstracts of the papers in this site.