From Uncertainty to Precision: Enhancing Binary Classifier Performance
through Calibration
- URL: http://arxiv.org/abs/2402.07790v1
- Date: Mon, 12 Feb 2024 16:55:19 GMT
- Title: From Uncertainty to Precision: Enhancing Binary Classifier Performance
through Calibration
- Authors: Agathe Fernandes Machado, Arthur Charpentier, Emmanuel Flachaire, Ewen Gallic, François Hu
- Abstract summary: Given that model-predicted scores are commonly seen as event probabilities, calibration is crucial for accurate interpretation.
We analyze the sensitivity of various calibration measures to score distortions and introduce a refined metric, the Local Calibration Score.
We apply these findings in a real-world scenario, using a Random Forest classifier and regressor to predict credit default while simultaneously measuring calibration.
- Score: 0.3495246564946556
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The assessment of binary classifier performance traditionally centers on
discriminative ability using metrics such as accuracy. However, these metrics
often disregard the model's inherent uncertainty, especially when dealing with
sensitive decision-making domains, such as finance or healthcare. Given that
model-predicted scores are commonly seen as event probabilities, calibration is
crucial for accurate interpretation. In our study, we analyze the sensitivity
of various calibration measures to score distortions and introduce a refined
metric, the Local Calibration Score. Comparing recalibration methods, we
advocate for local regressions, emphasizing their dual role as effective
recalibration tools and facilitators of smoother visualizations. We apply these
findings in a real-world scenario using a Random Forest classifier and regressor
to predict credit default while simultaneously measuring calibration during
performance optimization.
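As a rough illustration of the recalibration idea described above, the sketch below fits a Random Forest on a synthetic, credit-default-style dataset, regresses held-out binary outcomes on the raw scores with a local regression (LOWESS), and uses the smoothed curve both as a recalibration map and as a smooth reliability curve. The synthetic data, splits, hyperparameters, and the choice of LOWESS via statsmodels are illustrative assumptions, not the authors' exact implementation or data.

```python
# Illustrative sketch only: recalibrating Random Forest scores with a local
# regression (LOWESS), in the spirit of the approach described above. The
# synthetic data, split sizes, and hyperparameters are assumptions, not the
# authors' setup.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from statsmodels.nonparametric.smoothers_lowess import lowess

# Synthetic stand-in for an imbalanced credit-default-style dataset.
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.5, random_state=0)
X_calib, X_test, y_calib, y_test = train_test_split(X_hold, y_hold, test_size=0.5, random_state=0)

clf = RandomForestClassifier(n_estimators=300, random_state=0)
clf.fit(X_train, y_train)

# Raw predicted scores on the calibration and test sets.
s_calib = clf.predict_proba(X_calib)[:, 1]
s_test = clf.predict_proba(X_test)[:, 1]

# Local regression of the binary outcome on the score: the smoothed curve
# serves both as a recalibration map and as a smooth reliability curve.
curve = lowess(y_calib, s_calib, frac=0.3, return_sorted=True)
s_test_recal = np.clip(np.interp(s_test, curve[:, 0], curve[:, 1]), 0.0, 1.0)

print("mean raw score         :", s_test.mean())
print("mean recalibrated score:", s_test_recal.mean())
print("observed event rate    :", y_test.mean())
```

The smoothing span (frac) trades off fidelity against smoothness of the reliability curve, and a separate calibration split is used so that the recalibration map is not fit on the same data as the forest.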
Related papers
- Optimizing Estimators of Squared Calibration Errors in Classification [2.3020018305241337]
We propose a mean-squared error-based risk that enables the comparison and optimization of estimators of squared calibration errors.
Our approach advocates for a training-validation-testing pipeline when estimating a calibration error.
arXiv Detail & Related papers (2024-10-09T15:58:06Z)
- Towards Certification of Uncertainty Calibration under Adversarial Attacks [96.48317453951418]
We show that attacks can significantly harm calibration, and thus propose certified calibration as worst-case bounds on calibration under adversarial perturbations.
We propose novel calibration attacks and demonstrate how they can improve model calibration through adversarial calibration training.
arXiv Detail & Related papers (2024-05-22T18:52:09Z)
- Calibration by Distribution Matching: Trainable Kernel Calibration Metrics [56.629245030893685]
We introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression.
These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization.
We provide intuitive mechanisms to tailor calibration metrics to a decision task, and enforce accurate loss estimation and no regret decisions.
arXiv Detail & Related papers (2023-10-31T06:19:40Z)
- On Calibrating Semantic Segmentation Models: Analyses and An Algorithm [51.85289816613351]
We study the problem of semantic segmentation calibration.
Model capacity, crop size, multi-scale testing, and prediction correctness all have an impact on calibration.
We propose a simple, unifying, and effective approach, namely selective scaling.
arXiv Detail & Related papers (2022-12-22T22:05:16Z)
- What is Your Metric Telling You? Evaluating Classifier Calibration under Context-Specific Definitions of Reliability [6.510061176722249]
We argue that more expressive metrics must be developed that accurately measure calibration error.
We use a generalization of Expected Calibration Error (ECE) that measures calibration error under different definitions of reliability.
We find that: 1) definitions of ECE that focus solely on the predicted class fail to accurately measure calibration error under a selection of practically useful definitions of reliability, and 2) many common calibration techniques fail to improve calibration performance uniformly across ECE metrics.
arXiv Detail & Related papers (2022-05-23T16:45:02Z)
- Better Uncertainty Calibration via Proper Scores for Classification and Beyond [15.981380319863527]
We introduce the framework of proper calibration errors, which relates every calibration error to a proper score.
This relationship can be used to reliably quantify the model calibration improvement.
arXiv Detail & Related papers (2022-03-15T12:46:08Z)
- Estimating Expected Calibration Errors [1.52292571922932]
Uncertainty in probabilistic predictions is a key concern when models are used to support human decision-making.
Most models are not intrinsically well calibrated, meaning that their decision scores are not consistent with posterior probabilities.
We build an empirical procedure to quantify the quality of ECE estimators, and use it to decide which estimator should be used in practice for different settings (a minimal binned-ECE computation is sketched after this list).
arXiv Detail & Related papers (2021-09-08T08:00:23Z)
- Localized Calibration: Metrics and Recalibration [133.07044916594361]
We propose a fine-grained calibration metric that spans the gap between fully global and fully individualized calibration.
We then introduce a localized recalibration method, LoRe, that improves the LCE more than existing recalibration methods.
arXiv Detail & Related papers (2021-02-22T07:22:12Z)
- Unsupervised Calibration under Covariate Shift [92.02278658443166]
We introduce the problem of calibration under domain shift and propose an importance sampling based approach to address it.
We evaluate and discuss the efficacy of our method on both real-world datasets and synthetic datasets.
arXiv Detail & Related papers (2020-06-29T21:50:07Z)
- Calibration of Neural Networks using Splines [51.42640515410253]
Measuring calibration error amounts to comparing two empirical distributions.
We introduce a binning-free calibration measure inspired by the classical Kolmogorov-Smirnov (KS) statistical test.
Our method consistently outperforms existing methods on KS error as well as other commonly used calibration measures.
arXiv Detail & Related papers (2020-06-23T07:18:05Z)
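For reference, the sketch below implements two measures that recur in the papers above: the standard equal-width-binned Expected Calibration Error (ECE) and a binning-free, KS-style error obtained by comparing cumulative sums of scores and outcomes ordered by score. These are common textbook formulations rather than code from the cited papers, and the simulated scores and outcomes are purely illustrative.

```python
# Illustrative sketch only: a binned ECE estimator and a binning-free,
# KS-style calibration error. Common formulations, not taken verbatim
# from the papers listed above.
import numpy as np

def binned_ece(scores, labels, n_bins=10):
    """Expected Calibration Error with equal-width bins on the positive-class score."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    n = len(scores)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        upper = scores <= hi if hi >= 1.0 else scores < hi
        mask = (scores >= lo) & upper
        if mask.any():
            # Weighted gap between mean score and observed frequency in the bin.
            ece += (mask.sum() / n) * abs(scores[mask].mean() - labels[mask].mean())
    return ece

def ks_calibration_error(scores, labels):
    """Binning-free error: largest gap between the cumulative sums of
    predicted scores and observed outcomes, ordered by score."""
    order = np.argsort(scores)
    n = len(scores)
    cum_scores = np.cumsum(scores[order]) / n
    cum_labels = np.cumsum(labels[order]) / n
    return float(np.max(np.abs(cum_scores - cum_labels)))

# Hypothetical, deliberately miscalibrated scores and outcomes.
rng = np.random.default_rng(0)
p = rng.uniform(size=2000)
y = rng.binomial(1, p ** 1.3)
print("binned ECE:", round(binned_ece(p, y), 4))
print("KS error  :", round(ks_calibration_error(p, y), 4))
```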