Hidden Heterogeneity: When to Choose Similarity-Based Calibration
- URL: http://arxiv.org/abs/2202.01840v1
- Date: Thu, 3 Feb 2022 20:43:25 GMT
- Title: Hidden Heterogeneity: When to Choose Similarity-Based Calibration
- Authors: Kiri L. Wagstaff and Thomas G. Dietterich
- Abstract summary: Black-box calibration methods are unable to detect subpopulations where calibration could improve prediction accuracy.
The paper proposes a quantitative measure for hidden heterogeneity (HH).
Experiments show that the improvements in calibration achieved by similarity-based calibration methods correlate with the amount of HH present and, given sufficient calibration data, generally exceed calibration achieved by global methods.
- Score: 12.788224825185633
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Trustworthy classifiers are essential to the adoption of machine learning
predictions in many real-world settings. The predicted probability of possible
outcomes can inform high-stakes decision making, particularly when assessing
the expected value of alternative decisions or the risk of bad outcomes. These
decisions require well calibrated probabilities, not just the correct
prediction of the most likely class. Black-box classifier calibration methods
can improve the reliability of a classifier's output without requiring
retraining. However, these methods are unable to detect subpopulations where
calibration could improve prediction accuracy. Such subpopulations are said to
exhibit "hidden heterogeneity" (HH), because the original classifier did not
detect them. The paper proposes a quantitative measure for HH. It also
introduces two similarity-weighted calibration methods that can address HH by
adapting locally to each test item: SWC weights the calibration set by
similarity to the test item, and SWC-HH explicitly incorporates hidden
heterogeneity to filter the calibration set. Experiments show that the
improvements in calibration achieved by similarity-based calibration methods
correlate with the amount of HH present and, given sufficient calibration data,
generally exceed calibration achieved by global methods. HH can therefore serve
as a useful diagnostic tool for identifying when local calibration methods are
needed.
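The abstract describes SWC only at the level of "weight the calibration set by similarity to the test item." The sketch below illustrates that idea under stated assumptions; it is not the paper's implementation. The RBF similarity over classifier feature embeddings, the bandwidth, the Platt-style logistic base calibrator, and the function names are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def swc_predict_proba(test_feat, test_score, cal_feats, cal_scores, cal_labels,
                      bandwidth=1.0):
    """Sketch of similarity-weighted calibration (SWC) for one test item.

    cal_scores: the classifier's uncalibrated scores on the calibration set.
    cal_feats:  feature embeddings used to measure similarity to the test item.
    cal_labels: binary outcomes on the calibration set.
    Kernel and base calibrator are illustrative choices, not the paper's.
    """
    # RBF similarity between the test item and each calibration item.
    dists = np.linalg.norm(cal_feats - test_feat, axis=1)
    weights = np.exp(-(dists ** 2) / (2.0 * bandwidth ** 2))

    # Fit a Platt-style calibrator, weighting calibration items by similarity.
    calibrator = LogisticRegression()
    calibrator.fit(cal_scores.reshape(-1, 1), cal_labels, sample_weight=weights)
    return calibrator.predict_proba([[test_score]])[0, 1]
```

Per the abstract, SWC-HH would additionally use the hidden-heterogeneity measure to filter the calibration set before fitting; that filtering step is not shown in this sketch.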
Related papers
- Towards Certification of Uncertainty Calibration under Adversarial Attacks [96.48317453951418]
We show that attacks can significantly harm calibration, and thus propose certified calibration as worst-case bounds on calibration under adversarial perturbations.
We propose novel calibration attacks and demonstrate how they can improve model calibration through adversarial calibration training.
arXiv Detail & Related papers (2024-05-22T18:52:09Z)
- Calibration by Distribution Matching: Trainable Kernel Calibration Metrics [56.629245030893685]
We introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression.
These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization.
We provide intuitive mechanisms to tailor calibration metrics to a decision task, and enforce accurate loss estimation and no-regret decisions.
arXiv Detail & Related papers (2023-10-31T06:19:40Z)
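The entry above reports kernel-based calibration metrics with differentiable sample estimates. As one concrete instance of that general family (an illustration, not that paper's specific metrics), here is an MMCE-style squared kernel calibration error; the Laplacian kernel and its width are assumptions.

```python
import numpy as np

def kernel_calibration_error_sq(confidences, correct, width=0.4):
    """MMCE-style squared kernel calibration error (illustrative only).

    confidences: predicted probability of the predicted class, shape (n,).
    correct:     1.0 if the prediction was correct, else 0.0, shape (n,).
    """
    r = np.asarray(confidences, dtype=float)
    c = np.asarray(correct, dtype=float)
    residual = c - r                                            # per-sample calibration gap
    kernel = np.exp(-np.abs(r[:, None] - r[None, :]) / width)   # Laplacian kernel on confidences
    return float(residual @ kernel @ residual) / (len(r) ** 2)
```

Because the estimate is a smooth function of the confidences (away from kernel kinks), the same expression can be written in an autodiff framework and added to a training loss, which is the sense in which such metrics are trainable.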
- T-Cal: An optimal test for the calibration of predictive models [49.11538724574202]
We consider detecting mis-calibration of predictive models using a finite validation dataset as a hypothesis testing problem.
Detecting mis-calibration is only possible when the conditional probabilities of the classes are sufficiently smooth functions of the predictions.
We propose T-Cal, a minimax test for calibration based on a de-biased plug-in estimator of the $\ell_2$-Expected Calibration Error (ECE).
arXiv Detail & Related papers (2022-03-03T16:58:54Z)
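For context on the quantity the T-Cal entry above refers to, the sketch below computes the standard binned plug-in $\ell_1$-ECE; T-Cal itself is built on a de-biased plug-in estimator of the $\ell_2$-ECE, which this naive version does not reproduce. The equal-width binning and bin count are arbitrary illustrative choices.

```python
import numpy as np

def binned_ece(confidences, correct, n_bins=15):
    """Naive binned plug-in ECE (equal-width bins); not T-Cal's de-biased estimator."""
    r = np.asarray(confidences, dtype=float)
    c = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each confidence to a bin; values of exactly 1.0 fall in the last bin.
    bin_ids = np.minimum(np.digitize(r, edges[1:]), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            # |average accuracy - average confidence|, weighted by the bin's mass.
            ece += mask.mean() * abs(c[mask].mean() - r[mask].mean())
    return ece
```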
- Localized Calibration: Metrics and Recalibration [133.07044916594361]
We propose a fine-grained calibration metric that spans the gap between fully global and fully individualized calibration.
We then introduce a localized recalibration method, LoRe, that improves the localized calibration error (LCE) more than existing recalibration methods.
arXiv Detail & Related papers (2021-02-22T07:22:12Z)
- Unsupervised Calibration under Covariate Shift [92.02278658443166]
We introduce the problem of calibration under domain shift and propose an importance sampling based approach to address it.
We evaluate and discuss the efficacy of our method on both real-world datasets and synthetic datasets.
arXiv Detail & Related papers (2020-06-29T21:50:07Z)
- Calibration of Neural Networks using Splines [51.42640515410253]
Measuring calibration error amounts to comparing two empirical distributions.
We introduce a binning-free calibration measure inspired by the classical Kolmogorov-Smirnov (KS) statistical test.
Our method consistently outperforms existing methods on KS error as well as other commonly used calibration measures.
arXiv Detail & Related papers (2020-06-23T07:18:05Z)
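As a rough illustration of the binning-free, Kolmogorov-Smirnov-inspired idea in the entry above (a sketch of the general idea, not that paper's spline-based method), a KS-style calibration error can compare cumulative predicted confidence against cumulative observed accuracy with samples sorted by confidence.

```python
import numpy as np

def ks_calibration_error(confidences, correct):
    """Binning-free KS-style calibration error: the largest gap between the
    cumulative predicted confidence and the cumulative observed accuracy when
    samples are sorted by confidence. Illustrative sketch only."""
    r = np.asarray(confidences, dtype=float)
    c = np.asarray(correct, dtype=float)
    order = np.argsort(r)
    cum_conf = np.cumsum(r[order]) / len(r)
    cum_acc = np.cumsum(c[order]) / len(r)
    return float(np.max(np.abs(cum_acc - cum_conf)))
```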
- Mix-n-Match: Ensemble and Compositional Methods for Uncertainty Calibration in Deep Learning [21.08664370117846]
We show how Mix-n-Match calibration strategies can help achieve remarkably better data-efficiency and expressive power.
We also reveal potential issues in standard evaluation practices.
Our approaches outperform state-of-the-art solutions on both the calibration and the evaluation tasks.
arXiv Detail & Related papers (2020-03-16T17:00:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.