Should Ensemble Members Be Calibrated?
- URL: http://arxiv.org/abs/2101.05397v1
- Date: Wed, 13 Jan 2021 23:59:00 GMT
- Title: Should Ensemble Members Be Calibrated?
- Authors: Xixin Wu and Mark Gales
- Abstract summary: Modern deep neural networks are often observed to be poorly calibrated.
Deep learning approaches make use of large numbers of model parameters.
This paper explores the application of calibration schemes to deep ensembles.
- Score: 16.331175260764
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Underlying the use of statistical approaches for a wide range of applications
is the assumption that the probabilities obtained from a statistical model are
representative of the "true" probability that the event, or outcome, will occur.
Unfortunately, for modern deep neural networks this is not the case; they are
often observed to be poorly calibrated. Additionally, these deep learning
approaches make use of large numbers of model parameters, motivating the use of
Bayesian, or ensemble approximation, approaches to handle issues with parameter
estimation. This paper explores the application of calibration schemes to deep
ensembles from both a theoretical perspective and empirically on a standard
image classification task, CIFAR-100. The underlying theoretical requirements
for calibration, and associated calibration criteria, are first described. It
is shown that well-calibrated ensemble members will not necessarily yield a
well-calibrated ensemble prediction, and that if the ensemble prediction is well
calibrated its performance cannot exceed the average performance of the
calibrated ensemble members. On CIFAR-100 the impact of calibration on
ensemble prediction, and the associated calibration, is evaluated. Additionally,
the situation where multiple different topologies are combined together is
discussed.
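As a concrete illustration of the member-versus-ensemble distinction above, the sketch below calibrates each ensemble member separately and then averages the calibrated predictions, reporting a binned top-label ECE for the members and for the ensemble. This is a minimal NumPy sketch, not the paper's experimental setup: temperature scaling as the calibration scheme, the grid-search `fit_temperature` helper, the `ece` estimator, and the synthetic logits are all illustrative assumptions.
```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def ece(probs, labels, n_bins=15):
    """Binned Expected Calibration Error of the top-label confidence."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    err = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            err += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return err

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 91)):
    """Pick the temperature minimising held-out NLL (simple grid search)."""
    nll = [-np.log(softmax(logits / t)[np.arange(len(labels)), labels] + 1e-12).mean()
           for t in grid]
    return grid[int(np.argmin(nll))]

# Placeholder data standing in for real validation logits and labels.
rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=2000)
member_logits = [rng.normal(size=(2000, 10)) + 2.0 * np.eye(10)[labels]
                 for _ in range(5)]

calibrated = []
for logits in member_logits:
    t = fit_temperature(logits, labels)      # calibrate each member separately
    p = softmax(logits / t)
    calibrated.append(p)
    print(f"member ECE after temperature scaling: {ece(p, labels):.4f}")

ensemble = np.mean(calibrated, axis=0)        # average the calibrated members
print(f"ensemble ECE:                          {ece(ensemble, labels):.4f}")
```
Comparing the per-member values with the ensemble value is one way to make the paper's point measurable: member-level calibration alone does not determine the calibration of the averaged prediction.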
Related papers
- Reassessing How to Compare and Improve the Calibration of Machine Learning Models [7.183341902583164]
A machine learning model is calibrated if its predicted probability for an outcome matches the observed frequency for that outcome conditional on the model prediction.
We show that there exist trivial recalibration approaches that can appear state-of-the-art unless calibration and prediction metrics are accompanied by additional generalization metrics.
arXiv Detail & Related papers (2024-06-06T13:33:45Z)
- Calibration by Distribution Matching: Trainable Kernel Calibration Metrics [56.629245030893685]
We introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression.
These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization.
We provide intuitive mechanisms to tailor calibration metrics to a decision task, and enforce accurate loss estimation and no-regret decisions.
arXiv Detail & Related papers (2023-10-31T06:19:40Z)
- Calibration of Neural Networks [77.34726150561087]
This paper presents a survey of confidence calibration problems in the context of neural networks.
We analyze the problem statement, calibration definitions, and different approaches to evaluation.
Empirical experiments cover various datasets and models, comparing calibration methods according to different criteria.
arXiv Detail & Related papers (2023-03-19T20:27:51Z)
- Fair admission risk prediction with proportional multicalibration [0.16249424686052708]
Multicalibration constrains calibration error among flexibly-defined subpopulations.
It is possible for a decision-maker to learn to trust or distrust model predictions for specific groups.
We propose proportional multicalibration, a criterion that constrains the percent calibration error among groups and within prediction bins.
arXiv Detail & Related papers (2022-09-29T08:15:29Z)
- Better Uncertainty Calibration via Proper Scores for Classification and Beyond [15.981380319863527]
We introduce the framework of proper calibration errors, which relates every calibration error to a proper score.
This relationship can be used to reliably quantify the model calibration improvement.
arXiv Detail & Related papers (2022-03-15T12:46:08Z)
- T-Cal: An optimal test for the calibration of predictive models [49.11538724574202]
We consider detecting mis-calibration of predictive models using a finite validation dataset as a hypothesis testing problem.
Detecting mis-calibration is only possible when the conditional probabilities of the classes are sufficiently smooth functions of the predictions.
We propose T-Cal, a minimax test for calibration based on a de-biased plug-in estimator of the $\ell_2$-Expected Calibration Error (ECE).
arXiv Detail & Related papers (2022-03-03T16:58:54Z)
- Obtaining Calibrated Probabilities with Personalized Ranking Models [16.883188358641398]
We estimate the calibrated probability that a user will prefer an item.
We propose two parametric calibration methods, namely Gaussian calibration and Gamma calibration.
We design the unbiased empirical risk minimization framework that guides the calibration methods.
arXiv Detail & Related papers (2021-12-09T11:08:41Z)
- Estimating Expected Calibration Errors [1.52292571922932]
Uncertainty in probabilistic predictions is a key concern when models are used to support human decision making.
Most models are not intrinsically well calibrated, meaning that their decision scores are not consistent with posterior probabilities.
We build an empirical procedure to quantify the quality of $ECE$ estimators, and use it to decide which estimator should be used in practice for different settings.
arXiv Detail & Related papers (2021-09-08T08:00:23Z)
- Localized Calibration: Metrics and Recalibration [133.07044916594361]
We propose a fine-grained calibration metric that spans the gap between fully global and fully individualized calibration.
We then introduce a localized recalibration method, LoRe, that reduces the LCE more than existing recalibration methods.
arXiv Detail & Related papers (2021-02-22T07:22:12Z)
- Unsupervised Calibration under Covariate Shift [92.02278658443166]
We introduce the problem of calibration under domain shift and propose an importance sampling based approach to address it.
We evaluate and discuss the efficacy of our method on both real-world datasets and synthetic datasets.
arXiv Detail & Related papers (2020-06-29T21:50:07Z)
- Calibration of Neural Networks using Splines [51.42640515410253]
Measuring calibration error amounts to comparing two empirical distributions.
We introduce a binning-free calibration measure inspired by the classical Kolmogorov-Smirnov (KS) statistical test.
Our method consistently outperforms existing methods on KS error as well as other commonly used calibration measures.
arXiv Detail & Related papers (2020-06-23T07:18:05Z)
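The binning-free, KS-inspired measure in the last entry above can be approximated by sorting samples by top-label confidence and taking the largest gap between cumulative confidence and cumulative accuracy. The sketch below is an illustrative NumPy approximation rather than that paper's exact formulation; the top-label restriction, the `ks_calibration_error` helper, and the placeholder predictions are assumptions.
```python
import numpy as np

def ks_calibration_error(probs, labels):
    """Binning-free, KS-style calibration error for the top-label confidence.

    Sort samples by confidence and compare the cumulative predicted
    confidence with the cumulative empirical accuracy; the largest gap
    between the two curves is the reported error.
    """
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    order = np.argsort(conf)
    n = len(conf)
    cum_conf = np.cumsum(conf[order]) / n
    cum_acc = np.cumsum(correct[order]) / n
    return np.abs(cum_conf - cum_acc).max()

# Illustrative use with random placeholder predictions.
rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=1000)
logits = rng.normal(size=(1000, 10)) + 1.5 * np.eye(10)[labels]
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
print(f"KS calibration error: {ks_calibration_error(probs, labels):.4f}")
```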
This list is automatically generated from the titles and abstracts of the papers on this site.