The Calibration Generalization Gap
- URL: http://arxiv.org/abs/2210.01964v2
- Date: Thu, 6 Oct 2022 04:21:24 GMT
- Title: The Calibration Generalization Gap
- Authors: A. Michael Carrell, Neil Mallinar, James Lucas, Preetum Nakkiran
- Abstract summary: Modern neural networks provide no strong guarantees on their calibration.
It is currently unclear which factors contribute to good calibration.
We propose a systematic way to study the calibration error.
- Score: 15.583540869583484
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Calibration is a fundamental property of a good predictive model: it requires
that the model predicts correctly in proportion to its confidence. Modern
neural networks, however, provide no strong guarantees on their calibration --
and can be either poorly calibrated or well-calibrated depending on the
setting. It is currently unclear which factors contribute to good calibration
(architecture, data augmentation, overparameterization, etc.), though various
claims exist in the literature.
We propose a systematic way to study the calibration error: by decomposing it
into (1) calibration error on the train set, and (2) the calibration
generalization gap. This mirrors the fundamental decomposition of
generalization. We then investigate each of these terms, and give empirical
evidence that (1) DNNs are typically calibrated on their train set, and
(2) the calibration generalization gap is upper-bounded by the standard
generalization gap. Taken together, this implies that models with small
generalization gap (|Test Error - Train Error|) are well-calibrated. This
perspective unifies many results in the literature, and suggests that
interventions which reduce the generalization gap (such as adding data, using heavy augmentation, or reducing model size) also improve calibration. We thus
hope our initial study lays the groundwork for a more systematic and
comprehensive understanding of the relation between calibration,
generalization, and optimization.
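To make the decomposition concrete, the sketch below estimates a standard binned expected calibration error (ECE) on the train and test sets and compares the resulting calibration generalization gap with the ordinary generalization gap. This is a minimal illustration, not the paper's code: the choice of binned ECE, the function names, and the report format are all assumptions.

```python
import numpy as np

def binned_ece(confidence, correct, n_bins=15):
    """Standard binned expected calibration error (ECE): bin predictions by
    confidence and average |accuracy - confidence|, weighted by bin mass."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidence > lo) & (confidence <= hi)
        if in_bin.any():
            ece += in_bin.mean() * abs(correct[in_bin].mean() - confidence[in_bin].mean())
    return ece

def calibration_decomposition(probs_train, y_train, probs_test, y_test):
    """Report the two terms of the abstract's decomposition:
    test ECE = train ECE + (calibration generalization gap),
    alongside the standard generalization gap |Test Error - Train Error|."""
    conf_tr, pred_tr = probs_train.max(axis=1), probs_train.argmax(axis=1)
    conf_te, pred_te = probs_test.max(axis=1), probs_test.argmax(axis=1)
    ece_train = binned_ece(conf_tr, (pred_tr == y_train).astype(float))
    ece_test = binned_ece(conf_te, (pred_te == y_test).astype(float))
    gen_gap = abs((pred_te != y_test).mean() - (pred_tr != y_train).mean())
    return {
        "train ECE": ece_train,
        "test ECE": ece_test,
        "calibration generalization gap": ece_test - ece_train,
        "generalization gap": gen_gap,
    }
```

Under the paper's empirical claims, the reported calibration generalization gap should not exceed the generalization gap, so a small |Test Error - Train Error| implies a small test ECE.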
Related papers
- Reassessing How to Compare and Improve the Calibration of Machine Learning Models [7.183341902583164]
A machine learning model is calibrated if its predicted probability for an outcome matches the observed frequency for that outcome conditional on the model prediction.
We show that there exist trivial recalibration approaches that can appear state-of-the-art unless calibration and prediction metrics are accompanied by additional generalization metrics.
arXiv Detail & Related papers (2024-06-06T13:33:45Z)
- Orthogonal Causal Calibration [55.28164682911196]
We prove generic upper bounds on the calibration error of any causal parameter estimate $\theta$ with respect to any loss $\ell$.
We use our bound to analyze the convergence of two sample splitting algorithms for causal calibration.
arXiv Detail & Related papers (2024-06-04T03:35:25Z)
- Consistent and Asymptotically Unbiased Estimation of Proper Calibration Errors [23.819464242327257]
We propose a method that allows consistent estimation of all proper calibration errors and refinement terms.
We prove the relation between refinement and f-divergences, which implies information monotonicity in neural networks.
Our experiments validate the claimed properties of the proposed estimator and suggest that the selection of a post-hoc calibration method should be determined by the particular calibration error of interest.
arXiv Detail & Related papers (2023-12-14T01:20:08Z)
- Calibration by Distribution Matching: Trainable Kernel Calibration Metrics [56.629245030893685]
We introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression.
These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization.
We provide intuitive mechanisms to tailor calibration metrics to a decision task, and enforce accurate loss estimation and no-regret decisions.
arXiv Detail & Related papers (2023-10-31T06:19:40Z)
- Class-wise and reduced calibration methods [0.0]
First, we show how a reduced calibration method transforms the original problem into a simpler one.
Second, we propose class-wise calibration methods, building on a phenomenon called neural collapse.
Applying the two methods together results in class-wise reduced calibration algorithms, which are powerful tools for reducing the prediction and per-class calibration errors.
arXiv Detail & Related papers (2022-10-07T17:13:17Z)
- On the Dark Side of Calibration for Modern Neural Networks [65.83956184145477]
We show the breakdown of expected calibration error (ECE) into predicted confidence and refinement.
We highlight that regularisation-based calibration only focuses on naively reducing a model's confidence.
We find that many calibration approaches, such as label smoothing and mixup, lower the utility of a DNN by degrading its refinement.
arXiv Detail & Related papers (2021-06-17T11:04:14Z)
- Localized Calibration: Metrics and Recalibration [133.07044916594361]
We propose a fine-grained calibration metric that spans the gap between fully global and fully individualized calibration.
We then introduce a localized recalibration method, LoRe, that improves the localized calibration error (LCE) more than existing recalibration methods.
arXiv Detail & Related papers (2021-02-22T07:22:12Z)
- Uncertainty Quantification and Deep Ensembles [79.4957965474334]
We show that deep ensembles do not necessarily lead to improved calibration properties.
We show that standard ensembling methods, when used in conjunction with modern techniques such as mixup regularization, can lead to less calibrated models.
The paper examines the interplay between three of the simplest and most commonly used approaches for leveraging deep learning when data is scarce.
arXiv Detail & Related papers (2020-07-17T07:32:24Z)
- Unsupervised Calibration under Covariate Shift [92.02278658443166]
We introduce the problem of calibration under domain shift and propose an importance sampling based approach to address it.
We evaluate and discuss the efficacy of our method on both real-world datasets and synthetic datasets.
arXiv Detail & Related papers (2020-06-29T21:50:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.