Combining Ensembles and Data Augmentation can Harm your Calibration
- URL: http://arxiv.org/abs/2010.09875v2
- Date: Mon, 22 Mar 2021 19:55:32 GMT
- Title: Combining Ensembles and Data Augmentation can Harm your Calibration
- Authors: Yeming Wen, Ghassen Jerfel, Rafael Muller, Michael W. Dusenberry,
Jasper Snoek, Balaji Lakshminarayanan, Dustin Tran
- Abstract summary: We show a surprising pathology: combining ensembles and data augmentation can harm model calibration.
We propose a simple correction, achieving the best of both worlds with significant accuracy and calibration gains over using only ensembles or data augmentation individually.
- Score: 33.94335246681807
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ensemble methods which average over multiple neural network predictions are a
simple approach to improve a model's calibration and robustness. Similarly,
data augmentation techniques, which encode prior information in the form of
invariant feature transformations, are effective for improving calibration and
robustness. In this paper, we show a surprising pathology: combining ensembles
and data augmentation can harm model calibration. This leads to a trade-off in
practice, whereby improved accuracy by combining the two techniques comes at
the expense of calibration. On the other hand, selecting only one of the
techniques ensures good uncertainty estimates at the expense of accuracy. We
investigate this pathology and identify a compounding under-confidence among
methods which marginalize over sets of weights and data augmentation techniques
which soften labels. Finally, we propose a simple correction, achieving the
best of both worlds with significant accuracy and calibration gains over using
only ensembles or data augmentation individually. Applying the correction
produces a new state of the art in uncertainty calibration across CIFAR-10,
CIFAR-100, and ImageNet.
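Below is a minimal NumPy sketch (not code from the paper) of the two quantities the abstract refers to: the deep-ensemble average of per-member softmax predictions and the expected calibration error (ECE) used to measure under- or over-confidence. The expected_calibration_error helper, the 15-bin setting, and the synthetic Dirichlet "predictions" are illustrative assumptions; in practice one would substitute real held-out outputs from members trained with a label-softening augmentation such as mixup.

    # Minimal sketch (assumption: per-member softmax predictions on a held-out
    # set are available). The synthetic Dirichlet samples below only make the
    # snippet runnable; they will not reproduce the under-confidence effect.
    import numpy as np

    def expected_calibration_error(probs, labels, n_bins=15):
        # Bin predictions by confidence; ECE is the bin-weighted gap between
        # average confidence and accuracy within each bin.
        confidences = probs.max(axis=1)
        predictions = probs.argmax(axis=1)
        accuracies = (predictions == labels).astype(float)
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        ece = 0.0
        for lo, hi in zip(edges[:-1], edges[1:]):
            in_bin = (confidences > lo) & (confidences <= hi)
            if in_bin.any():
                ece += in_bin.mean() * abs(accuracies[in_bin].mean()
                                           - confidences[in_bin].mean())
        return ece

    rng = np.random.default_rng(0)
    member_probs = rng.dirichlet(np.ones(10), size=(4, 1000))  # (members, examples, classes)
    labels = rng.integers(0, 10, size=1000)

    ensemble_probs = member_probs.mean(axis=0)  # deep-ensemble average
    for m, p in enumerate(member_probs):
        print(f"member {m} ECE: {expected_calibration_error(p, labels):.3f}")
    print(f"ensemble ECE: {expected_calibration_error(ensemble_probs, labels):.3f}")

One intuition for the compounding effect: the entropy of the averaged prediction is at least the average of the members' entropies, so members already made under-confident by label-softening augmentation tend to produce an even less confident ensemble.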
Related papers
- Feature Clipping for Uncertainty Calibration [24.465567005078135]
Modern deep neural networks (DNNs) often suffer from overconfidence, leading to miscalibration.
We propose a novel post-hoc calibration method called feature clipping (FC) to address this issue.
FC involves clipping feature values to a specified threshold, effectively increasing the entropy of predictions on samples with high calibration error.
arXiv Detail & Related papers (2024-10-16T06:44:35Z)
- Fill In The Gaps: Model Calibration and Generalization with Synthetic Data [2.89287673224661]
We propose a calibration method that incorporates synthetic data without compromising accuracy.
We derive the expected calibration error (ECE) bound using the Probably Approximately Correct (PAC) learning framework.
We observe an average increase of up to 34% in accuracy and a 33% decrease in ECE.
arXiv Detail & Related papers (2024-10-07T23:06:42Z)
- On Calibrating Semantic Segmentation Models: Analyses and An Algorithm [51.85289816613351]
We study the problem of semantic segmentation calibration.
Model capacity, crop size, multi-scale testing, and prediction correctness all have an impact on calibration.
We propose a simple, unifying, and effective approach, namely selective scaling.
arXiv Detail & Related papers (2022-12-22T22:05:16Z)
- On the Importance of Calibration in Semi-supervised Learning [13.859032326378188]
State-of-the-art (SOTA) semi-supervised learning (SSL) methods have been highly successful in leveraging a mix of labeled and unlabeled data.
We introduce a family of new SSL models that optimize for calibration and demonstrate their effectiveness across standard vision benchmarks.
arXiv Detail & Related papers (2022-10-10T15:41:44Z)
- Sample-dependent Adaptive Temperature Scaling for Improved Calibration [95.7477042886242]
A common post-hoc approach to compensating for miscalibrated neural network predictions is temperature scaling (a minimal sketch of this standard baseline appears after this list).
We propose to predict a different temperature value for each input, allowing us to adjust the mismatch between confidence and accuracy.
We test our method on the ResNet50 and WideResNet28-10 architectures using the CIFAR10/100 and Tiny-ImageNet datasets.
arXiv Detail & Related papers (2022-07-13T14:13:49Z)
- Improved Predictive Uncertainty using Corruption-based Calibration [64.49386167517582]
We propose a simple post hoc calibration method to estimate the confidence/uncertainty that a model prediction is correct on data.
We achieve this by synthesizing surrogate calibration sets by corrupting the calibration set with varying intensities of a known corruption.
arXiv Detail & Related papers (2021-06-07T16:27:18Z)
- Parameterized Temperature Scaling for Boosting the Expressive Power in Post-Hoc Uncertainty Calibration [57.568461777747515]
We introduce a novel calibration method, Parametrized Temperature Scaling (PTS).
We demonstrate that the performance of accuracy-preserving state-of-the-art post-hoc calibrators is limited by their intrinsic expressive power.
We show with extensive experiments that our novel accuracy-preserving approach consistently outperforms existing algorithms across a large number of model architectures, datasets and metrics.
arXiv Detail & Related papers (2021-02-24T10:18:30Z)
- Uncertainty Quantification and Deep Ensembles [79.4957965474334]
We show that deep-ensembles do not necessarily lead to improved calibration properties.
We show that standard ensembling methods, when used in conjunction with modern techniques such as mixup regularization, can lead to less calibrated models.
This text examines the interplay between three of the simplest and most commonly used approaches for leveraging deep learning when data is scarce.
arXiv Detail & Related papers (2020-07-17T07:32:24Z)
- Diverse Ensembles Improve Calibration [14.678791405731486]
We propose a simple technique to improve calibration, using a different data augmentation for each ensemble member.
We additionally use the idea of 'mixing' un-augmented and augmented inputs to improve calibration when the test and training distributions are the same.
arXiv Detail & Related papers (2020-07-08T15:48:12Z)
- Unsupervised Calibration under Covariate Shift [92.02278658443166]
We introduce the problem of calibration under domain shift and propose an importance sampling based approach to address it.
We evaluate and discuss the efficacy of our method on both real-world datasets and synthetic datasets.
arXiv Detail & Related papers (2020-06-29T21:50:07Z)
- Mix-n-Match: Ensemble and Compositional Methods for Uncertainty Calibration in Deep Learning [21.08664370117846]
We show how Mix-n-Match calibration strategies can help achieve remarkably better data-efficiency and expressive power.
We also reveal potential issues in standard evaluation practices.
Our approaches outperform state-of-the-art solutions on both the calibration and the evaluation tasks.
arXiv Detail & Related papers (2020-03-16T17:00:35Z)
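Several of the entries above (Sample-dependent Adaptive Temperature Scaling, Parameterized Temperature Scaling) build on standard single-parameter temperature scaling. The sketch below shows only that baseline, under the assumption that held-out validation logits and labels are available; the grid search over T and the synthetic logits are illustrative simplifications, not the fitting procedures those papers use.

    # Minimal sketch of standard temperature scaling: pick a single T > 0 on
    # held-out logits by minimizing negative log-likelihood, then divide
    # test-time logits by T before the softmax.
    import numpy as np

    def softmax(z):
        z = z - z.max(axis=1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    def nll(logits, labels, T):
        probs = softmax(logits / T)
        return -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()

    def fit_temperature(val_logits, val_labels, grid=np.linspace(0.5, 5.0, 91)):
        # Illustrative grid search; in practice T is usually fit with an optimizer.
        return min(grid, key=lambda T: nll(val_logits, val_labels, T))

    # Synthetic stand-ins for validation logits/labels (for runnability only).
    rng = np.random.default_rng(1)
    val_logits = rng.normal(scale=3.0, size=(500, 10))
    val_labels = rng.integers(0, 10, size=500)

    T = fit_temperature(val_logits, val_labels)
    print(f"fitted temperature: {T:.2f}")
    calibrated_probs = softmax(val_logits / T)  # temperature-scaled probabilities

Dividing logits by a single positive T never changes the argmax, which is why this family of calibrators is described above as accuracy-preserving.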
This list is automatically generated from the titles and abstracts of the papers on this site.