Uncertainty Quantification and Deep Ensembles
- URL: http://arxiv.org/abs/2007.08792v4
- Date: Tue, 2 Nov 2021 11:12:26 GMT
- Title: Uncertainty Quantification and Deep Ensembles
- Authors: Rahul Rahaman and Alexandre H. Thiery
- Abstract summary: We show that deep-ensembles do not necessarily lead to improved calibration properties.
We show that standard ensembling methods, when used in conjunction with modern techniques such as mixup regularization, can lead to less calibrated models.
This text examines the interplay between three of the simplest and most commonly used approaches for leveraging deep learning when data is scarce: data-augmentation, ensembling, and post-processing calibration.
- Score: 79.4957965474334
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Learning methods are known to suffer from calibration issues: they
typically produce over-confident estimates. These problems are exacerbated in
the low data regime. Although the calibration of probabilistic models is well
studied, calibrating extremely over-parametrized models in the low-data regime
presents unique challenges. We show that deep-ensembles do not necessarily lead
to improved calibration properties. In fact, we show that standard ensembling
methods, when used in conjunction with modern techniques such as mixup
regularization, can lead to less calibrated models. This text examines the
interplay between three of the most simple and commonly used approaches to
leverage deep learning when data is scarce: data-augmentation, ensembling, and
post-processing calibration methods. Although standard ensembling techniques
certainly help boost accuracy, we demonstrate that the calibration of deep
ensembles relies on subtle trade-offs. We also find that calibration methods
such as temperature scaling need to be slightly tweaked when used with
deep-ensembles and, crucially, need to be executed after the averaging process.
Our simulations indicate that this simple strategy can halve the Expected
Calibration Error (ECE) on a range of benchmark classification problems
compared to standard deep-ensembles in the low data regime.
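The abstract's key recipe is that temperature scaling should be fitted after the ensemble members have been averaged ("pool then calibrate"). Below is a minimal sketch of that recipe in PyTorch; the function names, the probability pooling choice, and the optimizer settings are illustrative assumptions, not the authors' code.

```python
# Minimal sketch: temperature scaling applied *after* ensemble averaging
# ("pool then calibrate"). Names and the exact pooling choice are
# illustrative assumptions, not taken from the paper's code.
import torch
import torch.nn.functional as F

def pool_probs(member_logits):
    """Average the softmax outputs of the ensemble members."""
    probs = torch.stack([F.softmax(l, dim=-1) for l in member_logits])
    return probs.mean(dim=0)                      # (N, num_classes)

def fit_temperature(pooled_probs, labels, steps=200, lr=0.05):
    """Fit a single temperature on the pooled predictions by minimizing NLL."""
    log_p = pooled_probs.clamp_min(1e-12).log()
    log_T = torch.zeros(1, requires_grad=True)    # optimize log(T) so T > 0
    opt = torch.optim.Adam([log_T], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.cross_entropy(log_p / log_T.exp(), labels)
        loss.backward()
        opt.step()
    return log_T.exp().item()

# usage on a held-out validation split (member_logits: list of (N, C) tensors)
# T = fit_temperature(pool_probs(val_member_logits), val_labels)
# calibrated = F.softmax(pool_probs(test_member_logits).clamp_min(1e-12).log() / T, dim=-1)
```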
Related papers
- Calibration in Deep Learning: A Survey of the State-of-the-Art [7.6087138685470945]
Calibrating deep neural models plays an important role in building reliable, robust AI systems in safety-critical applications.
Recent work has shown that modern neural networks that possess high predictive capability are poorly calibrated and produce unreliable model predictions.
arXiv Detail & Related papers (2023-08-02T15:28:10Z)
- Set Learning for Accurate and Calibrated Models [17.187117466317265]
Odd-$k$-out learning minimizes the cross-entropy error for sets rather than for single examples.
OKO often yields better calibration even when training with hard labels and without any additional calibration parameter tuning.
arXiv Detail & Related papers (2023-07-05T12:39:58Z)
- Calibration of Neural Networks [77.34726150561087]
This paper presents a survey of confidence calibration problems in the context of neural networks.
We analyze the problem statement, calibration definitions, and different approaches to evaluation.
Empirical experiments cover various datasets and models, comparing calibration methods according to different criteria.
arXiv Detail & Related papers (2023-03-19T20:27:51Z)
- The Calibration Generalization Gap [15.583540869583484]
Modern neural networks provide no strong guarantees on their calibration.
It is currently unclear which factors contribute to good calibration.
We propose a systematic way to study the calibration error.
arXiv Detail & Related papers (2022-10-05T00:04:56Z)
- On the Dark Side of Calibration for Modern Neural Networks [65.83956184145477]
We show the breakdown of expected calibration error (ECE) into predicted confidence and refinement (a generic binned-ECE sketch follows this entry).
We highlight that regularisation-based calibration focuses only on naively reducing a model's confidence.
We find that many calibration approaches, such as label smoothing and mixup, lower the utility of a DNN by degrading its refinement.
arXiv Detail & Related papers (2021-06-17T11:04:14Z)
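For reference, here is a minimal sketch of the standard binned ECE that the entry above decomposes; the equal-width binning and the bin count are common defaults, not values taken from any of these papers.

```python
# Minimal sketch of the standard binned Expected Calibration Error (ECE).
# Equal-width confidence bins; the bin count (15) is a common default and
# an assumption here, not a value from the papers above.
import numpy as np

def expected_calibration_error(probs, labels, n_bins=15):
    """probs: (N, C) predicted probabilities; labels: (N,) integer labels."""
    confidences = probs.max(axis=1)
    predictions = probs.argmax(axis=1)
    accuracies = (predictions == labels).astype(float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(accuracies[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap          # bin weight times |accuracy - confidence|
    return ece
```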
- Parameterized Temperature Scaling for Boosting the Expressive Power in Post-Hoc Uncertainty Calibration [57.568461777747515]
We introduce a novel calibration method, Parametrized Temperature Scaling (PTS).
We demonstrate that the performance of accuracy-preserving state-of-the-art post-hoc calibrators is limited by their intrinsic expressive power.
We show with extensive experiments that our novel accuracy-preserving approach consistently outperforms existing algorithms across a large number of model architectures, datasets and metrics.
arXiv Detail & Related papers (2021-02-24T10:18:30Z)
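A minimal sketch of the idea behind the PTS entry above: rather than one global temperature, a small network predicts a per-sample temperature from the logits, which keeps the argmax (and hence accuracy) unchanged. The layer sizes and training loop here are illustrative assumptions, not the authors' configuration.

```python
# Sketch of per-sample (parameterized) temperature scaling.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParameterizedTemperature(nn.Module):
    def __init__(self, num_classes, hidden=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_classes, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, logits):
        temperature = F.softplus(self.net(logits)) + 1e-3   # keep T > 0
        return logits / temperature                          # rescaling preserves the argmax

# fit on a held-out validation set (val_logits: (N, C), val_labels: (N,))
# pts = ParameterizedTemperature(num_classes=val_logits.shape[1])
# opt = torch.optim.Adam(pts.parameters(), lr=1e-3)
# for _ in range(1000):
#     opt.zero_grad()
#     F.cross_entropy(pts(val_logits), val_labels).backward()
#     opt.step()
```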
- Localized Calibration: Metrics and Recalibration [133.07044916594361]
We propose a fine-grained calibration metric that spans the gap between fully global and fully individualized calibration.
We then introduce a localized recalibration method, LoRe, that improves the localized calibration error (LCE) more than existing recalibration methods.
arXiv Detail & Related papers (2021-02-22T07:22:12Z)
- When and How Mixup Improves Calibration [19.11486078732542]
In many machine learning applications, it is important for the model to provide confidence scores that accurately capture its prediction uncertainty.
In this paper, we theoretically prove that Mixup improves calibration in high-dimensional settings by investigating two natural data models.
While incorporating unlabeled data can sometimes make the model less calibrated, adding Mixup training mitigates this issue and provably improves calibration (a minimal mixup training step is sketched below).
arXiv Detail & Related papers (2021-02-11T22:24:54Z)
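For the mixup entry above, here is a minimal sketch of a standard mixup training step; the Beta parameter and the batch-permutation pairing are common choices, not values from that paper.

```python
# Minimal sketch of a mixup training step (convex combinations of inputs and targets).
import numpy as np
import torch
import torch.nn.functional as F

def mixup_step(model, x, y, alpha=0.2):
    lam = np.random.beta(alpha, alpha)               # mixing coefficient
    perm = torch.randperm(x.size(0))                 # pair each example with a shuffled partner
    x_mixed = lam * x + (1.0 - lam) * x[perm]        # convex combination of inputs
    logits = model(x_mixed)
    # same convex combination applied to the two cross-entropy targets
    return lam * F.cross_entropy(logits, y) + (1.0 - lam) * F.cross_entropy(logits, y[perm])

# usage inside a training loop:
# loss = mixup_step(model, x_batch, y_batch)
# loss.backward(); optimizer.step()
```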
- Unsupervised Calibration under Covariate Shift [92.02278658443166]
We introduce the problem of calibration under domain shift and propose an importance sampling based approach to address it.
We evaluate and discuss the efficacy of our method on both real-world datasets and synthetic datasets.
arXiv Detail & Related papers (2020-06-29T21:50:07Z)
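A rough sketch of the general idea behind the covariate-shift entry above: reweight source-domain validation examples by estimated importance weights w(x) ~ p_target(x) / p_source(x) when fitting a post-hoc calibrator. Using temperature scaling as the calibrator and a domain classifier for the weights are assumptions made for illustration, not details from the paper.

```python
# Sketch: importance-weighted temperature scaling on source-domain validation data.
import torch
import torch.nn.functional as F

def fit_weighted_temperature(logits, labels, weights, steps=200, lr=0.05):
    """Fit a temperature by minimizing an importance-weighted NLL."""
    log_T = torch.zeros(1, requires_grad=True)        # optimize log(T) so T > 0
    opt = torch.optim.Adam([log_T], lr=lr)
    w = weights / weights.sum()                        # normalize the importance weights
    for _ in range(steps):
        opt.zero_grad()
        nll = F.cross_entropy(logits / log_T.exp(), labels, reduction="none")
        (w * nll).sum().backward()
        opt.step()
    return log_T.exp().item()

# weights could come from a domain classifier d(x) separating source from target
# inputs, e.g. w(x) = d(x) / (1 - d(x)) up to a constant (an assumption here).
```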
- Mix-n-Match: Ensemble and Compositional Methods for Uncertainty Calibration in Deep Learning [21.08664370117846]
We show how Mix-n-Match calibration strategies can help achieve remarkably better data-efficiency and expressive power.
We also reveal potential issues in standard evaluation practices.
Our approaches outperform state-of-the-art solutions on both the calibration as well as the evaluation tasks.
arXiv Detail & Related papers (2020-03-16T17:00:35Z)