Diverse Ensembles Improve Calibration
- URL: http://arxiv.org/abs/2007.04206v1
- Date: Wed, 8 Jul 2020 15:48:12 GMT
- Title: Diverse Ensembles Improve Calibration
- Authors: Asa Cooper Stickland and Iain Murray
- Abstract summary: We propose a simple technique to improve calibration, using a different data augmentation for each ensemble member.
We additionally use the idea of 'mixing' un-augmented and augmented inputs to improve calibration when test and training distributions are the same.
- Score: 14.678791405731486
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern deep neural networks can produce badly calibrated predictions,
especially when train and test distributions are mismatched. Training an
ensemble of models and averaging their predictions can help alleviate these
issues. We propose a simple technique to improve calibration, using a different
data augmentation for each ensemble member. We additionally use the idea of
'mixing' un-augmented and augmented inputs to improve calibration when test and
training distributions are the same. These simple techniques improve
calibration and accuracy over strong baselines on the CIFAR10 and CIFAR100
benchmarks, and out-of-domain data from their corrupted versions.
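For concreteness, here is a minimal PyTorch-style sketch of the two ideas: each ensemble member trains under its own augmentation, clean and augmented batches are interleaved during training, and the members' predictive distributions are averaged at test time. The specific augmentations, mixing probability, and training loop are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torchvision.transforms as T

# One augmentation per ensemble member (illustrative choices).
member_augmentations = [
    T.RandomHorizontalFlip(),       # member 0
    T.RandomCrop(32, padding=4),    # member 1
    T.ColorJitter(0.4, 0.4, 0.4),   # member 2
]

def train_member(model, loader, augment, aug_prob=0.5, epochs=10):
    """Train one member with its own augmentation; with probability
    1 - aug_prob a batch is left un-augmented ('mixing' clean and
    augmented inputs)."""
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            if torch.rand(1).item() < aug_prob:
                x = augment(x)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

@torch.no_grad()
def ensemble_predict(models, x):
    # Average the members' predictive distributions (softmax outputs).
    probs = torch.stack([m(x).softmax(dim=-1) for m in models])
    return probs.mean(dim=0)
```

Averaging probabilities rather than logits is what gives the ensemble its calibration benefit: disagreement between members spreads mass across classes and lowers confidence on uncertain inputs.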
Related papers
- Feature Clipping for Uncertainty Calibration [24.465567005078135]
Modern deep neural networks (DNNs) often suffer from overconfidence, leading to miscalibration.
We propose a novel post-hoc calibration method called feature clipping (FC) to address this issue.
FC involves clipping feature values at a specified threshold, effectively increasing predictive entropy on samples with high calibration error.
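A rough sketch of the clipping idea as summarized above, assuming the network is split into a feature backbone and a final linear classifier; the split, the fixed threshold, and the tuning note are assumptions for illustration, not the paper's exact recipe.

```python
import torch

@torch.no_grad()
def clipped_logits(backbone, classifier, x, clip=1.0):
    """Post-hoc feature clipping (sketch): cap penultimate-layer
    features at a threshold before the final linear layer, which
    softens the logits and raises predictive entropy."""
    feats = backbone(x)            # penultimate features
    feats = feats.clamp(max=clip)  # clip at the chosen threshold
    return classifier(feats)

# `clip` would be tuned on held-out data, e.g. by minimizing
# expected calibration error on a validation split.
```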
arXiv Detail & Related papers (2024-10-16T06:44:35Z) - Calibration by Distribution Matching: Trainable Kernel Calibration Metrics [56.629245030893685]
We introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression.
These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization.
We provide intuitive mechanisms to tailor calibration metrics to a decision task, and enforce accurate loss estimation and no-regret decisions.
arXiv Detail & Related papers (2023-10-31T06:19:40Z) - Calibration of Neural Networks [77.34726150561087]
This paper presents a survey of confidence calibration problems in the context of neural networks.
We analyze problem statement, calibration definitions, and different approaches to evaluation.
Empirical experiments cover various datasets and models, comparing calibration methods according to different criteria.
arXiv Detail & Related papers (2023-03-19T20:27:51Z) - Can Calibration Improve Sample Prioritization? [15.17599622490369]
We study the effect of popular calibration techniques in selecting better subsets of samples during training.
We observe that calibration can improve the quality of subsets, reduce the number of examples per epoch, and thereby speed up the overall training process.
arXiv Detail & Related papers (2022-10-12T21:11:08Z) - Sample-dependent Adaptive Temperature Scaling for Improved Calibration [95.7477042886242]
A common post-hoc approach to compensating for miscalibrated neural networks is temperature scaling.
We propose to predict a different temperature value for each input, allowing us to adjust the mismatch between confidence and accuracy.
We test our method on the ResNet50 and WideResNet28-10 architectures using the CIFAR10/100 and Tiny-ImageNet datasets.
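A minimal sketch of the per-input idea: a small auxiliary head maps penultimate features to a positive temperature that divides the logits. The head architecture and training details below are assumptions, not the paper's exact parameterization.

```python
import torch
import torch.nn as nn

class AdaptiveTemperature(nn.Module):
    """Sketch: predict a positive, sample-dependent temperature from
    the penultimate features and rescale the logits with it."""
    def __init__(self, feat_dim, hidden=64):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus(),  # temperature > 0
        )

    def forward(self, logits, feats):
        t = self.head(feats) + 1e-3  # keep the divisor away from zero
        return logits / t

# As in standard temperature scaling, the head would be fit post hoc
# on a validation set with cross-entropy, base network frozen.
```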
arXiv Detail & Related papers (2022-07-13T14:13:49Z) - On the Calibration of Pre-trained Language Models using Mixup Guided by Area Under the Margin and Saliency [47.90235939359225]
We propose a novel mixup strategy for pre-trained language models that improves model calibration further.
Our method achieves the lowest expected calibration error compared to strong baselines on both in-domain and out-of-domain test samples.
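Expected calibration error (ECE), the metric cited here, bins predictions by confidence and takes the weighted average of the per-bin gap between accuracy and confidence; a standard equal-width-bin implementation:

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=15):
    """ECE: partition samples into equal-width confidence bins and
    average |accuracy - confidence| per bin, weighted by bin size."""
    conf = probs.max(axis=1)                     # top-class confidence
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece
```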
arXiv Detail & Related papers (2022-03-14T23:45:08Z) - Localized Calibration: Metrics and Recalibration [133.07044916594361]
We propose a fine-grained metric, the localized calibration error (LCE), that spans the gap between fully global and fully individualized calibration.
We then introduce a localized recalibration method, LoRe, that reduces the LCE more than existing recalibration methods.
arXiv Detail & Related papers (2021-02-22T07:22:12Z) - When and How Mixup Improves Calibration [19.11486078732542]
In many machine learning applications, it is important for the model to provide confidence scores that accurately capture its prediction uncertainty.
In this paper, we theoretically prove that Mixup improves calibration in high-dimensional settings by investigating two natural data models.
While incorporating unlabeled data can sometimes make the model less calibrated, adding Mixup training mitigates this issue and provably improves calibration.
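For reference, the standard Mixup step analyzed in this line of work: train on convex combinations of input pairs, with the coefficient drawn from a Beta distribution and the loss mixed in the same proportion.

```python
import numpy as np
import torch

def mixup_step(model, x, y, loss_fn, alpha=0.2):
    """One standard Mixup training step on a batch (x, y)."""
    lam = np.random.beta(alpha, alpha)   # mixing coefficient
    idx = torch.randperm(x.size(0))      # random pairing within batch
    x_mix = lam * x + (1 - lam) * x[idx]
    out = model(x_mix)
    return lam * loss_fn(out, y) + (1 - lam) * loss_fn(out, y[idx])
```

Because mixed targets are soft, Mixup implicitly penalizes overconfident predictions, which is one intuition for its calibration effect.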
arXiv Detail & Related papers (2021-02-11T22:24:54Z) - Combining Ensembles and Data Augmentation can Harm your Calibration [33.94335246681807]
We show a surprising pathology: combining ensembles and data augmentation can harm model calibration.
We propose a simple correction, achieving the best of both worlds with significant accuracy and calibration gains over using only ensembles or data augmentation individually.
arXiv Detail & Related papers (2020-10-19T21:25:22Z) - Uncertainty Quantification and Deep Ensembles [79.4957965474334]
We show that deep ensembles do not necessarily lead to improved calibration properties.
We show that standard ensembling methods, when used in conjunction with modern techniques such as mixup regularization, can lead to less calibrated models.
The paper examines the interplay between three of the simplest and most commonly used approaches to leveraging deep learning when data is scarce.
arXiv Detail & Related papers (2020-07-17T07:32:24Z) - Mix-n-Match: Ensemble and Compositional Methods for Uncertainty Calibration in Deep Learning [21.08664370117846]
We show how Mix-n-Match calibration strategies can help achieve remarkably better data-efficiency and expressive power.
We also reveal potential issues in standard evaluation practices.
Our approaches outperform state-of-the-art solutions on both the calibration and evaluation tasks.
arXiv Detail & Related papers (2020-03-16T17:00:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.