Parameterized Temperature Scaling for Boosting the Expressive Power in
Post-Hoc Uncertainty Calibration
- URL: http://arxiv.org/abs/2102.12182v1
- Date: Wed, 24 Feb 2021 10:18:30 GMT
- Title: Parameterized Temperature Scaling for Boosting the Expressive Power in
Post-Hoc Uncertainty Calibration
- Authors: Christian Tomani, Daniel Cremers, Florian Buettner
- Abstract summary: We introduce a novel calibration method, Parametrized Temperature Scaling (PTS).
We demonstrate that the performance of accuracy-preserving state-of-the-art post-hoc calibrators is limited by their intrinsic expressive power.
We show with extensive experiments that our novel accuracy-preserving approach consistently outperforms existing algorithms across a large number of model architectures, datasets and metrics.
- Score: 57.568461777747515
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the problem of uncertainty calibration and introduce a novel
calibration method, Parametrized Temperature Scaling (PTS). Standard deep
neural networks typically yield uncalibrated predictions, which can be
transformed into calibrated confidence scores using post-hoc calibration
methods. In this contribution, we demonstrate that the performance of
accuracy-preserving state-of-the-art post-hoc calibrators is limited by their
intrinsic expressive power. We generalize temperature scaling by computing
prediction-specific temperatures, parameterized by a neural network. We show
with extensive experiments that our novel accuracy-preserving approach
consistently outperforms existing algorithms across a large number of model
architectures, datasets and metrics.
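
The idea in the abstract, replacing the single shared temperature with one computed per prediction by a small neural network, can be illustrated with a minimal sketch. This is not the authors' implementation: the class `TinyTemperatureNet`, its single hidden layer, the use of sorted logits as input, the softplus output, and the random (untrained) weights are all assumptions made for illustration. In practice the network's parameters would be fit on held-out validation data.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def temperature_scale(logits, temperature):
    """Standard temperature scaling: divide every logit by one scalar T > 0."""
    return softmax([z / temperature for z in logits])


class TinyTemperatureNet:
    """Hypothetical one-hidden-layer network mapping logits to a temperature.

    Weights are random here; a real calibrator would train them.
    """

    def __init__(self, n_inputs, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def temperature(self, logits):
        # Assumption: feed the logits sorted in descending order, so the
        # temperature does not depend on which class index is largest.
        x = sorted(logits, reverse=True)
        hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        raw = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        # Stable softplus plus a small offset keeps T strictly positive.
        return max(raw, 0.0) + math.log1p(math.exp(-abs(raw))) + 1e-6


def parameterized_temperature_scale(logits, net):
    """Scale the logits by a temperature computed for this prediction."""
    return temperature_scale(logits, net.temperature(logits))


if __name__ == "__main__":
    net = TinyTemperatureNet(n_inputs=3)
    for logits in ([2.0, 1.0, 0.1], [8.0, 0.5, -1.0]):
        probs = parameterized_temperature_scale(logits, net)
        print(net.temperature(logits), probs)
```

Because every logit of a given prediction is divided by the same positive temperature, the ranking of the classes is unchanged, which is why this kind of calibrator is accuracy-preserving.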
Related papers
- Feature Clipping for Uncertainty Calibration [24.465567005078135]
Modern deep neural networks (DNNs) often suffer from overconfidence, leading to miscalibration.
We propose a novel post-hoc calibration method called feature clipping (FC) to address this issue.
FC involves clipping feature values to a specified threshold, effectively increasing entropy in high calibration error samples.
arXiv Detail & Related papers (2024-10-16T06:44:35Z)
- Calibrating Language Models with Adaptive Temperature Scaling [58.056023173579625]
We introduce Adaptive Temperature Scaling (ATS), a post-hoc calibration method that predicts a temperature scaling parameter for each token prediction.
ATS improves calibration by over 10-50% across three downstream natural language evaluation benchmarks compared to prior calibration methods.
arXiv Detail & Related papers (2024-09-29T22:54:31Z)
- Calibration of Neural Networks [77.34726150561087]
This paper presents a survey of confidence calibration problems in the context of neural networks.
We analyze the problem statement, calibration definitions, and different approaches to evaluation.
Empirical experiments cover various datasets and models, comparing calibration methods according to different criteria.
arXiv Detail & Related papers (2023-03-19T20:27:51Z)
- Sharp Calibrated Gaussian Processes [58.94710279601622]
State-of-the-art approaches for designing calibrated models rely on inflating the Gaussian process posterior variance.
We present a calibration approach that generates predictive quantiles using a computation inspired by the vanilla Gaussian process posterior variance.
Our approach is shown to yield a calibrated model under reasonable assumptions.
arXiv Detail & Related papers (2023-02-23T12:17:36Z)
- Adaptive Temperature Scaling for Robust Calibration of Deep Neural Networks [0.7219077740523682]
We focus on the task of confidence scaling, specifically on post-hoc methods that generalize Temperature Scaling.
We show that, when there is plenty of data, complex models like neural networks yield better performance, but they are prone to fail when the amount of data is limited.
We propose Entropy-based Temperature Scaling, a simple method that scales the confidence of a prediction according to its entropy.
arXiv Detail & Related papers (2022-07-31T16:20:06Z)
- Sample-dependent Adaptive Temperature Scaling for Improved Calibration [95.7477042886242]
One post-hoc approach to compensate for neural networks being wrong is to perform temperature scaling.
We propose to predict a different temperature value for each input, allowing us to adjust the mismatch between confidence and accuracy.
We test our method on the ResNet50 and WideResNet28-10 architectures using the CIFAR10/100 and Tiny-ImageNet datasets.
arXiv Detail & Related papers (2022-07-13T14:13:49Z)
- Revisiting Calibration for Question Answering [16.54743762235555]
We argue that the traditional evaluation of calibration does not reflect the usefulness of the model confidence.
We propose a new calibration metric, MacroCE, that better captures whether the model assigns low confidence to wrong predictions and high confidence to correct predictions.
arXiv Detail & Related papers (2022-05-25T05:49:56Z)
- Uncertainty Quantification and Deep Ensembles [79.4957965474334]
We show that deep-ensembles do not necessarily lead to improved calibration properties.
We show that standard ensembling methods, when used in conjunction with modern techniques such as mixup regularization, can lead to less calibrated models.
This text examines the interplay between three of the simplest and most commonly used approaches to leverage deep learning when data is scarce.
arXiv Detail & Related papers (2020-07-17T07:32:24Z)