On the Limitations of Temperature Scaling for Distributions with
Overlaps
- URL: http://arxiv.org/abs/2306.00740v3
- Date: Tue, 13 Feb 2024 22:59:13 GMT
- Title: On the Limitations of Temperature Scaling for Distributions with
Overlaps
- Authors: Muthu Chidambaram and Rong Ge
- Abstract summary: We show that, for empirical risk minimizers over a general class of distributions whose class supports overlap, the performance of temperature scaling degrades with the amount of overlap between classes.
We prove that optimizing a modified form of the empirical risk induced by the Mixup data augmentation technique can in fact lead to reasonably good calibration performance.
- Score: 8.486166869140929
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the impressive generalization capabilities of deep neural networks,
they have been repeatedly shown to be overconfident when they are wrong. Fixing
this issue is known as model calibration, and has consequently received much
attention in the form of modified training schemes and post-training
calibration procedures such as temperature scaling. While temperature scaling
is frequently used because of its simplicity, it is often outperformed by
modified training schemes. In this work, we identify a specific bottleneck for
the performance of temperature scaling. We show that, for empirical risk
minimizers over a general class of distributions in which the supports of the
classes overlap, the performance of temperature scaling degrades with the
amount of overlap between classes and asymptotically becomes no better than
random when the number of classes is large. On the other hand, we prove that
optimizing a modified form of the empirical risk induced by the Mixup data
augmentation technique can in fact lead to reasonably good calibration
performance, showing that training-time calibration may be necessary in some
situations. We also verify that our theoretical results reflect practice by
showing that Mixup significantly outperforms empirical risk minimization (with
respect to multiple calibration metrics) on image classification benchmarks
with class overlaps introduced in the form of label noise.
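The two procedures contrasted in the abstract can be summarized in a short, framework-agnostic sketch. The code below is illustrative only (the function names, the bounded search range, and the binned ECE estimator are our own choices, not taken from the paper's code): it fits a single scalar temperature on held-out logits by minimizing negative log-likelihood, measures calibration with a standard binned ECE, and builds one Mixup batch by mixing inputs and one-hot labels with a Beta-distributed coefficient.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def softmax(logits, T=1.0):
    """Temperature-scaled softmax along the last axis."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)          # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def nll(T, logits, labels):
    """Negative log-likelihood of held-out labels at temperature T."""
    p = softmax(logits, T)[np.arange(len(labels)), labels]
    return -np.mean(np.log(p + 1e-12))

def fit_temperature(val_logits, val_labels):
    """Vanilla temperature scaling: fit one scalar T on a held-out split."""
    res = minimize_scalar(nll, bounds=(0.05, 20.0),
                          args=(val_logits, val_labels), method="bounded")
    return res.x

def expected_calibration_error(probs, labels, n_bins=15):
    """Standard binned ECE: bin-weighted |accuracy - mean confidence|."""
    conf, pred = probs.max(axis=1), probs.argmax(axis=1)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            ece += in_bin.mean() * abs((pred[in_bin] == labels[in_bin]).mean()
                                       - conf[in_bin].mean())
    return ece

def mixup_batch(x, y_onehot, alpha=0.2, rng=np.random.default_rng(0)):
    """One Mixup batch: convex combination of the batch with a shuffled copy."""
    lam = rng.beta(alpha, alpha)
    idx = rng.permutation(len(x))
    return lam * x + (1 - lam) * x[idx], lam * y_onehot + (1 - lam) * y_onehot[idx]
```

Fitting T on validation logits and then reporting the ECE of softmax(test_logits, T) mirrors the post-hoc pipeline whose limits the paper studies; training with cross-entropy against the mixed labels from mixup_batch corresponds to the standard Mixup-induced empirical risk (the paper analyzes a modified form of it).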
Related papers
- Calibration by Distribution Matching: Trainable Kernel Calibration
Metrics [56.629245030893685]
We introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression.
These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization.
We provide intuitive mechanisms to tailor calibration metrics to a decision task, and to enforce accurate loss estimation and no-regret decisions.
arXiv Detail & Related papers (2023-10-31T06:19:40Z)
- Dual-Branch Temperature Scaling Calibration for Long-Tailed Recognition [19.12557383199547]
This paper proposes a dual-branch temperature scaling calibration model (Dual-TS).
It simultaneously accounts for the diversity of temperature parameters across categories and the poor generalizability of temperature parameters estimated from the rare samples of minority classes.
Our model achieves state-of-the-art results on both the traditional ECE and Esbin-ECE metrics.
arXiv Detail & Related papers (2023-08-16T13:40:58Z)
- Set Learning for Accurate and Calibrated Models [17.187117466317265]
Odd-$k$-out learning minimizes the cross-entropy error for sets rather than for single examples.
OKO often yields better calibration even when training with hard labels and without any additional tuning of calibration parameters.
arXiv Detail & Related papers (2023-07-05T12:39:58Z)
- Multi-Head Multi-Loss Model Calibration [13.841172927454204]
We introduce a form of simplified ensembling that bypasses the costly training and inference of deep ensembles.
Specifically, each head is trained to minimize a weighted cross-entropy loss, with the weights differing across heads.
We show that the resulting averaged predictions can achieve excellent calibration without sacrificing accuracy on two challenging datasets.
arXiv Detail & Related papers (2023-03-02T09:32:32Z)
- Sharp Calibrated Gaussian Processes [58.94710279601622]
State-of-the-art approaches for designing calibrated models rely on inflating the Gaussian process posterior variance.
We present a calibration approach that generates predictive quantiles using a computation inspired by the vanilla Gaussian process posterior variance.
Our approach is shown to yield a calibrated model under reasonable assumptions.
arXiv Detail & Related papers (2023-02-23T12:17:36Z)
- Adaptive Temperature Scaling for Robust Calibration of Deep Neural Networks [0.7219077740523682]
We focus on the task of confidence scaling, specifically on post-hoc methods that generalize Temperature Scaling.
We show that when data are plentiful, complex models such as neural networks yield better performance, but they are prone to fail when the amount of data is limited.
We propose Entropy-based Temperature Scaling, a simple method that scales the confidence of a prediction according to its entropy.
arXiv Detail & Related papers (2022-07-31T16:20:06Z)
- Sample-dependent Adaptive Temperature Scaling for Improved Calibration [95.7477042886242]
A common post-hoc approach to compensating for overconfident neural networks is temperature scaling.
We propose to predict a different temperature value for each input, allowing us to adjust the mismatch between confidence and accuracy (a minimal sketch of this idea appears after this list).
We test our method on the ResNet50 and WideResNet28-10 architectures using the CIFAR10/100 and Tiny-ImageNet datasets.
arXiv Detail & Related papers (2022-07-13T14:13:49Z)
- Parameterized Temperature Scaling for Boosting the Expressive Power in Post-Hoc Uncertainty Calibration [57.568461777747515]
We introduce a novel calibration method, Parametrized Temperature Scaling (PTS).
We demonstrate that the performance of accuracy-preserving state-of-the-art post-hoc calibrators is limited by their intrinsic expressive power.
We show with extensive experiments that our novel accuracy-preserving approach consistently outperforms existing algorithms across a large number of model architectures, datasets and metrics.
arXiv Detail & Related papers (2021-02-24T10:18:30Z)
- Uncertainty Quantification and Deep Ensembles [79.4957965474334]
We show that deep-ensembles do not necessarily lead to improved calibration properties.
We show that standard ensembling methods, when used in conjunction with modern techniques such as mixup regularization, can lead to less calibrated models.
The paper examines the interplay between three of the simplest and most commonly used approaches to leveraging deep learning when data are scarce.
arXiv Detail & Related papers (2020-07-17T07:32:24Z)
- Unsupervised Calibration under Covariate Shift [92.02278658443166]
We introduce the problem of calibration under domain shift and propose an importance sampling based approach to address it.
We evaluate and discuss the efficacy of our method on both real-world datasets and synthetic datasets.
arXiv Detail & Related papers (2020-06-29T21:50:07Z)
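Several of the related papers above replace the single global temperature with a more expressive, input-dependent one. As a rough illustration of that general idea (referenced from the sample-dependent temperature scaling entry above), the following is a minimal PyTorch sketch in which a tiny head predicts a positive temperature from the frozen classifier's logits and is fit post hoc by minimizing NLL; the architecture and inputs are illustrative choices made here, not the method of any specific paper listed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PerInputTemperature(nn.Module):
    """Post-hoc head predicting a positive, input-dependent temperature."""

    def __init__(self, num_classes, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_classes, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus(),   # keep T > 0
        )

    def forward(self, logits):
        t = self.net(logits) + 1e-3                # avoid division by ~0
        return logits / t                           # (N, 1) broadcasts over (N, C)

def fit_head(head, val_logits, val_labels, steps=500, lr=1e-2):
    """Fit the temperature head on held-out logits by minimizing NLL;
    the base classifier stays frozen throughout."""
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        F.cross_entropy(head(val_logits), val_labels).backward()
        opt.step()
    return head
```

Calling fit_head(PerInputTemperature(num_classes), val_logits, val_labels) and then evaluating the rescaled logits on a test split mirrors the usual post-hoc calibration protocol, with the scalar T of vanilla temperature scaling replaced by a learned function of each input's logits.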