Bag of Tricks for In-Distribution Calibration of Pretrained Transformers
- URL: http://arxiv.org/abs/2302.06690v1
- Date: Mon, 13 Feb 2023 21:11:52 GMT
- Title: Bag of Tricks for In-Distribution Calibration of Pretrained Transformers
- Authors: Jaeyoung Kim, Dongbin Na, Sungchul Choi, Sungbin Lim
- Abstract summary: We present an empirical study on confidence calibration for pre-trained language models (PLMs)
We find that the ensemble model overfitted to the training set shows sub-par calibration performance.
We propose the Calibrated PLM (CALL), a combination of calibration techniques.
- Score: 8.876196316390493
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While pre-trained language models (PLMs) have become a de facto
standard for improving the accuracy of text classification tasks, recent
studies find that PLMs often predict over-confidently. Although various calibration methods have
been proposed, such as ensemble learning and data augmentation, most of the
methods have been verified in computer vision benchmarks rather than in
PLM-based text classification tasks. In this paper, we present an empirical
study on confidence calibration for PLMs, addressing three categories,
including confidence penalty losses, data augmentations, and ensemble methods.
We find that the ensemble model overfitted to the training set shows sub-par
calibration performance and also observe that PLMs trained with confidence
penalty loss have a trade-off between calibration and accuracy. Building on
these observations, we propose the Calibrated PLM (CALL), a combination of
calibration techniques. The CALL complements the drawbacks that may occur when
utilizing a calibration method individually and boosts both classification and
calibration accuracy. Design choices in CALL's training procedures are
extensively studied, and we provide a detailed analysis of how calibration
techniques affect the calibration performance of PLMs.
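The over-confidence the abstract describes is commonly quantified with the expected calibration error (ECE), which bins predictions by confidence and measures the gap between average confidence and accuracy in each bin. The following is a minimal illustrative sketch, not code from the paper; the function and variable names are our own:

```python
import math  # imported for completeness; binning here needs only arithmetic

def expected_calibration_error(confidences, predictions, labels, n_bins=10):
    """ECE: weighted average of |accuracy - confidence| over confidence bins."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, pred, label in zip(confidences, predictions, labels):
        # Map confidence in [0, 1] to a bin index; clamp 1.0 into the top bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, float(pred == label)))
    ece = 0.0
    for members in bins:
        if members:
            avg_conf = sum(c for c, _ in members) / len(members)
            avg_acc = sum(a for _, a in members) / len(members)
            ece += (len(members) / n) * abs(avg_acc - avg_conf)
    return ece

# A model that is always correct with confidence 1.0 has zero ECE;
# an over-confident model (high confidence, lower accuracy) has positive ECE.
print(expected_calibration_error([0.9, 0.8, 0.95], [1, 0, 2], [1, 0, 2]))
```

A perfectly calibrated model is one whose 80%-confidence predictions are correct 80% of the time, and so on for every bin; the techniques surveyed in the paper (confidence penalties, augmentation, ensembling) all aim to shrink this gap.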
Related papers
- Optimizing Estimators of Squared Calibration Errors in Classification [2.3020018305241337]
We propose a mean-squared error-based risk that enables the comparison and optimization of estimators of squared calibration errors.
Our approach advocates for a training-validation-testing pipeline when estimating a calibration error.
arXiv Detail & Related papers (2024-10-09T15:58:06Z)
- C-TPT: Calibrated Test-Time Prompt Tuning for Vision-Language Models via Text Feature Dispersion [54.81141583427542]
In deep learning, test-time adaptation has gained attention as a method for model fine-tuning without the need for labeled data.
This paper explores calibration during test-time prompt tuning by leveraging the inherent properties of CLIP.
We present a novel method, Calibrated Test-time Prompt Tuning (C-TPT), for optimizing prompts during test-time with enhanced calibration.
arXiv Detail & Related papers (2024-03-21T04:08:29Z)
- On the Calibration of Large Language Models and Alignment [63.605099174744865]
Confidence calibration serves as a crucial tool for gauging the reliability of deep models.
We conduct a systematic examination of the calibration of aligned language models throughout the entire construction process.
Our work sheds light on whether popular LLMs are well-calibrated and how the training process influences model calibration.
arXiv Detail & Related papers (2023-11-22T08:57:55Z)
- Calibration by Distribution Matching: Trainable Kernel Calibration Metrics [56.629245030893685]
We introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression.
These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization.
We provide intuitive mechanisms to tailor calibration metrics to a decision task, and enforce accurate loss estimation and no regret decisions.
arXiv Detail & Related papers (2023-10-31T06:19:40Z)
- Scaling of Class-wise Training Losses for Post-hoc Calibration [6.0632746602205865]
We propose a new calibration method to synchronize the class-wise training losses.
We design a new training loss to alleviate the variance of class-wise training losses by using multiple class-wise scaling factors.
We validate the proposed framework by employing it in the various post-hoc calibration methods.
arXiv Detail & Related papers (2023-06-19T14:59:37Z)
- A Close Look into the Calibration of Pre-trained Language Models [56.998539510508515]
Pre-trained language models (PLMs) may fail in giving reliable estimates of their predictive uncertainty.
We study the dynamic change in PLMs' calibration performance in training.
We extend two recently proposed learnable calibration methods that collect additional data to train models to produce reasonable confidence estimates.
arXiv Detail & Related papers (2022-10-31T21:31:07Z)
- Localized Calibration: Metrics and Recalibration [133.07044916594361]
We propose a fine-grained calibration metric that spans the gap between fully global and fully individualized calibration.
We then introduce a localized recalibration method, LoRe, that reduces the localized calibration error (LCE) more effectively than existing recalibration methods.
arXiv Detail & Related papers (2021-02-22T07:22:12Z)
- On Calibration of Scene-Text Recognition Models [16.181357648680365]
We analyze several recent STR methods and show that they are consistently overconfident.
We demonstrate that for attention based decoders, calibration of individual character predictions increases word-level calibration error.
arXiv Detail & Related papers (2020-12-23T13:25:25Z)
- Unsupervised Calibration under Covariate Shift [92.02278658443166]
We introduce the problem of calibration under domain shift and propose an importance sampling based approach to address it.
We evaluate and discuss the efficacy of our method on both real-world datasets and synthetic datasets.
arXiv Detail & Related papers (2020-06-29T21:50:07Z)
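Several of the papers listed above concern post-hoc recalibration. The standard baseline in this area (not a method proposed by any specific paper here) is temperature scaling, which divides a model's logits by a scalar T fitted on held-out data; T > 1 softens over-confident predictions without changing the predicted class. An illustrative sketch, with all names our own:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; T > 1 yields less confident probabilities."""
    z = [l / temperature for l in logits]
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def fit_temperature(val_logits, val_labels, grid=None):
    """Pick the temperature minimizing validation negative log-likelihood
    via a simple grid search (closed-form fits also exist)."""
    if grid is None:
        grid = [0.5 + 0.05 * i for i in range(91)]  # candidates in [0.5, 5.0]
    best_t, best_nll = 1.0, float("inf")
    for t in grid:
        nll = 0.0
        for logits, label in zip(val_logits, val_labels):
            prob = softmax(logits, t)[label]
            nll -= math.log(prob + 1e-12)
        nll /= len(val_labels)
        if nll < best_nll:
            best_t, best_nll = t, nll
    return best_t

# Raising the temperature lowers the maximum confidence.
print(max(softmax([4.0, 0.0, 0.0])), max(softmax([4.0, 0.0, 0.0], temperature=2.0)))
```

Because dividing logits by a positive scalar preserves their argmax, temperature scaling leaves accuracy unchanged and only adjusts confidence, which is why it is a common reference point when evaluating the trainable and localized recalibration methods above.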
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.