Bag of Tricks for In-Distribution Calibration of Pretrained Transformers
- URL: http://arxiv.org/abs/2302.06690v1
- Date: Mon, 13 Feb 2023 21:11:52 GMT
- Title: Bag of Tricks for In-Distribution Calibration of Pretrained Transformers
- Authors: Jaeyoung Kim, Dongbin Na, Sungchul Choi, Sungbin Lim
- Abstract summary: We present an empirical study on confidence calibration for pre-trained language models (PLMs)
We find that the ensemble model overfitted to the training set shows sub-par calibration performance.
We propose the Calibrated PLM (CALL), a combination of calibration techniques.
- Score: 8.876196316390493
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While pre-trained language models (PLMs) have become a de facto
standard for improving the accuracy of text classification tasks, recent
studies find that PLMs often predict over-confidently. Although various calibration methods have
been proposed, such as ensemble learning and data augmentation, most of the
methods have been verified in computer vision benchmarks rather than in
PLM-based text classification tasks. In this paper, we present an empirical
study on confidence calibration for PLMs, addressing three categories,
including confidence penalty losses, data augmentations, and ensemble methods.
We find that the ensemble model overfitted to the training set shows sub-par
calibration performance and also observe that PLMs trained with confidence
penalty loss have a trade-off between calibration and accuracy. Building on
these observations, we propose the Calibrated PLM (CALL), a combination of
calibration techniques. The CALL complements the drawbacks that may occur when
utilizing a calibration method individually and boosts both classification and
calibration accuracy. Design choices in CALL's training procedures are
extensively studied, and we provide a detailed analysis of how calibration
techniques affect the calibration performance of PLMs.
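The over-confidence the abstract describes is commonly quantified with the expected calibration error (ECE), which bins predictions by confidence and measures the gap between average confidence and accuracy in each bin. The following is a minimal illustrative sketch, not code from the paper; the function and variable names are our own:

```python
import math  # imported for completeness; binning here needs only arithmetic

def expected_calibration_error(confidences, predictions, labels, n_bins=10):
    """ECE: weighted average of |accuracy - confidence| over confidence bins."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, pred, label in zip(confidences, predictions, labels):
        # Map confidence in [0, 1] to a bin index; clamp 1.0 into the top bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, float(pred == label)))
    ece = 0.0
    for members in bins:
        if members:
            avg_conf = sum(c for c, _ in members) / len(members)
            avg_acc = sum(a for _, a in members) / len(members)
            ece += (len(members) / n) * abs(avg_acc - avg_conf)
    return ece

# A model that is always correct with confidence 1.0 has zero ECE;
# an over-confident model (high confidence, lower accuracy) has positive ECE.
print(expected_calibration_error([0.9, 0.8, 0.95], [1, 0, 2], [1, 0, 2]))
```

A perfectly calibrated model is one whose 80%-confidence predictions are correct 80% of the time, and so on for every bin; the techniques surveyed in the paper (confidence penalties, augmentation, ensembling) all aim to shrink this gap.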
Related papers
- Optimizing Estimators of Squared Calibration Errors in Classification [2.3020018305241337]
We propose a mean-squared error-based risk that enables the comparison and optimization of estimators of squared calibration errors.
Our approach advocates for a training-validation-testing pipeline when estimating a calibration error.
arXiv Detail & Related papers (2024-10-09T15:58:06Z)
- C-TPT: Calibrated Test-Time Prompt Tuning for Vision-Language Models via Text Feature Dispersion [54.81141583427542]
In deep learning, test-time adaptation has gained attention as a method for model fine-tuning without the need for labeled data.
This paper explores calibration during test-time prompt tuning by leveraging the inherent properties of CLIP.
We present a novel method, Calibrated Test-time Prompt Tuning (C-TPT), for optimizing prompts during test-time with enhanced calibration.
arXiv Detail & Related papers (2024-03-21T04:08:29Z)
- On the Calibration of Large Language Models and Alignment [63.605099174744865]
Confidence calibration serves as a crucial tool for gauging the reliability of deep models.
We conduct a systematic examination of the calibration of aligned language models throughout the entire construction process.
Our work sheds light on whether popular LLMs are well-calibrated and how the training process influences model calibration.
arXiv Detail & Related papers (2023-11-22T08:57:55Z)
- Calibration by Distribution Matching: Trainable Kernel Calibration Metrics [56.629245030893685]
We introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression.
These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization.
We provide intuitive mechanisms to tailor calibration metrics to a decision task, and enforce accurate loss estimation and no regret decisions.
arXiv Detail & Related papers (2023-10-31T06:19:40Z)
- Scaling of Class-wise Training Losses for Post-hoc Calibration [6.0632746602205865]
We propose a new calibration method to synchronize the class-wise training losses.
We design a new training loss to alleviate the variance of class-wise training losses by using multiple class-wise scaling factors.
We validate the proposed framework by employing it in the various post-hoc calibration methods.
arXiv Detail & Related papers (2023-06-19T14:59:37Z)
- A Close Look into the Calibration of Pre-trained Language Models [56.998539510508515]
Pre-trained language models (PLMs) may fail in giving reliable estimates of their predictive uncertainty.
We study the dynamic change in PLMs' calibration performance in training.
We extend two recently proposed learnable calibration methods that collect additional data to train models to produce reasonable confidence estimates.
arXiv Detail & Related papers (2022-10-31T21:31:07Z)
- Localized Calibration: Metrics and Recalibration [133.07044916594361]
We propose a fine-grained calibration metric that spans the gap between fully global and fully individualized calibration.
We then introduce a localized recalibration method, LoRe, that reduces the localized calibration error (LCE) more effectively than existing recalibration methods.
arXiv Detail & Related papers (2021-02-22T07:22:12Z)
- On Calibration of Scene-Text Recognition Models [16.181357648680365]
We analyze several recent STR methods and show that they are consistently overconfident.
We demonstrate that for attention based decoders, calibration of individual character predictions increases word-level calibration error.
arXiv Detail & Related papers (2020-12-23T13:25:25Z)
- Unsupervised Calibration under Covariate Shift [92.02278658443166]
We introduce the problem of calibration under domain shift and propose an importance sampling based approach to address it.
We evaluate and discuss the efficacy of our method on both real-world datasets and synthetic datasets.
arXiv Detail & Related papers (2020-06-29T21:50:07Z)
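Several of the papers listed above concern post-hoc recalibration. The standard baseline in this area (not a method proposed by any specific paper here) is temperature scaling, which divides a model's logits by a scalar T fitted on held-out data; T > 1 softens over-confident predictions without changing the predicted class. An illustrative sketch, with all names our own:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; T > 1 yields less confident probabilities."""
    z = [l / temperature for l in logits]
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def fit_temperature(val_logits, val_labels, grid=None):
    """Pick the temperature minimizing validation negative log-likelihood
    via a simple grid search (closed-form fits also exist)."""
    if grid is None:
        grid = [0.5 + 0.05 * i for i in range(91)]  # candidates in [0.5, 5.0]
    best_t, best_nll = 1.0, float("inf")
    for t in grid:
        nll = 0.0
        for logits, label in zip(val_logits, val_labels):
            prob = softmax(logits, t)[label]
            nll -= math.log(prob + 1e-12)
        nll /= len(val_labels)
        if nll < best_nll:
            best_t, best_nll = t, nll
    return best_t

# Raising the temperature lowers the maximum confidence.
print(max(softmax([4.0, 0.0, 0.0])), max(softmax([4.0, 0.0, 0.0], temperature=2.0)))
```

Because dividing logits by a positive scalar preserves their argmax, temperature scaling leaves accuracy unchanged and only adjusts confidence, which is why it is a common reference point when evaluating the trainable and localized recalibration methods above.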
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.