On the Calibration of Pre-trained Language Models using Mixup Guided by
Area Under the Margin and Saliency
- URL: http://arxiv.org/abs/2203.07559v1
- Date: Mon, 14 Mar 2022 23:45:08 GMT
- Title: On the Calibration of Pre-trained Language Models using Mixup Guided by
Area Under the Margin and Saliency
- Authors: Seo Yeon Park and Cornelia Caragea
- Abstract summary: We propose a novel mixup strategy for pre-trained language models that improves model calibration further.
Our method achieves the lowest expected calibration error compared to strong baselines on both in-domain and out-of-domain test samples.
- Score: 47.90235939359225
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A well-calibrated neural model produces confidence (probability outputs)
closely approximated by the expected accuracy. While prior studies have shown
that mixup training as a data augmentation technique can improve model
calibration on image classification tasks, little is known about using mixup
for model calibration on natural language understanding (NLU) tasks. In this
paper, we explore mixup for model calibration on several NLU tasks and propose
a novel mixup strategy for pre-trained language models that improves model
calibration further. Our proposed mixup is guided by both the Area Under the
Margin (AUM) statistic (Pleiss et al., 2020) and the saliency map of each
sample (Simonyan et al., 2013). Moreover, we combine our mixup strategy with
model miscalibration correction techniques (i.e., label smoothing and
temperature scaling) and provide detailed analyses of their impact on our
proposed mixup. We focus on systematically designing experiments on three NLU
tasks: natural language inference, paraphrase detection, and commonsense
reasoning. Our method achieves the lowest expected calibration error compared
to strong baselines on both in-domain and out-of-domain test samples while
maintaining competitive accuracy.
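The AUM- and saliency-guided pairing is specific to the paper and is not reproduced here; the sketch below only illustrates the standard building blocks the method rests on. It is a minimal, hypothetical PyTorch example (all function and variable names are assumptions, not the authors' code) of mixup on pooled sentence embeddings with interpolated soft labels, plus the expected calibration error (ECE) metric used for evaluation.
```python
# Minimal, illustrative sketch (not the authors' implementation):
# mixup on pooled sentence embeddings with interpolated soft labels,
# plus an expected calibration error (ECE) estimate.
import torch
import torch.nn.functional as F

def mixup_batch(embeddings, labels, num_classes, alpha=0.4):
    """Interpolate a batch of pooled sentence embeddings and their one-hot labels."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(embeddings.size(0))
    mixed_x = lam * embeddings + (1.0 - lam) * embeddings[perm]
    y = F.one_hot(labels, num_classes).float()
    mixed_y = lam * y + (1.0 - lam) * y[perm]
    return mixed_x, mixed_y

def soft_cross_entropy(logits, soft_targets):
    """Cross-entropy against interpolated (soft) labels."""
    return -(soft_targets * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE: bin-proportion-weighted |accuracy - confidence| gap over equal-width bins."""
    conf, preds = probs.max(dim=-1)
    edges = torch.linspace(0.0, 1.0, n_bins + 1)
    ece = torch.zeros(())
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            acc = (preds[in_bin] == labels[in_bin]).float().mean()
            ece += in_bin.float().mean() * (acc - conf[in_bin].mean()).abs()
    return ece.item()
```
In the paper, mixing partners are not chosen uniformly at random as in mixup_batch: the AUM statistic (Pleiss et al., 2020) and per-sample saliency maps (Simonyan et al., 2013) guide which examples are interpolated, and label smoothing or temperature scaling can be combined on top.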
Related papers
- Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance [55.872926690722714]
We study the predictability of model performance with respect to the mixture proportions, captured in functional forms.
We propose nested use of the scaling laws of training steps, model sizes, and our data mixing law.
Our method effectively optimizes the training mixture of a 1B model trained for 100B tokens on RedPajama.
arXiv Detail & Related papers (2024-03-25T17:14:00Z) - Probabilistic Calibration by Design for Neural Network Regression [2.3020018305241337]
- Probabilistic Calibration by Design for Neural Network Regression [2.3020018305241337]
We introduce a novel end-to-end model training procedure called Quantile Recalibration Training.
We demonstrate the performance of our method in a large-scale experiment involving 57 regression datasets.
arXiv Detail & Related papers (2024-03-18T17:04:33Z) - Calibrating Large Language Models with Sample Consistency [76.23956851098598]
We explore the potential of deriving confidence from the distribution of multiple randomly sampled model generations, via three measures of consistency.
Results show that consistency-based calibration methods outperform existing post-hoc approaches.
We offer practical guidance on choosing suitable consistency metrics for calibration, tailored to the characteristics of various LMs.
arXiv Detail & Related papers (2024-02-21T16:15:20Z) - Calibration of Neural Networks [77.34726150561087]
- Calibration of Neural Networks [77.34726150561087]
This paper presents a survey of confidence calibration problems in the context of neural networks.
We analyze the problem statement, calibration definitions, and different approaches to evaluation.
Empirical experiments cover various datasets and models, comparing calibration methods according to different criteria.
arXiv Detail & Related papers (2023-03-19T20:27:51Z) - On Calibrating Semantic Segmentation Models: Analyses and An Algorithm [51.85289816613351]
We study the problem of semantic segmentation calibration.
Model capacity, crop size, multi-scale testing, and prediction correctness all have an impact on calibration.
We propose a simple, unifying, and effective approach, namely selective scaling.
arXiv Detail & Related papers (2022-12-22T22:05:16Z) - On the Importance of Calibration in Semi-supervised Learning [13.859032326378188]
State-of-the-art (SOTA) semi-supervised learning (SSL) methods have been highly successful in leveraging a mix of labeled and unlabeled data.
We introduce a family of new SSL models that optimize for calibration and demonstrate their effectiveness across standard vision benchmarks.
arXiv Detail & Related papers (2022-10-10T15:41:44Z) - Prototypical Calibration for Few-shot Learning of Language Models [84.5759596754605]
GPT-like models have been recognized as fragile across different hand-crafted templates and demonstration permutations.
We propose prototypical calibration to adaptively learn a more robust decision boundary for zero- and few-shot classification.
Our method calibrates the decision boundary as expected, greatly improving the robustness of GPT to templates, permutations, and class imbalance.
arXiv Detail & Related papers (2022-05-20T13:50:07Z) - When and How Mixup Improves Calibration [19.11486078732542]
- When and How Mixup Improves Calibration [19.11486078732542]
In many machine learning applications, it is important for the model to provide confidence scores that accurately capture its prediction uncertainty.
In this paper, we theoretically prove that Mixup improves calibration in high-dimensional settings by investigating two natural data models.
While incorporating unlabeled data can sometimes make the model less calibrated, adding Mixup training mitigates this issue and provably improves calibration.
arXiv Detail & Related papers (2021-02-11T22:24:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.