On the Calibration of Pre-trained Language Models using Mixup Guided by
Area Under the Margin and Saliency
- URL: http://arxiv.org/abs/2203.07559v1
- Date: Mon, 14 Mar 2022 23:45:08 GMT
- Title: On the Calibration of Pre-trained Language Models using Mixup Guided by
Area Under the Margin and Saliency
- Authors: Seo Yeon Park and Cornelia Caragea
- Abstract summary: We propose a novel mixup strategy for pre-trained language models that improves model calibration further.
Our method achieves the lowest expected calibration error compared to strong baselines on both in-domain and out-of-domain test samples.
- Score: 47.90235939359225
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A well-calibrated neural model produces confidence (probability outputs)
closely approximated by the expected accuracy. While prior studies have shown
that mixup training as a data augmentation technique can improve model
calibration on image classification tasks, little is known about using mixup
for model calibration on natural language understanding (NLU) tasks. In this
paper, we explore mixup for model calibration on several NLU tasks and propose
a novel mixup strategy for pre-trained language models that improves model
calibration further. Our proposed mixup is guided by both the Area Under the
Margin (AUM) statistic (Pleiss et al., 2020) and the saliency map of each
sample (Simonyan et al., 2013). Moreover, we combine our mixup strategy with
model miscalibration correction techniques (i.e., label smoothing and
temperature scaling) and provide detailed analyses of their impact on our
proposed mixup. We focus on systematically designing experiments on three NLU
tasks: natural language inference, paraphrase detection, and commonsense
reasoning. Our method achieves the lowest expected calibration error compared
to strong baselines on both in-domain and out-of-domain test samples while
maintaining competitive accuracy.
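The AUM- and saliency-guided pairing is specific to the paper and is not reproduced here; the sketch below only illustrates the standard building blocks the method rests on. It is a minimal, hypothetical PyTorch example (all function and variable names are assumptions, not the authors' code) of mixup on pooled sentence embeddings with interpolated soft labels, plus the expected calibration error (ECE) metric used for evaluation.
```python
# Minimal, illustrative sketch (not the authors' implementation):
# mixup on pooled sentence embeddings with interpolated soft labels,
# plus an expected calibration error (ECE) estimate.
import torch
import torch.nn.functional as F

def mixup_batch(embeddings, labels, num_classes, alpha=0.4):
    """Interpolate a batch of pooled sentence embeddings and their one-hot labels."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(embeddings.size(0))
    mixed_x = lam * embeddings + (1.0 - lam) * embeddings[perm]
    y = F.one_hot(labels, num_classes).float()
    mixed_y = lam * y + (1.0 - lam) * y[perm]
    return mixed_x, mixed_y

def soft_cross_entropy(logits, soft_targets):
    """Cross-entropy against interpolated (soft) labels."""
    return -(soft_targets * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE: bin-proportion-weighted |accuracy - confidence| gap over equal-width bins."""
    conf, preds = probs.max(dim=-1)
    edges = torch.linspace(0.0, 1.0, n_bins + 1)
    ece = torch.zeros(())
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            acc = (preds[in_bin] == labels[in_bin]).float().mean()
            ece += in_bin.float().mean() * (acc - conf[in_bin].mean()).abs()
    return ece.item()
```
In the paper, mixing partners are not chosen uniformly at random as in mixup_batch: the AUM statistic (Pleiss et al., 2020) and per-sample saliency maps (Simonyan et al., 2013) guide which examples are interpolated, and label smoothing or temperature scaling can be combined on top.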
Related papers
- Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance [55.872926690722714]
We study the predictability of model performance with respect to the mixture proportions, captured in functional forms.
We propose nested use of the scaling laws of training steps, model sizes, and our data mixing law.
Our method effectively optimizes the training mixture of a 1B model trained for 100B tokens on RedPajama.
arXiv Detail & Related papers (2024-03-25T17:14:00Z) - Probabilistic Calibration by Design for Neural Network Regression [2.3020018305241337]
- Probabilistic Calibration by Design for Neural Network Regression [2.3020018305241337]
We introduce a novel end-to-end model training procedure called Quantile Recalibration Training.
We demonstrate the performance of our method in a large-scale experiment involving 57 regression datasets.
arXiv Detail & Related papers (2024-03-18T17:04:33Z) - Calibrating Large Language Models with Sample Consistency [76.23956851098598]
We explore the potential of deriving confidence from the distribution of multiple randomly sampled model generations, via three measures of consistency.
Results show that consistency-based calibration methods outperform existing post-hoc approaches.
We offer practical guidance on choosing suitable consistency metrics for calibration, tailored to the characteristics of various LMs.
arXiv Detail & Related papers (2024-02-21T16:15:20Z) - Calibration of Neural Networks [77.34726150561087]
- Calibration of Neural Networks [77.34726150561087]
This paper presents a survey of confidence calibration problems in the context of neural networks.
We analyze the problem statement, calibration definitions, and different approaches to evaluation.
Empirical experiments cover various datasets and models, comparing calibration methods according to different criteria.
arXiv Detail & Related papers (2023-03-19T20:27:51Z) - On Calibrating Semantic Segmentation Models: Analyses and An Algorithm [51.85289816613351]
We study the problem of semantic segmentation calibration.
Model capacity, crop size, multi-scale testing, and prediction correctness all have an impact on calibration.
We propose a simple, unifying, and effective approach, namely selective scaling.
arXiv Detail & Related papers (2022-12-22T22:05:16Z) - On the Importance of Calibration in Semi-supervised Learning [13.859032326378188]
State-of-the-art (SOTA) semi-supervised learning (SSL) methods have been highly successful in leveraging a mix of labeled and unlabeled data.
We introduce a family of new SSL models that optimize for calibration and demonstrate their effectiveness across standard vision benchmarks.
arXiv Detail & Related papers (2022-10-10T15:41:44Z) - Prototypical Calibration for Few-shot Learning of Language Models [84.5759596754605]
GPT-like models have been recognized as fragile across different hand-crafted templates and demonstration permutations.
We propose prototypical calibration to adaptively learn a more robust decision boundary for zero- and few-shot classification.
Our method calibrates the decision boundary as expected, greatly improving the robustness of GPT to templates, permutations, and class imbalance.
arXiv Detail & Related papers (2022-05-20T13:50:07Z) - When and How Mixup Improves Calibration [19.11486078732542]
- When and How Mixup Improves Calibration [19.11486078732542]
In many machine learning applications, it is important for the model to provide confidence scores that accurately capture its prediction uncertainty.
In this paper, we theoretically prove that Mixup improves calibration in high-dimensional settings by investigating two natural data models.
While incorporating unlabeled data can sometimes make the model less calibrated, adding Mixup training mitigates this issue and provably improves calibration.
arXiv Detail & Related papers (2021-02-11T22:24:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.