Calibration of Pre-trained Transformers
- URL: http://arxiv.org/abs/2003.07892v3
- Date: Thu, 15 Oct 2020 17:04:21 GMT
- Title: Calibration of Pre-trained Transformers
- Authors: Shrey Desai and Greg Durrett
- Abstract summary: We focus on BERT and RoBERTa in this work, and analyze their calibration across three tasks: natural language inference, paraphrase detection, and commonsense reasoning.
We show that: (1) when used out-of-the-box, pre-trained models are calibrated in-domain, and compared to baselines, their calibration error out-of-domain can be as much as 3.5x lower; (2) temperature scaling is effective at further reducing calibration error in-domain, and using label smoothing to deliberately increase empirical uncertainty helps calibrate posteriors out-of-domain.
- Score: 55.57083429195445
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained Transformers are now ubiquitous in natural language processing,
but despite their high end-task performance, little is known empirically about
whether they are calibrated. Specifically, do these models' posterior
probabilities provide an accurate empirical measure of how likely the model is
to be correct on a given example? We focus on BERT and RoBERTa in this work,
and analyze their calibration across three tasks: natural language inference,
paraphrase detection, and commonsense reasoning. For each task, we consider
in-domain as well as challenging out-of-domain settings, where models face more
examples they should be uncertain about. We show that: (1) when used
out-of-the-box, pre-trained models are calibrated in-domain, and compared to
baselines, their calibration error out-of-domain can be as much as 3.5x lower;
(2) temperature scaling is effective at further reducing calibration error
in-domain, and using label smoothing to deliberately increase empirical
uncertainty helps calibrate posteriors out-of-domain.
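To make the quantities in the abstract concrete, below is a minimal NumPy sketch of the three ingredients it refers to: expected calibration error (ECE, the standard calibration metric), temperature scaling, and label smoothing. The binning scheme, function names, and default hyperparameters are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE: bucket predictions by confidence, then take the bin-size-weighted
    average gap between accuracy and mean confidence in each bucket."""
    confidences = probs.max(axis=1)
    predictions = probs.argmax(axis=1)
    accuracies = (predictions == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            ece += in_bin.mean() * abs(accuracies[in_bin].mean()
                                       - confidences[in_bin].mean())
    return ece

def temperature_scale(logits, T):
    """Divide logits by a scalar temperature T (fit on held-out data) before
    the softmax; T > 1 softens over-confident posteriors without changing
    the argmax prediction."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)      # for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum(axis=1, keepdims=True)

def label_smoothing_targets(labels, n_classes, alpha=0.1):
    """Replace one-hot training targets with (1 - alpha) on the gold label and
    alpha / (n_classes - 1) spread over the remaining classes, deliberately
    raising the uncertainty the model is trained toward."""
    targets = np.full((len(labels), n_classes), alpha / (n_classes - 1))
    targets[np.arange(len(labels)), labels] = 1.0 - alpha
    return targets
```

In practice the temperature T is chosen by minimizing negative log-likelihood on a development set, and the smoothing weight alpha is a tuned hyperparameter; the default of 0.1 above is a placeholder rather than the paper's exact setting.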
Related papers
- Towards Certification of Uncertainty Calibration under Adversarial Attacks [96.48317453951418]
We show that attacks can significantly harm calibration, and thus propose certified calibration as worst-case bounds on calibration under adversarial perturbations.
We propose novel calibration attacks and demonstrate how they can improve model calibration through adversarial calibration training.
arXiv Detail & Related papers (2024-05-22T18:52:09Z)
- Preserving Pre-trained Features Helps Calibrate Fine-tuned Language Models [23.881825575095945]
Large pre-trained language models (PLMs) have demonstrated strong performance on natural language understanding (NLU) tasks through fine-tuning.
However, fine-tuned models still suffer from overconfident predictions, especially in out-of-domain settings.
We demonstrate that the PLMs are well-calibrated on the masked language modeling task with robust predictive confidence under domain shift.
We show that preserving pre-trained features can improve the calibration of fine-tuned language models.
arXiv Detail & Related papers (2023-05-30T17:35:31Z)
- A Unifying Theory of Distance from Calibration [9.959025631339982]
There is no consensus on how to quantify the distance from perfect calibration.
We propose a ground-truth notion of distance from calibration, inspired by the literature on property testing.
Applying our framework, we identify three calibration measures that are consistent and can be estimated efficiently.
arXiv Detail & Related papers (2022-11-30T10:38:24Z)
- A Close Look into the Calibration of Pre-trained Language Models [56.998539510508515]
Pre-trained language models (PLMs) may fail in giving reliable estimates of their predictive uncertainty.
We study the dynamic change in PLMs' calibration performance in training.
We extend two recently proposed learnable methods that directly collect data to train models to have reasonable confidence estimations.
arXiv Detail & Related papers (2022-10-31T21:31:07Z)
- Post-hoc Uncertainty Calibration for Domain Drift Scenarios [46.88826364244423]
We show that existing post-hoc calibration methods yield highly over-confident predictions under domain shift.
We introduce a simple strategy where perturbations are applied to samples in the validation set before performing the post-hoc calibration step (see the sketch after this list).
arXiv Detail & Related papers (2020-12-20T18:21:13Z)
- Uncertainty Quantification and Deep Ensembles [79.4957965474334]
We show that deep-ensembles do not necessarily lead to improved calibration properties.
We show that standard ensembling methods, when used in conjunction with modern techniques such as mixup regularization, can lead to less calibrated models.
This work examines the interplay between three of the simplest and most commonly used approaches to leveraging deep learning when data is scarce.
arXiv Detail & Related papers (2020-07-17T07:32:24Z)
- Unsupervised Calibration under Covariate Shift [92.02278658443166]
We introduce the problem of calibration under domain shift and propose an importance sampling based approach to address it.
We evaluate and discuss the efficacy of our method on both real-world datasets and synthetic datasets.
arXiv Detail & Related papers (2020-06-29T21:50:07Z)
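For the perturbation strategy summarized under "Post-hoc Uncertainty Calibration for Domain Drift Scenarios" above, a rough sketch might look as follows. The isotropic Gaussian noise on continuous inputs or feature vectors, the grid search, and the use of temperature scaling as the post-hoc calibrator are assumptions made for illustration (and `model_logits` is a hypothetical callable), not a reproduction of that paper's exact procedure.

```python
import numpy as np

def fit_temperature_with_perturbations(model_logits, x_val, y_val,
                                       noise_levels=(0.0, 0.05, 0.1),
                                       temps=np.linspace(0.5, 5.0, 46)):
    """Fit a single softmax temperature on a validation set augmented with
    Gaussian-perturbed copies, so the calibrator also sees inputs that look
    more like shifted, out-of-domain data.

    model_logits: hypothetical callable mapping an input array to class logits.
    """
    logits, labels = [], []
    for sigma in noise_levels:
        x_pert = x_val + sigma * np.random.randn(*x_val.shape)
        logits.append(model_logits(x_pert))
        labels.append(y_val)
    logits = np.concatenate(logits)
    labels = np.concatenate(labels)

    def nll(T):
        # Negative log-likelihood of the temperature-scaled posteriors.
        z = logits / T
        z = z - z.max(axis=1, keepdims=True)
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(labels)), labels].mean()

    return min(temps, key=nll)   # simple grid search over candidate temperatures
```

The returned temperature would then be applied at test time exactly as in the temperature-scaling sketch given earlier.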