Uncertainty Quantification with Pre-trained Language Models: A
Large-Scale Empirical Analysis
- URL: http://arxiv.org/abs/2210.04714v1
- Date: Mon, 10 Oct 2022 14:16:01 GMT
- Title: Uncertainty Quantification with Pre-trained Language Models: A
Large-Scale Empirical Analysis
- Authors: Yuxin Xiao, Paul Pu Liang, Umang Bhatt, Willie Neiswanger, Ruslan
Salakhutdinov, Louis-Philippe Morency
- Abstract summary: It is crucial for the pipeline to minimize the calibration error, especially in safety-critical applications.
There are various considerations behind the pipeline: (1) the choice and (2) the size of PLM, (3) the choice of uncertainty quantifier, (4) the choice of fine-tuning loss, and many more.
In response, we recommend the following: (1) use ELECTRA for PLM encoding, (2) use larger PLMs if possible, (3) use Temp Scaling as the uncertainty quantifier, and (4) use Focal Loss for fine-tuning.
- Score: 120.9545643534454
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained language models (PLMs) have gained increasing popularity due to
their compelling prediction performance in diverse natural language processing
(NLP) tasks. When formulating a PLM-based prediction pipeline for NLP tasks, it
is also crucial for the pipeline to minimize the calibration error, especially
in safety-critical applications. That is, the pipeline should reliably indicate
when we can trust its predictions. In particular, there are various
considerations behind the pipeline: (1) the choice and (2) the size of PLM, (3)
the choice of uncertainty quantifier, (4) the choice of fine-tuning loss, and
many more. Although prior work has looked into some of these considerations, it
usually draws conclusions from a limited scope of empirical studies. A holistic
analysis of how to compose a well-calibrated PLM-based prediction pipeline is
still lacking. To fill this void, we compare a wide range of
popular options for each consideration based on three prevalent NLP
classification tasks and the setting of domain shift. In response, we recommend
the following: (1) use ELECTRA for PLM encoding, (2) use larger PLMs if
possible, (3) use Temp Scaling as the uncertainty quantifier, and (4) use Focal
Loss for fine-tuning.
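The two headline recommendations, Temp Scaling and Focal Loss, can be illustrated with a minimal numpy sketch (the logits and the gamma value below are illustrative choices, not values from the paper):

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature scaling divides logits by a scalar T (fit on held-out data);
    # T > 1 softens overconfident predictions without changing the argmax.
    z = logits / T
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def focal_loss(logits, label, gamma=2.0):
    # Focal loss, FL = -(1 - p_t)^gamma * log(p_t), down-weights
    # well-classified examples (p_t near 1) relative to cross-entropy.
    p_t = softmax(logits)[label]
    return -((1.0 - p_t) ** gamma) * np.log(p_t)

logits = np.array([4.0, 1.0, 0.5])
p_raw = softmax(logits)         # sharp, potentially overconfident
p_cal = softmax(logits, T=2.0)  # softened by temperature scaling
```

With gamma = 0 the focal loss reduces to standard cross-entropy, and temperature scaling leaves the predicted class unchanged, so accuracy is untouched while calibration can improve.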
Related papers
- Prediction-Powered Adaptive Shrinkage Estimation [0.9208007322096532]
Prediction-Powered Adaptive Shrinkage (PAS) is a method that bridges PPI with empirical Bayes shrinkage to improve the estimation of multiple means.
PAS adapts to the reliability of the ML predictions and outperforms traditional and modern baselines in large-scale applications.
arXiv Detail & Related papers (2025-02-20T00:24:05Z)
- Predicting Emergent Capabilities by Finetuning [98.9684114851891]
We find that finetuning language models can shift the point in scaling at which emergence occurs towards less capable models.
We validate this approach using four standard NLP benchmarks.
We find that, in some cases, we can accurately predict whether models trained with up to 4x more compute have emerged.
arXiv Detail & Related papers (2024-11-25T01:48:09Z)
- Scaling Laws for Predicting Downstream Performance in LLMs [75.28559015477137]
This work focuses on the pre-training loss as a more-efficient metric for performance estimation.
We extend the power law analytical function to predict domain-specific pre-training loss based on FLOPs across data sources.
We employ a two-layer neural network to model the non-linear relationship between multiple domain-specific losses and downstream performance.
arXiv Detail & Related papers (2024-10-11T04:57:48Z)
- Query Performance Prediction using Relevance Judgments Generated by Large Language Models [53.97064615557883]
We propose a QPP framework using automatically generated relevance judgments (QPP-GenRE).
QPP-GenRE decomposes QPP into independent subtasks of predicting relevance of each item in a ranked list to a given query.
This allows us to predict any IR evaluation measure using the generated relevance judgments as pseudo-labels.
arXiv Detail & Related papers (2024-04-01T09:33:05Z)
- Making Pre-trained Language Models both Task-solvers and Self-calibrators [52.98858650625623]
Pre-trained language models (PLMs) serve as backbones for various real-world systems.
Previous work shows that introducing an extra calibration task can mitigate PLMs' miscalibration.
We propose a training algorithm LM-TOAST to tackle the challenges.
arXiv Detail & Related papers (2023-07-21T02:51:41Z)
- Selection by Prediction with Conformal p-values [7.917044695538599]
We study screening procedures that aim to select candidates whose unobserved outcomes exceed user-specified values.
We develop a method that wraps around any prediction model to produce a subset of candidates while controlling the proportion of falsely selected units.
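A generic version of this wrap-around recipe, conformal p-values followed by Benjamini-Hochberg selection, can be sketched as follows; this is a simplified illustration of the general technique under assumed score conventions, not the paper's exact procedure:

```python
import numpy as np

def conformal_pvalue(calib_scores, test_score):
    # Exchangeability-based p-value: rank of the test score among the
    # calibration scores (higher score = stronger evidence of selection-worthiness).
    calib_scores = np.asarray(calib_scores, dtype=float)
    n = len(calib_scores)
    return (1 + np.sum(calib_scores >= test_score)) / (n + 1)

def benjamini_hochberg(pvals, alpha=0.1):
    # BH step-up: select the largest k with p_(k) <= alpha * k / m,
    # controlling the expected proportion of falsely selected units.
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    thresh = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresh
    k = (np.nonzero(below)[0].max() + 1) if below.any() else 0
    selected = np.zeros(m, dtype=bool)
    selected[order[:k]] = True
    return selected
```

Any prediction model can supply the scores; the guarantee comes from exchangeability of calibration and test units, not from the model's accuracy.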
arXiv Detail & Related papers (2022-10-04T06:34:49Z)
- Solving Multistage Stochastic Linear Programming via Regularized Linear Decision Rules: An Application to Hydrothermal Dispatch Planning [77.34726150561087]
We propose a novel regularization scheme for linear decision rules (LDR) based on the AdaSO (adaptive least absolute shrinkage and selection operator)
Experiments show that the risk of overfitting is non-negligible when using the classical non-regularized LDR to solve MSLP.
For the LHDP problem, our analysis highlights several benefits of the proposed framework in comparison to the non-regularized benchmark.
arXiv Detail & Related papers (2021-10-07T02:36:14Z)
- Towards Improving Selective Prediction Ability of NLP Systems [24.774450633678125]
We propose a method that improves probability estimates of models by calibrating them using prediction confidence and difficulty score of instances.
We instantiate our method with Natural Language Inference (NLI) and Duplicate Detection (DD) tasks and evaluate it in both In-Domain (IID) and Out-of-Domain (OOD) settings.
arXiv Detail & Related papers (2020-08-21T08:46:36Z)
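Across these works, calibration quality is commonly summarized by the expected calibration error (ECE): predictions are binned by confidence, and the gap between average confidence and accuracy is averaged across bins, weighted by bin size. A minimal sketch:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    # ECE = sum over bins of (bin weight) * |avg confidence - accuracy|.
    conf = np.asarray(confidences, dtype=float)
    corr = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)  # half-open bins (lo, hi]
        if mask.any():
            ece += mask.mean() * abs(conf[mask].mean() - corr[mask].mean())
    return ece
```

A perfectly calibrated model (e.g. 75% confidence on predictions that are right 75% of the time) scores 0; an overconfident model scores higher.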
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.