Making Pre-trained Language Models both Task-solvers and
Self-calibrators
- URL: http://arxiv.org/abs/2307.11316v1
- Date: Fri, 21 Jul 2023 02:51:41 GMT
- Title: Making Pre-trained Language Models both Task-solvers and
Self-calibrators
- Authors: Yangyi Chen, Xingyao Wang, Heng Ji
- Abstract summary: Pre-trained language models (PLMs) serve as backbones for various real-world systems but tend to be overconfident in their wrong predictions.
Previous work shows that introducing an extra calibration task can mitigate this issue.
We propose a training algorithm, LM-TOAST, to tackle the resulting challenges.
- Score: 52.98858650625623
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Pre-trained language models (PLMs) serve as backbones for various real-world
systems. For high-stakes applications, it is equally essential to have reasonable
confidence estimations in predictions. While the vanilla confidence scores of
PLMs can already be effectively utilized, PLMs are consistently
overconfident in their wrong predictions, which is undesirable in practice.
Previous work shows that introducing an extra calibration task can mitigate
this issue. The basic idea involves acquiring additional data to train models
in predicting the confidence of their initial predictions. However, prior work
only demonstrates the feasibility of this kind of method, assuming that
abundant extra samples are available for the introduced calibration task. In this
work, we consider the practical scenario in which the available training samples
must be used effectively to make PLMs both task-solvers and self-calibrators. Three
challenges are presented, including limited training samples, data imbalance,
and distribution shifts. We first conduct pilot experiments to quantify various
decisive factors in the calibration task. Based on this empirical analysis,
we propose a training algorithm, LM-TOAST, to tackle the challenges.
Experimental results show that LM-TOAST can effectively utilize the training
data to give PLMs reasonable confidence estimations while maintaining their
original task performance. Further, we consider three downstream applications,
namely selective classification, adversarial defense, and model cascading, to
show the practical usefulness of LM-TOAST. The code will be made public at
https://github.com/Yangyi-Chen/LM-TOAST.
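As a concrete illustration of the calibration-task recipe the abstract builds on, the sketch below trains a task model, labels its held-out predictions as correct or incorrect, and fits a second model on those labels to produce confidence scores. This is a minimal, hedged sketch of the general recipe, not the LM-TOAST algorithm itself; scikit-learn classifiers stand in for PLMs, and the class_weight setting is one simple nod to the data-imbalance challenge the abstract mentions.

```python
# Minimal sketch of the "extra calibration task" recipe: train a task model,
# record whether its held-out predictions are correct, then train a second
# model ("self-calibrator") to predict that correctness. This is NOT the
# LM-TOAST algorithm; sklearn classifiers stand in for PLMs.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_task, X_cal, y_task, y_cal = train_test_split(X, y, test_size=0.5, random_state=0)

# Step 1: fit the task model on its own split.
task_model = LogisticRegression(max_iter=1000).fit(X_task, y_task)

# Step 2: build the calibration dataset: input -> "was the prediction correct?"
# Correct predictions usually dominate, hence class_weight to offset imbalance.
correct = (task_model.predict(X_cal) == y_cal).astype(int)
calibrator = LogisticRegression(max_iter=1000,
                                class_weight="balanced").fit(X_cal, correct)

def predict_with_confidence(x):
    """Return the task prediction and an estimated P(prediction is correct)."""
    return task_model.predict(x), calibrator.predict_proba(x)[:, 1]

# E.g. selective classification: abstain whenever confidence < some threshold.
preds, conf = predict_with_confidence(X_cal[:5])
print([(int(p), round(float(c), 3)) for p, c in zip(preds, conf)])
```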
Related papers
- What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? [83.83230167222852]
We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy.
By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies.
arXiv Detail & Related papers (2024-11-12T09:52:40Z)
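A rough sketch of how a metric in the spirit of pre-memorization train accuracy could be computed is given below. The checkpointing scheme and the verbatim-match memorization test are assumptions made for illustration, and the helper names are hypothetical; the paper's exact definition may differ.

```python
# Rough sketch of a pre-memorization-style train accuracy: for each training
# example, measure sampled-solution accuracy only at checkpoints before the
# model reproduces the training target verbatim. The helpers `sample_solutions`
# and `is_correct` are hypothetical; the paper's exact definition may differ.
def pre_memorization_train_accuracy(checkpoints, train_set,
                                    sample_solutions, is_correct):
    scores = []
    for prompt, target in train_set:
        acc = None
        for model in checkpoints:              # checkpoints in training order
            samples = sample_solutions(model, prompt)
            if target in samples:              # crude memorization test
                break                          # stop at the memorizing checkpoint
            acc = sum(is_correct(s, target) for s in samples) / len(samples)
        if acc is not None:
            scores.append(acc)                 # accuracy just before memorization
    return sum(scores) / max(len(scores), 1)
```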
- Adaptive Pre-training Data Detection for Large Language Models via Surprising Tokens [1.2549198550400134]
Large language models (LLMs) are extensively used, but there are concerns regarding privacy, security, and copyright due to their opaque training data.
Current solutions leverage techniques explored in machine-learning privacy, such as Membership Inference Attacks (MIAs).
We propose an adaptive pre-training data detection method that alleviates this reliance and effectively amplifies identification.
arXiv Detail & Related papers (2024-07-30T23:43:59Z)
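The following is a hedged sketch of a surprisal-based membership heuristic in the spirit of this line of work (loosely Min-K%-style): score a text by the mean log-probability of its most surprising tokens, and flag high-scoring texts as likely pretraining data. It illustrates the general idea only, not this paper's specific adaptive method, and the threshold is a placeholder.

```python
# Hedged sketch of a surprisal-based membership heuristic (loosely Min-K%-style),
# illustrating the general idea behind pre-training data detection; this is not
# the paper's specific adaptive method.
import torch

def surprising_token_score(model, input_ids, k_frac=0.2):
    """Mean log-prob of the k% most surprising tokens under a causal LM."""
    with torch.no_grad():
        logits = model(input_ids.unsqueeze(0)).logits[0]
    logprobs = torch.log_softmax(logits[:-1], dim=-1)      # next-token predictions
    token_lp = logprobs.gather(1, input_ids[1:].unsqueeze(1)).squeeze(1)
    k = max(1, int(k_frac * token_lp.numel()))
    return token_lp.topk(k, largest=False).values.mean().item()

# Usage sketch: scores above a threshold tuned on known member/non-member
# texts suggest the text was seen in pretraining (higher = less surprising).
```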
- Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve model alignment across different task scenarios.
We implement UAL in a simple fashion -- adaptively setting the label-smoothing value during training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
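Since the summary above spells out the mechanism, a short sketch is easy to give: set the label-smoothing strength per sample from an uncertainty estimate, so uncertain samples receive softer targets. The use of predictive entropy as the uncertainty signal and the max_eps cap are illustrative assumptions, not necessarily the paper's exact recipe.

```python
# Sketch of uncertainty-aware label smoothing: per-sample smoothing strength
# scales with predictive entropy, so uncertain samples get softer targets.
# The entropy-based uncertainty proxy and the max_eps cap are assumptions.
import torch
import torch.nn.functional as F

def ual_loss(logits, labels, max_eps=0.2):
    num_classes = logits.size(-1)
    probs = F.softmax(logits, dim=-1)
    # Normalized predictive entropy in [0, 1] as a per-sample uncertainty proxy.
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(-1)
    uncertainty = entropy / torch.log(torch.tensor(float(num_classes)))
    eps = (max_eps * uncertainty).detach()   # stop-grad: eps is a schedule, not a target
    # Per-sample smoothed targets: (1 - eps) on the gold label, eps spread uniformly.
    one_hot = F.one_hot(labels, num_classes).float()
    targets = (1 - eps).unsqueeze(-1) * one_hot + (eps / num_classes).unsqueeze(-1)
    return -(targets * F.log_softmax(logits, dim=-1)).sum(-1).mean()
```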
- TaskMet: Task-Driven Metric Learning for Model Learning [29.0053868393653]
Deep learning models are often deployed in downstream tasks that the training procedure may not be aware of.
We propose to take the task loss signal one level deeper than the parameters of the model, using it to learn the parameters of the loss function that the model is trained on.
This approach does not alter the optimal prediction model itself, but rather changes the model learning to emphasize the information important for the downstream task.
arXiv Detail & Related papers (2023-12-08T18:59:03Z)
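A toy sketch of the task-driven metric-learning idea follows: the prediction loss is a learnable weighted (metric-parameterized) loss, and differentiating through one inner gradient step lets the downstream task loss backpropagate into the metric's parameters. This single-inner-step version is an illustration under simplifying assumptions, not TaskMet's actual bilevel procedure.

```python
# Toy sketch of task-driven metric learning: a learnable diagonal metric weights
# the prediction loss; differentiating through one inner gradient step lets the
# downstream task loss update the metric. Not TaskMet's actual bilevel solver.
import torch

def taskmet_outer_loss(model, metric_logits, x, y, task_loss_fn, inner_lr=0.1):
    """metric_logits must broadcast with the model's output dimensions."""
    weights = torch.softmax(metric_logits, dim=0) * metric_logits.numel()
    pred_loss = (weights * (model(x) - y) ** 2).mean()  # metric-weighted MSE
    grads = torch.autograd.grad(pred_loss, list(model.parameters()),
                                create_graph=True)      # keep graph for the outer step
    # One differentiable inner update of the model under the learned metric.
    names = [n for n, _ in model.named_parameters()]
    updated = {n: p - inner_lr * g
               for n, p, g in zip(names, model.parameters(), grads)}
    pred_after = torch.func.functional_call(model, updated, (x,))
    return task_loss_fn(pred_after, y)  # .backward() flows into metric_logits
```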
- How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression? [92.90857135952231]
Transformers pretrained on diverse tasks exhibit remarkable in-context learning (ICL) capabilities.
We study ICL in one of its simplest setups: pretraining a linearly parameterized single-layer linear attention model for linear regression.
arXiv Detail & Related papers (2023-10-12T15:01:43Z)
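The setup in this summary is concrete enough to sketch: draw a fresh linear-regression task per sequence, encode (x, y) pairs as tokens with the query's label zeroed out, and train one layer of linear self-attention to fill it in. Dimensions, initialization scale, and step counts below are arbitrary illustrative choices, not the paper's settings.

```python
# Sketch of pretraining a single-layer *linear* attention model (no softmax)
# on randomly drawn linear-regression tasks; in-context learning then means
# predicting the query label from the in-context (x, y) pairs.
# All sizes and hyperparameters are arbitrary illustrative choices.
import torch

d, n_ctx, batch, steps = 5, 20, 256, 2000
WKQ = (0.01 * torch.randn(d + 1, d + 1)).requires_grad_()
WPV = (0.01 * torch.randn(d + 1, d + 1)).requires_grad_()
opt = torch.optim.Adam([WKQ, WPV], lr=1e-3)

def sample_batch(b):
    w = torch.randn(b, d, 1)                     # fresh task (weight vector) per sequence
    X = torch.randn(b, n_ctx + 1, d)
    y = (X @ w).squeeze(-1)                      # noiseless linear labels
    Z = torch.cat([X, y.unsqueeze(-1)], dim=-1)  # tokens are (x_i, y_i) pairs
    Z[:, -1, -1] = 0.0                           # query token: label blanked out
    return Z, y[:, -1]

for _ in range(steps):
    Z, target = sample_batch(batch)
    # One residual layer of linear self-attention: Z + (Z WKQ Z^T / n) Z WPV.
    out = Z + (Z @ WKQ @ Z.transpose(1, 2) / n_ctx) @ Z @ WPV
    loss = ((out[:, -1, -1] - target) ** 2).mean()  # read prediction off the query slot
    opt.zero_grad(); loss.backward(); opt.step()
```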
- Test-Time Adaptation with Perturbation Consistency Learning [32.58879780726279]
We propose a simple test-time adaptation method that promotes stable predictions for samples with distribution shifts.
Our method achieves higher or comparable performance with less inference time over strong PLM backbones.
arXiv Detail & Related papers (2023-04-25T12:29:22Z)
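A minimal sketch of test-time adaptation through perturbation consistency follows: take two stochastic forward passes (dropout as the perturbation) and minimize the divergence between their predictions before predicting. Dropout as the perturbation, single-step adaptation, and the learning rate are illustrative assumptions rather than this paper's exact design.

```python
# Minimal sketch of test-time adaptation via perturbation consistency:
# minimize KL divergence between predictions from two stochastic (dropout)
# forward passes, then predict. The perturbation choice and hyperparameters
# are illustrative assumptions.
import torch
import torch.nn.functional as F

def adapt_and_predict(model, x, steps=1, lr=1e-4):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()                        # keep dropout on as the perturbation
    for _ in range(steps):
        log_p1 = F.log_softmax(model(x), dim=-1)
        p2 = F.softmax(model(x), dim=-1)  # second, differently perturbed pass
        loss = F.kl_div(log_p1, p2.detach(), reduction="batchmean")
        opt.zero_grad(); loss.backward(); opt.step()
    model.eval()
    with torch.no_grad():
        return model(x).argmax(dim=-1)
```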
- Language Model Pre-training on True Negatives [109.73819321246062]
Discriminative pre-trained language models (PLMs) learn to predict original texts from intentionally corrupted ones.
Existing PLMs simply treat all corrupted texts as equally negative, without any examination.
We design enhanced pre-training methods to counteract false-negative predictions and encourage pre-training language models on true negatives.
arXiv Detail & Related papers (2022-12-01T12:24:19Z)
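One way to make the "true negatives" idea concrete is sketched below: mask out corrupted positions whose replacements are judged still plausible (false negatives) so the discriminative loss only pushes on genuine negatives. The plausibility score is an assumed external component (e.g. a generator LM) and the 0.5 threshold is a placeholder; this is not the paper's exact method.

```python
# Sketch of discriminative pre-training on "true negatives": corrupted positions
# whose replacement is still plausible text are treated as false negatives and
# masked out of the replaced-token-detection loss. `plausibility` is an assumed
# external scorer (e.g. a generator LM); not the paper's exact method.
import torch
import torch.nn.functional as F

def true_negative_loss(disc_logits, original_ids, corrupted_ids, plausibility):
    """disc_logits, plausibility: [B, T] floats; *_ids: [B, T] token ids."""
    labels = (corrupted_ids != original_ids).float()  # 1 = corrupted position
    # "Corrupted but still plausible" = false negative -> exclude from the loss.
    false_neg = (labels == 1) & (plausibility > 0.5)
    weight = (~false_neg).float()
    return F.binary_cross_entropy_with_logits(disc_logits, labels, weight=weight)
```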
- A Close Look into the Calibration of Pre-trained Language Models [56.998539510508515]
Pre-trained language models (PLMs) may fail to give reliable estimates of their predictive uncertainty.
We study the dynamic change in PLMs' calibration performance during training.
We extend two recently proposed learnable methods that directly collect data to train models to have reasonable confidence estimations.
arXiv Detail & Related papers (2022-10-31T21:31:07Z)
- RuleBert: Teaching Soft Rules to Pre-trained Language Models [21.69870624809201]
We introduce a classification task where, given facts and soft rules, the PLM should return a prediction with a probability for a given hypothesis.
We propose a revised loss function that enables the PLM to learn how to predict precise probabilities for the task.
Our evaluation results show that the resulting fine-tuned models achieve very high performance, even on logical rules that were unseen at training.
arXiv Detail & Related papers (2021-09-24T16:19:25Z)
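A hedged sketch of a loss for learning precise probabilities, consistent with the summary above: the hypothesis carries a soft target probability implied by the rule weights, and the model's predicted probability is trained against it rather than against a hard 0/1 label. Whether this matches RuleBert's exact revised loss is an assumption.

```python
# Hedged sketch: train the PLM's hypothesis score against a *soft* target
# probability derived from rule weights, instead of a hard 0/1 label.
# Whether this matches RuleBert's exact revised loss is an assumption.
import torch
import torch.nn.functional as F

def soft_probability_loss(logits, target_prob):
    """logits: [B] hypothesis scores; target_prob: [B] probabilities in [0, 1]."""
    return F.binary_cross_entropy_with_logits(logits, target_prob)

# E.g. with the soft rule "flies(X) :- bird(X)" of weight 0.9, a hypothesis
# supported only by that rule would get target_prob = 0.9.
print(soft_probability_loss(torch.tensor([2.0]), torch.tensor([0.9])))
```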
This list is automatically generated from the titles and abstracts of the papers in this site.