Generative Calibration for In-context Learning
- URL: http://arxiv.org/abs/2310.10266v1
- Date: Mon, 16 Oct 2023 10:45:02 GMT
- Title: Generative Calibration for In-context Learning
- Authors: Zhongtao Jiang, Yuanzhe Zhang, Cao Liu, Jun Zhao, Kang Liu
- Abstract summary: In this paper, we identify that such a paradox is mainly due to the label shift of the in-context model to the data distribution.
With this understanding, we can calibrate the in-context predictive distribution by adjusting the label marginal.
We call our approach generative calibration. We conduct exhaustive experiments with 12 text classification tasks and 12 LLMs ranging from 774M to 33B parameters.
- Score: 20.207930451266822
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As one of the most exciting features of large language models (LLMs),
in-context learning is a mixed blessing. While it allows users to
fast-prototype a task solver with only a few training examples, the performance
is generally sensitive to various configurations of the prompt such as the
choice or order of the training examples. In this paper, we theoretically and
empirically identify, for the first time, that this paradox is mainly due to a
label shift of the in-context model relative to the data distribution: LLMs
shift the label marginal $p(y)$ while keeping a good label conditional $p(x|y)$.
With this understanding, we can simply calibrate the in-context predictive
distribution by adjusting the label marginal, which is estimated via
Monte-Carlo sampling over the in-context model, i.e., generation of LLMs. We
call our approach generative calibration. We conduct exhaustive experiments
with 12 text classification tasks and 12 LLMs ranging from 774M to 33B parameters,
and generally find that the proposed method greatly and consistently outperforms
standard ICL as well as state-of-the-art calibration methods, by up to 27% absolute
in macro-F1. Meanwhile, the proposed method is also stable under different
prompt configurations.
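As a rough illustration of the calibration step described above, the following minimal sketch (not the authors' code; function and variable names are assumptions) divides the in-context predictive distribution by a Monte-Carlo estimate of the label marginal, where the Monte-Carlo samples are inputs generated by the LLM itself under the same prompt:

```python
import numpy as np

def generative_calibration(label_probs, sampled_label_probs):
    """Hedged sketch of generative calibration (names are illustrative).

    label_probs:         (num_labels,) in-context predictive distribution
                         p(y | prompt, x_test) for the test input.
    sampled_label_probs: (num_samples, num_labels) label distributions
                         p(y | prompt, x_i) for inputs x_i generated by the
                         LLM itself when conditioned on the same prompt.
    """
    # Monte-Carlo estimate of the (shifted) label marginal p(y).
    label_marginal = sampled_label_probs.mean(axis=0)

    # Divide out the estimated marginal and renormalise.
    calibrated = label_probs / np.clip(label_marginal, 1e-8, None)
    return calibrated / calibrated.sum()
```

Under this reading, dividing by the estimated marginal undoes the prior shift induced by the prompt, so the calibrated prediction is governed by the label conditional $p(x|y)$, which the abstract argues the LLM models well.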
Related papers
- Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve model alignment across different task scenarios.
We implement UAL in a simple fashion: the label smoothing value used during training is set adaptively according to the uncertainty of individual samples (see the sketch after this entry).
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
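A loose sketch of the per-sample label-smoothing idea mentioned in this entry; the scaling rule, names, and PyTorch framing are assumptions for exposition, not the paper's implementation:

```python
import torch
import torch.nn.functional as F

def ual_loss(logits, targets, uncertainty, max_smoothing=0.2):
    """Cross-entropy with per-sample label smoothing scaled by uncertainty.

    uncertainty: (batch,) tensor in [0, 1]; more uncertain samples receive
    stronger smoothing (assumed scaling rule, for illustration only).
    """
    num_classes = logits.size(-1)
    smoothing = max_smoothing * uncertainty.unsqueeze(-1)      # (batch, 1)

    # Per-sample smoothed target distribution.
    one_hot = F.one_hot(targets, num_classes).float()
    soft_targets = one_hot * (1.0 - smoothing) + smoothing / num_classes

    log_probs = F.log_softmax(logits, dim=-1)
    return -(soft_targets * log_probs).sum(dim=-1).mean()
```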
- Large Language Model-guided Document Selection [23.673690115025913]
Large Language Model (LLM) pre-training exhausts an ever growing compute budget.
Recent research has demonstrated that careful document selection enables comparable model quality with only a fraction of the FLOPs.
We explore a promising direction for scalable general-domain document selection.
- In-Context Example Ordering Guided by Label Distributions [34.30216341226014]
We formulate in-context example ordering as an optimization problem.
Inspired by the idea of learning from label proportions, we propose two principles for in-context example ordering guided by the model's probability predictions.
We demonstrate that our approach outperforms the baselines by improving classification accuracy, reducing model miscalibration, and selecting better in-context examples.
- How to Prune Your Language Model: Recovering Accuracy on the "Sparsity May Cry" Benchmark [60.72725673114168]
We revisit the question of accurate BERT-pruning during fine-tuning on downstream datasets.
We propose a set of general guidelines for successful pruning, even on the challenging SMC benchmark.
- Learning in Imperfect Environment: Multi-Label Classification with Long-Tailed Distribution and Partial Labels [53.68653940062605]
We introduce a novel task, Partial labeling and Long-Tailed Multi-Label Classification (PLT-MLC).
We find that most LT-MLC and PL-MLC approaches fail to solve the degradation-MLC.
We propose an end-to-end learning framework: COrrection $\rightarrow$ ModificatIon $\rightarrow$ balanCe.
- $k$NN Prompting: Beyond-Context Learning with Calibration-Free Nearest Neighbor Inference [75.08572535009276]
In-Context Learning (ICL) formulates target tasks as prompt completion conditioned on in-context demonstrations.
$k$NN Prompting first queries the LLM with training data to obtain distributed representations, then predicts test instances by simply referring to their nearest neighbors (see the sketch after this entry).
It significantly outperforms state-of-the-art calibration-based methods under comparable few-shot scenarios.
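A minimal sketch of the nearest-neighbor inference step mentioned in this entry, assuming generic vector representations and Euclidean distance; the paper's actual representations (e.g., LLM output distributions) and distance measure may differ:

```python
import numpy as np

def knn_prompting_predict(test_reps, train_reps, train_labels, k=3):
    """Predict each test instance by majority vote over its k nearest
    training anchors (representation and distance choices are assumptions)."""
    train_reps = np.asarray(train_reps)
    train_labels = np.asarray(train_labels)
    preds = []
    for rep in np.asarray(test_reps):
        dists = np.linalg.norm(train_reps - rep, axis=1)  # distance to every anchor
        nearest = np.argsort(dists)[:k]                   # indices of k nearest anchors
        preds.append(np.bincount(train_labels[nearest]).argmax())  # majority vote
    return np.array(preds)
```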
- CLIPood: Generalizing CLIP to Out-of-Distributions [73.86353105017076]
Contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, but further adaptation of CLIP on downstream tasks undesirably degrades OOD performance.
We propose CLIPood, a fine-tuning method that can adapt CLIP models to OOD situations where both domain shifts and open classes may occur on unseen test data.
Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.
- Meta-Generating Deep Attentive Metric for Few-shot Classification [53.07108067253006]
We present a novel deep metric meta-generation method to generate a specific metric for a new few-shot learning task.
In this study, we structure the metric using a three-layer deep attentive network that is flexible enough to produce a discriminative metric for each task.
We obtain clear performance improvements over state-of-the-art competitors, especially in challenging cases.