Data Curation Alone Can Stabilize In-context Learning
- URL: http://arxiv.org/abs/2212.10378v2
- Date: Wed, 24 May 2023 22:32:56 GMT
- Title: Data Curation Alone Can Stabilize In-context Learning
- Authors: Ting-Yun Chang and Robin Jia
- Abstract summary: In-context learning (ICL) enables large language models to perform new tasks by prompting them with a sequence of training examples.
However, ICL is very sensitive to the choice of training examples: randomly sampling examples from a training set leads to high variance in performance.
We show that carefully curating a subset of training data greatly stabilizes ICL performance without any other changes to the ICL algorithm.
- Score: 20.874674130060388
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In-context learning (ICL) enables large language models (LLMs) to perform new
tasks by prompting them with a sequence of training examples. However, it is
known that ICL is very sensitive to the choice of training examples: randomly
sampling examples from a training set leads to high variance in performance. In
this paper, we show that carefully curating a subset of training data greatly
stabilizes ICL performance without any other changes to the ICL algorithm
(e.g., prompt retrieval or calibration). We introduce two methods to choose
training subsets -- both score training examples individually, then select the
highest-scoring ones. CondAcc scores a training example by its average dev-set
ICL accuracy when combined with random training examples, while Datamodels
learns linear regressors that estimate how the presence of each training
example influences LLM outputs. Across five tasks and two LLMs, sampling from
stable subsets selected by CondAcc and Datamodels improves average accuracy
over sampling from the entire training set by 7.7% and 6.3%, respectively.
Surprisingly, the stable subset examples are not especially diverse in content
or low in perplexity, in contrast with other work suggesting that diversity and
perplexity are important when prompting LLMs.
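For concreteness, the following is a minimal sketch of how the two scoring methods described above could work. The helper `icl_dev_accuracy` (which would prompt the LLM with the chosen examples and measure dev-set accuracy), the use of ridge regression for the Datamodels fit, and all hyperparameters are assumptions for illustration, not the paper's released implementation.
```python
import random
import numpy as np
from sklearn.linear_model import Ridge

def condacc_scores(train_set, icl_dev_accuracy, k=4, n_trials=200, seed=0):
    """CondAcc-style scoring: each example's score is its average dev-set
    ICL accuracy when it appears in prompts alongside random co-examples."""
    rng = random.Random(seed)
    sums = {i: 0.0 for i in range(len(train_set))}
    counts = {i: 0 for i in range(len(train_set))}
    trials = []
    for _ in range(n_trials):
        idxs = rng.sample(range(len(train_set)), k)   # random k-shot prompt
        acc = icl_dev_accuracy([train_set[i] for i in idxs])  # assumed helper
        trials.append((idxs, acc))
        for i in idxs:
            sums[i] += acc
            counts[i] += 1
    scores = {i: sums[i] / counts[i] for i in sums if counts[i] > 0}
    return scores, trials

def datamodel_scores(trials, n_train, alpha=1.0):
    """Datamodels-style scoring: fit a linear regressor from presence
    indicators of training examples to the observed ICL accuracy; the learned
    weight for each example estimates its influence on LLM outputs."""
    X = np.zeros((len(trials), n_train))
    y = np.zeros(len(trials))
    for t, (idxs, acc) in enumerate(trials):
        X[t, idxs] = 1.0          # 1 if example i was in the prompt
        y[t] = acc
    reg = Ridge(alpha=alpha).fit(X, y)
    return reg.coef_              # higher weight = more helpful example

# Both methods then keep only the highest-scoring examples and sample
# k-shot prompts from that stable subset.
```
Scoring each training example individually keeps selection tractable: instead of searching over all possible k-example prompts, both methods reduce subset selection to per-example scores and take the top-scoring examples.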
Related papers
- One size doesn't fit all: Predicting the Number of Examples for In-Context Learning [16.712595387955574]
In-context learning (ICL) refers to the process of adding a small number of localized examples (ones that are semantically similar to the input) from a training set of labelled data to the prompt.
Our work alleviates the limitations of this 'one fits all' approach by dynamically predicting the number of examples for each data instance to be used in few-shot inference.
Our experiments on a number of text classification benchmarks show that AICL substantially outperforms standard ICL by up to 17%.
arXiv Detail & Related papers (2024-03-11T03:28:13Z)
- Balanced Data Sampling for Language Model Training with Clustering [96.46042695333655]
We propose ClusterClip Sampling to balance the text distribution of training data for better model training.
Extensive experiments validate the effectiveness of ClusterClip Sampling.
arXiv Detail & Related papers (2024-02-22T13:20:53Z)
- Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization [165.98557106089777]
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
arXiv Detail & Related papers (2024-02-22T04:10:57Z)
- How to Train Data-Efficient LLMs [56.41105687693619]
We study data-efficient approaches for pre-training large language models (LLMs).
In our comparison of 19 samplers, involving hundreds of evaluation tasks and pre-training runs, we find that Ask-LLM and Density sampling are the best methods in their respective categories.
arXiv Detail & Related papers (2024-02-15T02:27:57Z)
- GistScore: Learning Better Representations for In-Context Example Selection with Gist Bottlenecks [3.9638110494107095]
In-context Learning (ICL) is the ability of Large Language Models (LLMs) to perform new tasks when conditioned on prompts.
We propose Example Gisting, a novel approach for training example encoders through supervised fine-tuning.
We show that our fine-tuned models get state-of-the-art ICL performance with over 20% absolute gain over off-the-shelf retrievers.
arXiv Detail & Related papers (2023-11-16T06:28:05Z)
- Which Examples to Annotate for In-Context Learning? Towards Effective and Efficient Selection [35.924633625147365]
Large Language Models (LLMs) can adapt to new tasks via in-context learning (ICL).
In this work, we investigate an active learning approach for ICL, where there is a limited budget for annotating examples.
We propose a model-adaptive optimization-free algorithm, termed AdaICL, which identifies examples that the model is uncertain about.
arXiv Detail & Related papers (2023-10-30T22:03:55Z)
- How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression? [92.90857135952231]
Transformers pretrained on diverse tasks exhibit remarkable in-context learning (ICL) capabilities.
We study ICL in one of its simplest setups: pretraining a linearly parameterized single-layer linear attention model for linear regression.
arXiv Detail & Related papers (2023-10-12T15:01:43Z)
- Understanding In-Context Learning via Supportive Pretraining Data [55.648777340129364]
In-context learning (ICL) improves language models' performance on a variety of NLP tasks by simply demonstrating a handful of examples at inference time.
It is not well understood why ICL ability emerges, as the model has never been specifically trained on such demonstrations.
Our work takes a first step towards understanding ICL via analyzing instance-level pretraining data.
arXiv Detail & Related papers (2023-06-26T22:14:04Z)
- Estimating Large Language Model Capabilities without Labeled Test Data [51.428562302037534]
Large Language Models (LLMs) have the impressive ability to perform in-context learning (ICL) from only a few examples.
We propose the task of ICL accuracy estimation, in which we predict the accuracy of an LLM when doing in-context learning on a new task.
arXiv Detail & Related papers (2023-05-24T06:55:09Z)
- Training Dynamics for Curriculum Learning: A Study on Monolingual and Cross-lingual NLU [19.42920238320109]
Curriculum Learning (CL) is a technique for training models by ordering examples, typically from easiest to hardest.
In this work, we employ CL for Natural Language Understanding (NLU) tasks by using training dynamics as difficulty metrics (see the sketch after this list).
Experiments indicate that training dynamics can lead to better-performing models with smoother training compared to other difficulty metrics.
arXiv Detail & Related papers (2022-10-22T17:10:04Z)
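For the curriculum-learning entry above, here is a rough sketch of one way training dynamics could serve as a difficulty metric. The specific statistic (mean gold-label confidence across epochs) and the easiest-first ordering are illustrative assumptions, not necessarily that paper's exact recipe; `gold_prob_per_epoch` is a hypothetical array recorded during an initial training run.
```python
import numpy as np

def difficulty_from_dynamics(gold_prob_per_epoch):
    """Training-dynamics difficulty: lower mean confidence on the gold label
    across epochs -> harder example."""
    probs = np.asarray(gold_prob_per_epoch)   # shape: (n_epochs, n_examples)
    mean_conf = probs.mean(axis=0)
    return 1.0 - mean_conf                    # difficulty in [0, 1]

def curriculum_order(gold_prob_per_epoch):
    """Indices of training examples sorted from easiest to hardest."""
    difficulty = difficulty_from_dynamics(gold_prob_per_epoch)
    return np.argsort(difficulty)             # ascending difficulty

# A curriculum training loop would then feed batches following this order,
# e.g. gradually unlocking harder examples as training progresses.
```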