Understanding In-Context Learning via Supportive Pretraining Data
- URL: http://arxiv.org/abs/2306.15091v1
- Date: Mon, 26 Jun 2023 22:14:04 GMT
- Title: Understanding In-Context Learning via Supportive Pretraining Data
- Authors: Xiaochuang Han, Daniel Simig, Todor Mihaylov, Yulia Tsvetkov, Asli Celikyilmaz, Tianlu Wang
- Abstract summary: In-context learning (ICL) improves language models' performance on a variety of NLP tasks by simply demonstrating a handful of examples at inference time.
It is not well understood why the ICL ability emerges, as the model has never been specifically trained on such demonstrations.
Our work takes a first step towards understanding ICL via analyzing instance-level pretraining data.
- Score: 55.648777340129364
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In-context learning (ICL) improves language models' performance on a variety
of NLP tasks by simply demonstrating a handful of examples at inference time.
It is not well understood why the ICL ability emerges, as the model has never
been specifically trained on such demonstrations. Unlike prior work that
explores implicit mechanisms behind ICL, we study ICL by investigating the
pretraining data. Specifically, we first adapt an iterative, gradient-based
approach to find a small subset of pretraining data that supports ICL. We
observe that continued pretraining on this small subset significantly improves the model's
ICL ability, by up to 18%. We then compare the supportive subset contrastively
with random subsets of pretraining data and discover: (1) The supportive
pretraining data for ICL do not have higher domain relevance to downstream
tasks. (2) The supportive pretraining data have a higher mass of rarely
occurring, long-tail tokens. (3) The supportive pretraining data are
challenging examples where the information gain from long-range context is
below average, indicating that learning to incorporate difficult long-range
context encourages ICL. Our work takes a first step towards understanding ICL
via analyzing instance-level pretraining data. Our insights have the potential to
enhance the ICL ability of language models by actively guiding the construction
of pretraining data in the future.
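The "iterative, gradient-based approach" in the abstract selects pretraining instances whose training signal aligns with an ICL objective, then continues pretraining on them. Below is a minimal PyTorch sketch of one plausible reading of that loop; the function names, the gradient-dot-product scoring, and the optimizer settings are illustrative assumptions, not the authors' implementation.

```python
# Sketch: iteratively select "supportive" pretraining instances by gradient
# alignment with an ICL objective, then continue pretraining on them.
# All names here are illustrative assumptions, not the authors' code.
import torch

def flat_grad(loss, model):
    """Gradient of `loss` w.r.t. all trainable parameters, flattened."""
    params = [p for p in model.parameters() if p.requires_grad]
    return torch.cat([g.reshape(-1) for g in torch.autograd.grad(loss, params)])

def select_supportive_subset(model, lm_loss_fn, icl_loss_fn,
                             pretrain_pool, icl_demos,
                             subset_size=1000, n_iters=3, lr=1e-5):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    selected = []
    for _ in range(n_iters):
        # Gradient of the ICL objective on held-out demonstrations.
        g_task = flat_grad(icl_loss_fn(model, icl_demos), model)
        # Score each candidate by how well its LM-loss gradient aligns
        # with the ICL gradient (dot product).
        scores = [torch.dot(flat_grad(lm_loss_fn(model, ex), model), g_task).item()
                  for ex in pretrain_pool]
        ranked = sorted(range(len(pretrain_pool)),
                        key=lambda i: scores[i], reverse=True)
        selected = [pretrain_pool[i] for i in ranked[:subset_size]]
        # Continued pretraining on the current supportive subset.
        for ex in selected:
            optimizer.zero_grad()
            lm_loss_fn(model, ex).backward()
            optimizer.step()
    return selected
```

In practice, full-gradient dot products over a large pool are expensive; one would typically score on a parameter subset or a low-dimensional gradient projection.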
Related papers
- ReCaLL: Membership Inference via Relative Conditional Log-Likelihoods [56.073335779595475]
We propose ReCaLL (Relative Conditional Log-Likelihood), a novel membership inference attack (MIA).
ReCaLL examines the relative change in conditional log-likelihoods when prefixing target data points with non-member context.
We conduct comprehensive experiments and show that ReCaLL achieves state-of-the-art performance on the WikiMIA dataset.
arXiv Detail & Related papers (2024-06-23T00:23:13Z)
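From the ReCaLL summary above, the score compares a target's conditional log-likelihood under a non-member prefix with its unconditional log-likelihood. A minimal sketch with Hugging Face transformers, using gpt2 purely as a stand-in target model; the paper's exact normalization, boundary handling, and thresholding may differ.

```python
# Sketch: ReCaLL-style membership score as the ratio of conditional to
# unconditional average log-likelihood of a target text.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def avg_log_likelihood(text, prefix=""):
    """Mean token log-likelihood of `text`, optionally conditioned on `prefix`."""
    ids = tok(prefix + text, return_tensors="pt").input_ids
    # Approximate prefix length in tokens (boundary tokenization is inexact).
    n_prefix = tok(prefix, return_tensors="pt").input_ids.shape[1] if prefix else 0
    logits = model(ids).logits
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
    token_ll = logprobs.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    # Keep only positions that predict target tokens, not prefix tokens.
    return token_ll[:, max(n_prefix - 1, 0):].mean().item()

def recall_score(target, nonmember_prefix):
    """Relative conditional log-likelihood; thresholded to decide membership."""
    return avg_log_likelihood(target, nonmember_prefix) / avg_log_likelihood(target)
```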
- Investigating the Pre-Training Dynamics of In-Context Learning: Task Recognition vs. Task Learning [99.05401042153214]
In-context learning (ICL) is potentially attributed to two major abilities: task recognition (TR) and task learning (TL).
We take the first step by examining the pre-training dynamics of the emergence of ICL.
We propose a simple yet effective method to better integrate these two abilities for ICL at inference time.
arXiv Detail & Related papers (2024-06-20T06:37:47Z)
- From Unstructured Data to In-Context Learning: Exploring What Tasks Can Be Learned and When [19.841163050181194]
Large language models (LLMs) like transformers demonstrate impressive in-context learning (ICL) capabilities.
We examine what enables ICL in models trained on unstructured data, focusing on critical sequence model requirements and training data structure.
We find that many ICL capabilities can emerge simply from co-occurrence of semantically related word pairs in unstructured data.
We identify two cases where ICL fails: one in logic reasoning tasks that require generalizing to new, unseen patterns, and another in analogy completion where relevant word pairs appear only in fixed training positions.
arXiv Detail & Related papers (2024-05-31T18:46:06Z)
- Parallel Structures in Pre-training Data Yield In-Context Learning [41.27837171531926]
We study which patterns in the pre-training data contribute to in-context learning (ICL).
We find that LMs' ICL ability depends on parallel structures in the pre-training data.
arXiv Detail & Related papers (2024-02-19T20:40:48Z)
- DAIL: Data Augmentation for In-Context Learning via Self-Paraphrase [37.68804898063595]
In-Context Learning (ICL) combined with pre-trained large language models has achieved promising results on various NLP tasks.
We propose Data Augmentation for In-Context Learning (DAIL).
arXiv Detail & Related papers (2023-11-06T18:12:55Z)
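Read from the DAIL summary alone, the idea is to have the model paraphrase its own demonstrations and aggregate predictions over the augmented prompts. A minimal sketch, where `generate` is an assumed text-completion callable and the prompt templates and majority vote are my illustrative choices, not the authors' code.

```python
# Sketch: DAIL-style self-paraphrase augmentation with majority voting.
# `generate` is an assumed callable: prompt string -> completion string.
from collections import Counter

def paraphrase(generate, text):
    """Ask the LM itself to paraphrase a demonstration input."""
    return generate(f"Paraphrase the following text:\n{text}\nParaphrase:").strip()

def dail_predict(generate, demos, query, n_views=5):
    """Vote over ICL prompts whose demonstrations are self-paraphrased."""
    votes = []
    for _ in range(n_views):
        # Build one augmented prompt: each demo input replaced by a paraphrase.
        augmented = [(paraphrase(generate, x), y) for x, y in demos]
        prompt = "".join(f"Input: {x}\nLabel: {y}\n\n" for x, y in augmented)
        prompt += f"Input: {query}\nLabel:"
        votes.append(generate(prompt).strip())
    return Counter(votes).most_common(1)[0][0]
```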
- Foundational Models for Continual Learning: An Empirical Study of Latent Replay [17.322679682451597]
We study the efficacy of pre-trained vision models as a foundation for downstream continual learning scenarios.
We compare the efficacy of various pre-trained models in large-scale benchmarking scenarios, with a vanilla replay setting applied in the latent and in the raw-data space.
arXiv Detail & Related papers (2022-04-30T19:11:37Z)
- Bridging the Gap between Language Models and Cross-Lingual Sequence Labeling [101.74165219364264]
Large-scale cross-lingual pre-trained language models (xPLMs) have shown effectiveness in cross-lingual sequence labeling tasks.
Despite this success, we make the empirical observation that there is a training-objective gap between the pre-training and fine-tuning stages.
In this paper, we first design a pre-training task tailored for cross-lingual sequence labeling (xSL), named Cross-lingual Language Informative Span Masking (CLISM), to eliminate the objective gap.
Second, we present ContrAstive-Consistency Regularization (CACR), which utilizes contrastive learning to encourage consistency between the representations of input parallel sequences.
arXiv Detail & Related papers (2022-04-11T15:55:20Z)
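The summary above describes CACR as a contrastive consistency term over representations of parallel inputs. A minimal sketch of one standard instantiation (symmetric in-batch InfoNCE); the paper's exact formulation may differ.

```python
# Sketch: symmetric in-batch InfoNCE over representations of parallel
# sequence pairs, as one plausible form of CACR's consistency term.
import torch
import torch.nn.functional as F

def contrastive_consistency_loss(h_src, h_tgt, temperature=0.07):
    """h_src, h_tgt: (batch, dim) encodings of aligned parallel sequences."""
    h_src = F.normalize(h_src, dim=-1)
    h_tgt = F.normalize(h_tgt, dim=-1)
    logits = h_src @ h_tgt.T / temperature        # pairwise cosine similarities
    labels = torch.arange(h_src.size(0), device=h_src.device)
    # The i-th source should match the i-th target, and vice versa.
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.T, labels))
```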
- From Good to Best: Two-Stage Training for Cross-lingual Machine Reading Comprehension [51.953428342923885]
We develop a two-stage approach to enhance model performance.
The first stage targets recall: we design a hard-learning (HL) algorithm to maximize the likelihood that the top-k predictions contain the accurate answer.
The second stage focuses on precision: an answer-aware contrastive learning mechanism is developed to learn the fine difference between the accurate answer and other candidates.
arXiv Detail & Related papers (2021-12-09T07:31:15Z)
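The recall-oriented first stage above can be pictured as a loss that rewards any correct candidate appearing in the model's current top-k, rather than forcing the gold answer to rank first. The sketch below is my assumption of such an objective, not the authors' exact HL algorithm.

```python
# Sketch: a recall-oriented "hard-learning" objective that maximizes the
# probability mass on correct candidates currently inside the top-k.
# This is an assumed form, not the paper's exact algorithm.
import torch
import torch.nn.functional as F

def hard_learning_loss(candidate_logits, gold_mask, k=5):
    """candidate_logits: (B, C) scores; gold_mask: (B, C), 1.0 at correct answers."""
    probs = F.softmax(candidate_logits, dim=-1)
    topk_idx = candidate_logits.topk(k, dim=-1).indices
    in_topk = torch.zeros_like(gold_mask).scatter(1, topk_idx, 1.0)
    hit = (probs * gold_mask * in_topk).sum(-1)       # gold mass inside top-k
    gold_mass = (probs * gold_mask).sum(-1)           # gold mass overall
    # Fall back to overall gold mass when no correct candidate is in the
    # top-k yet, so gradients still pull the gold answer upward.
    target = torch.where(hit > 0, hit, gold_mass)
    return -torch.log(target.clamp_min(1e-9)).mean()
```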
- On the Transferability of Pre-trained Language Models: A Study from Artificial Datasets [74.11825654535895]
Pre-training language models (LMs) on large-scale unlabeled text data makes it much easier for the model to achieve exceptional downstream performance.
We study what specific traits in the pre-training data, other than the semantics, make a pre-trained LM superior to its counterparts trained from scratch on downstream tasks.
arXiv Detail & Related papers (2021-09-08T10:39:57Z)