Parallel Structures in Pre-training Data Yield In-Context Learning
- URL: http://arxiv.org/abs/2402.12530v1
- Date: Mon, 19 Feb 2024 20:40:48 GMT
- Title: Parallel Structures in Pre-training Data Yield In-Context Learning
- Authors: Yanda Chen, Chen Zhao, Zhou Yu, Kathleen McKeown, He He
- Abstract summary: We study what patterns of the pre-training data contribute to in-context learning (ICL).
We find that LMs' ICL ability depends on $\textit{parallel structures}$ in the pre-training data.
- Score: 41.27837171531926
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained language models (LMs) are capable of in-context learning (ICL):
they can adapt to a task with only a few examples given in the prompt without
any parameter update. However, it is unclear where this capability comes from
as there is a stark distribution shift between pre-training text and ICL
prompts. In this work, we study what patterns of the pre-training data
contribute to ICL. We find that LMs' ICL ability depends on $\textit{parallel
structures}$ in the pre-training data -- pairs of phrases following similar
templates in the same context window. Specifically, we detect parallel
structures by checking whether training on one phrase improves prediction of
the other, and conduct ablation experiments to study their effect on ICL. We
show that removing parallel structures in the pre-training data reduces LMs'
ICL accuracy by 51% (vs 2% from random ablation). This drop persists even when
excluding common patterns such as n-gram repetitions and long-range dependency,
showing the diversity and generality of parallel structures. A closer look at
the detected parallel structures indicates that they cover diverse linguistic
tasks and span long distances in the data.
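To make the detection idea above concrete, here is a minimal sketch, assuming a small causal LM loaded through Hugging Face transformers: a phrase pair counts as a parallel structure if one gradient step on the first phrase lowers the LM's loss on the second. The model name (gpt2), the single-SGD-step update, the loss-drop threshold, and the helper names (phrase_loss, is_parallel_structure) are illustrative assumptions, not the paper's exact procedure.

```python
# Hypothetical sketch of the detection idea: flag (phrase_a, phrase_b) as a
# parallel structure if one gradient step on phrase_a improves the LM's
# prediction of phrase_b. Model, learning rate, and threshold are assumptions.
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # assumption: any small causal LM works for this sketch
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)


def phrase_loss(model, phrase: str) -> float:
    """Average next-token loss of the model on a single phrase."""
    ids = tokenizer(phrase, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()


def is_parallel_structure(base_model, phrase_a: str, phrase_b: str,
                          lr: float = 1e-4, threshold: float = 0.05) -> bool:
    """True if a single gradient step on phrase_a helps predict phrase_b."""
    loss_before = phrase_loss(base_model, phrase_b)

    # Take one gradient step on phrase_a, on a copy so the base model is untouched.
    model = copy.deepcopy(base_model)
    model.train()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    ids = tokenizer(phrase_a, return_tensors="pt").input_ids
    model(ids, labels=ids).loss.backward()
    optimizer.step()
    model.eval()

    loss_after = phrase_loss(model, phrase_b)
    return (loss_before - loss_after) > threshold  # loss drop => parallel structure


if __name__ == "__main__":
    lm = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()
    # Two phrases following a similar template in one (hypothetical) context window.
    print(is_parallel_structure(lm, "The capital of France is Paris.",
                                "The capital of Japan is Tokyo."))
```

A full pipeline in the spirit of the abstract would run this test over many candidate phrase pairs per context window and then ablate the flagged spans, as in the ablation experiments described above.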
Related papers
- Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods [69.36397993451742]
This work introduces Context-aware Prompt Tuning (CPT), a method inspired by ICL, PT, and adversarial attacks.
We modify specific context tokens, considering the unique structure of input and output formats.
Inspired by adversarial attacks, we adjust the input based on the labels present in the context, focusing on minimizing, rather than maximizing, the loss.
arXiv Detail & Related papers (2024-10-22T17:45:47Z)
- Deeper Insights Without Updates: The Power of In-Context Learning Over Fine-Tuning [22.341935761925892]
Fine-tuning and in-context learning (ICL) are two prevalent methods for imbuing large language models with task-specific knowledge.
This paper presents a counterintuitive finding: For tasks with implicit patterns, ICL captures these patterns significantly better than fine-tuning.
arXiv Detail & Related papers (2024-10-07T02:12:22Z)
- From Unstructured Data to In-Context Learning: Exploring What Tasks Can Be Learned and When [19.841163050181194]
Large language models (LLMs) like transformers demonstrate impressive in-context learning (ICL) capabilities.
We examine what enables ICL in models trained on unstructured data, focusing on critical sequence model requirements and training data structure.
We find that many ICL capabilities can emerge simply from co-occurrence of semantically related word pairs in unstructured data.
We identify two cases where ICL fails: one in logic reasoning tasks that require generalizing to new, unseen patterns, and another in analogy completion where relevant word pairs appear only in fixed training positions.
arXiv Detail & Related papers (2024-05-31T18:46:06Z)
- Understanding In-Context Learning via Supportive Pretraining Data [55.648777340129364]
In-context learning (ICL) improves language models' performance on a variety of NLP tasks by simply demonstrating a handful of examples at inference time.
It is not well understood why ICL ability emerges, as the model has never been specifically trained on such demonstrations.
Our work takes a first step towards understanding ICL via analyzing instance-level pretraining data.
arXiv Detail & Related papers (2023-06-26T22:14:04Z)
- Explaining Emergent In-Context Learning as Kernel Regression [61.57151500616111]
Large language models (LLMs) have initiated a paradigm shift in transfer learning.
In this paper, we investigate the reason why a transformer-based language model can accomplish in-context learning after pre-training.
We find that during ICL, the attention and hidden features in LLMs match the behaviors of a kernel regression.
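As a hedged illustration of that correspondence (a toy sketch, not the paper's derivation): a single softmax-attention readout over the demonstrations' key/value pairs is algebraically a Nadaraya-Watson kernel regression with an exponential kernel; all shapes and values below are arbitrary assumptions.

```python
# Toy sketch: softmax attention over in-context key/value pairs equals
# Nadaraya-Watson kernel regression with kernel K(q, k) = exp(q.k / sqrt(d)).
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 16                      # feature dim, number of in-context examples
keys = rng.normal(size=(n, d))    # keys for the demonstration inputs
values = rng.normal(size=(n, 1))  # values holding the demonstration labels
query = rng.normal(size=(d,))     # query for the test input

# Softmax attention readout.
scores = keys @ query / np.sqrt(d)
attn = np.exp(scores - scores.max())
attn /= attn.sum()
attention_output = attn @ values

# Nadaraya-Watson kernel regression with an exponential kernel.
kernel = np.exp(keys @ query / np.sqrt(d) - scores.max())
kernel_output = (kernel @ values) / kernel.sum()

assert np.allclose(attention_output, kernel_output)  # identical estimators
```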
arXiv Detail & Related papers (2023-05-22T06:45:02Z)
- Data Curation Alone Can Stabilize In-context Learning [20.874674130060388]
In-context learning (ICL) enables large language models to perform new tasks by prompting them with a sequence of training examples.
However, randomly sampling examples from a training set leads to high variance in performance.
We show that carefully curating a subset of training data greatly stabilizes ICL performance without any other changes to the ICL algorithm.
arXiv Detail & Related papers (2022-12-20T15:58:54Z)
- An Explanation of In-context Learning as Implicit Bayesian Inference [117.19809377740188]
We study the role of the pretraining distribution on the emergence of in-context learning.
We prove that in-context learning occurs implicitly via Bayesian inference of the latent concept.
We empirically find that scaling model size improves in-context accuracy even when the pretraining loss is the same.
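A hedged gloss of that framing, with notation chosen here for illustration ($\theta$ denotes the latent concept behind a document or prompt): the pretrained LM predicts by marginalizing over concepts, $p(\text{output} \mid \text{prompt}) = \int_{\theta} p(\text{output} \mid \text{prompt}, \theta)\, p(\theta \mid \text{prompt})\, d\theta$, and as more in-context examples are added, the posterior $p(\theta \mid \text{prompt})$ concentrates on the concept the examples share, so the prediction approaches $p(\text{output} \mid \text{prompt}, \theta^{*})$ for the prompt's concept $\theta^{*}$.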
arXiv Detail & Related papers (2021-11-03T09:12:33Z)
- On the Transferability of Pre-trained Language Models: A Study from Artificial Datasets [74.11825654535895]
Pre-training language models (LMs) on large-scale unlabeled text data makes it much easier for the model to achieve exceptional downstream performance.
We study what specific traits in the pre-training data, other than the semantics, make a pre-trained LM superior to its counterpart trained from scratch on downstream tasks.
arXiv Detail & Related papers (2021-09-08T10:39:57Z)