Explaining Emergent In-Context Learning as Kernel Regression
- URL: http://arxiv.org/abs/2305.12766v2
- Date: Thu, 5 Oct 2023 16:04:43 GMT
- Title: Explaining Emergent In-Context Learning as Kernel Regression
- Authors: Chi Han, Ziqi Wang, Han Zhao, Heng Ji
- Abstract summary: Large language models (LLMs) have initiated a paradigm shift in transfer learning.
In this paper, we investigate the reason why a transformer-based language model can accomplish in-context learning after pre-training.
We find that during ICL, the attention and hidden features in LLMs match the behaviors of a kernel regression.
- Score: 61.57151500616111
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have initiated a paradigm shift in transfer
learning. In contrast to the classic pretraining-then-finetuning procedure, in
order to use LLMs for downstream prediction tasks, one only needs to provide a
few demonstrations, known as in-context examples, without adding more or
updating existing model parameters. This in-context learning (ICL) capability
of LLMs is intriguing, and it is not yet fully understood how pretrained LLMs
acquire such capabilities. In this paper, we investigate the reason why a
transformer-based language model can accomplish in-context learning after
pre-training on a general language corpus by proposing one hypothesis that LLMs
can simulate kernel regression with internal representations when faced with
in-context examples. More concretely, we first prove that Bayesian inference on
in-context prompts can be asymptotically understood as kernel regression
$\hat{y} = \sum_i y_i K(x, x_i) / \sum_i K(x, x_i)$ as the number of in-context
demonstrations grows. Then, we empirically investigate the in-context behaviors
of language models. We find that during ICL, the attention and hidden features
in LLMs match the behaviors of a kernel regression. Finally, our theory
provides insights into multiple phenomena observed in the ICL field: why
retrieving demonstrative samples similar to test samples can help, why ICL
performance is sensitive to the output formats, and why ICL accuracy benefits
from selecting in-distribution and representative samples.
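The kernel regression estimator in the abstract can be sketched numerically. The snippet below is a minimal illustration, not the authors' implementation: the function names, the random toy data, and the exponential dot-product kernel $K(x, x_i) = \exp(x^\top x_i)$ are all assumptions chosen for the example. Under that particular kernel, the estimate coincides with a single softmax attention readout over the demonstrations, which gives one intuition for why attention could behave like kernel regression.

```python
import numpy as np

def kernel_regression(x, xs, ys, kernel):
    """Estimate y_hat = sum_i y_i K(x, x_i) / sum_i K(x, x_i)."""
    weights = np.array([kernel(x, xi) for xi in xs])
    return weights @ ys / weights.sum()

def exp_dot_kernel(x, xi):
    # Assumed kernel for illustration only: K(x, x_i) = exp(<x, x_i>).
    return np.exp(x @ xi)

def softmax_attention(query, keys, values):
    """Single-query softmax attention without learned projections."""
    scores = keys @ query
    scores = scores - scores.max()  # subtract the max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ values

# Toy data (hypothetical): 8 demonstrations with 4-d features and 3 classes.
rng = np.random.default_rng(0)
xs = rng.normal(size=(8, 4))                # demonstration inputs x_i
ys = np.eye(3)[rng.integers(0, 3, size=8)]  # one-hot demonstration labels y_i
x = rng.normal(size=4)                      # test input

y_hat = kernel_regression(x, xs, ys, exp_dot_kernel)
attn = softmax_attention(x, xs, ys)
print(y_hat, attn)
```

Because the labels are one-hot, `y_hat` is a probability vector over the three classes, and demonstrations whose inputs are more similar to `x` under the kernel contribute more weight to the prediction.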
Related papers
- Verbalized Machine Learning: Revisiting Machine Learning with Language Models [63.10391314749408]
We introduce the framework of verbalized machine learning (VML).
VML constrains the parameter space to be human-interpretable natural language.
We empirically verify the effectiveness of VML, and hope that VML can serve as a stepping stone to stronger interpretability.
arXiv Detail & Related papers (2024-06-06T17:59:56Z)
- What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages [78.1866280652834]
Large language models (LM) are distributions over strings.
We investigate the learnability of regular LMs (RLMs) by RNN and Transformer LMs.
We find that the complexity of the RLM rank is a strong and significant predictor of learnability for both RNNs and Transformers.
arXiv Detail & Related papers (2024-06-06T17:34:24Z)
- What Do Language Models Learn in Context? The Structured Task Hypothesis [89.65045443150889]
Large language models (LLMs) learn a novel task from in-context examples presented in a demonstration, termed in-context learning (ICL).
One popular hypothesis explains ICL by task selection.
Another popular hypothesis is that ICL is a form of meta-learning, i.e., the models learn a learning algorithm at pre-training time and apply it to the demonstration.
arXiv Detail & Related papers (2024-06-06T16:15:34Z)
- Implicit In-context Learning [37.0562059811099]
In-context Learning (ICL) empowers large language models to adapt to unseen tasks during inference by prefixing a few demonstration examples prior to test queries.
We introduce Implicit In-context Learning (I2CL), an innovative paradigm that addresses the challenges associated with traditional ICL by absorbing demonstration examples within the activation space.
I2CL achieves few-shot performance with zero-shot cost and exhibits robustness against the variation of demonstration examples.
arXiv Detail & Related papers (2024-05-23T14:57:52Z)
- In-Context Exemplars as Clues to Retrieving from Large Associative Memory [1.2952137350423816]
In-context learning (ICL) enables large language models (LLMs) to learn patterns from in-context exemplars without training.
How to choose exemplars remains unclear due to the lack of understanding of how in-context learning works.
Our study sheds new light on the mechanism of ICL by connecting it to memory retrieval.
arXiv Detail & Related papers (2023-11-06T20:13:29Z)
- Do pretrained Transformers Learn In-Context by Gradient Descent? [21.23795112800977]
In this paper, we investigate the emergence of In-Context Learning (ICL) in language models pre-trained on natural data (LLaMa-7B).
We find that ICL and Gradient Descent (GD) modify the output distribution of language models differently.
These results indicate that the equivalence between ICL and GD remains an open hypothesis and calls for further studies.
arXiv Detail & Related papers (2023-10-12T17:32:09Z)
- What and How does In-Context Learning Learn? Bayesian Model Averaging, Parameterization, and Generalization [111.55277952086155]
We study In-Context Learning (ICL) by addressing several open questions.
We show that, without updating the neural network parameters, ICL implicitly implements the Bayesian model averaging algorithm.
We prove that the error of the pretrained model is bounded by a sum of an approximation error and a generalization error.
arXiv Detail & Related papers (2023-05-30T21:23:47Z)
- A Theory of Emergent In-Context Learning as Implicit Structure Induction [8.17811111226145]
Scaling large language models leads to an emergent capacity to learn in-context from example demonstrations.
We argue that in-context learning relies on recombination of compositional operations found in natural language data.
We show how in-context learning is supported by a representation of the input's compositional structure.
arXiv Detail & Related papers (2023-03-14T15:24:05Z)
- ThinkSum: Probabilistic reasoning over sets using large language models [18.123895485602244]
We propose a two-stage probabilistic inference paradigm, ThinkSum, which reasons over sets of objects or facts in a structured manner.
We demonstrate the possibilities and advantages of ThinkSum on the BIG-bench suite of LLM evaluation tasks.
arXiv Detail & Related papers (2022-10-04T00:34:01Z)
- An Explanation of In-context Learning as Implicit Bayesian Inference [117.19809377740188]
We study the role of the pretraining distribution on the emergence of in-context learning.
We prove that in-context learning occurs implicitly via Bayesian inference of the latent concept.
We empirically find that scaling model size improves in-context accuracy even when the pretraining loss is the same.
arXiv Detail & Related papers (2021-11-03T09:12:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.