Positional Information Matters for Invariant In-Context Learning: A Case
Study of Simple Function Classes
- URL: http://arxiv.org/abs/2311.18194v1
- Date: Thu, 30 Nov 2023 02:26:55 GMT
- Authors: Yongqiang Chen, Binghui Xie, Kaiwen Zhou, Bo Han, Yatao Bian, James
Cheng
- Abstract summary: In-context learning (ICL) refers to the ability of a model to condition on a few in-context demonstrations to generate the answer for a new query input.
Despite the impressive ICL ability of LLMs, ICL is sensitive to the input demonstrations and is limited to short context lengths.
- Score: 39.08988313527199
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In-context learning (ICL) refers to the ability of a model to condition on a
few in-context demonstrations (input-output examples of the underlying task) to
generate the answer for a new query input, without updating parameters. Despite
the impressive ICL ability of LLMs, it has also been found that ICL in LLMs is
sensitive to input demonstrations and limited to short context lengths. To
understand the limitations of and principles behind successful ICL, we investigate in-context linear regression with transformers. We characterize
several Out-of-Distribution (OOD) cases for ICL inspired by realistic LLM ICL
failures and compare transformers with DeepSet, a simple yet powerful
architecture for ICL. Surprisingly, DeepSet outperforms transformers across a
variety of distribution shifts, implying that preserving permutation invariance
symmetry to input demonstrations is crucial for OOD ICL. This phenomenon points to a fundamental requirement of ICL, which we term ICL invariance. However, the positional encodings used in LLMs break ICL invariance. To address this, we further evaluate transformers with identical positional encodings and find that preserving ICL invariance in transformers achieves state-of-the-art performance across various ICL distribution shifts.
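To make the experimental setup concrete, here is a minimal sketch in PyTorch of the in-context linear regression task and a DeepSet-style predictor. The names (sample_linear_regression_prompt, DeepSetRegressor) and the network widths are illustrative assumptions, not the authors' code; the closing assertion checks the permutation-invariance property the abstract identifies as crucial.

```python
import torch

def sample_linear_regression_prompt(n_points=40, n_dims=20):
    """Sample one ICL prompt: n_points (x, y) demonstrations from y = w^T x."""
    w = torch.randn(n_dims)              # task vector, drawn fresh per prompt
    xs = torch.randn(n_points, n_dims)   # demonstration inputs
    ys = xs @ w                          # noiseless linear targets
    return xs, ys

class DeepSetRegressor(torch.nn.Module):
    """Permutation-invariant predictor: encode each (x_i, y_i) pair with phi,
    pool with an order-free mean, then decode with rho given the query.
    Hypothetical sketch, not the paper's exact architecture."""
    def __init__(self, n_dims, hidden=256):
        super().__init__()
        self.phi = torch.nn.Sequential(
            torch.nn.Linear(n_dims + 1, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, hidden))
        self.rho = torch.nn.Sequential(
            torch.nn.Linear(hidden + n_dims, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, 1))

    def forward(self, xs, ys, x_query):
        pairs = torch.cat([xs, ys.unsqueeze(-1)], dim=-1)  # (n_points, d+1)
        pooled = self.phi(pairs).mean(dim=0)               # order-free pooling
        return self.rho(torch.cat([pooled, x_query], dim=-1))

# Shuffling the demonstrations must not change the prediction.
xs, ys = sample_linear_regression_prompt()
model = DeepSetRegressor(n_dims=20)
perm = torch.randperm(xs.shape[0])
assert torch.allclose(model(xs, ys, xs[0]),
                      model(xs[perm], ys[perm], xs[0]), atol=1e-5)
```

Because the pooling is a mean over per-pair embeddings, any reordering of the demonstrations yields the same prediction by construction.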
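The fix the abstract evaluates, identical positional encodings, can be sketched the same way. The class below is a hypothetical illustration (the name InvariantICLTransformer and the token layout are assumptions): every position receives the same learned positional vector, so self-attention has no signal for demonstration order.

```python
import torch

class InvariantICLTransformer(torch.nn.Module):
    """Transformer for ICL regression in which every token shares one
    positional vector, so attention cannot exploit demonstration order."""
    def __init__(self, n_dims, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = torch.nn.Linear(n_dims + 1, d_model)
        # One shared positional embedding replaces the usual per-position table.
        self.shared_pos = torch.nn.Parameter(torch.zeros(1, 1, d_model))
        layer = torch.nn.TransformerEncoderLayer(d_model, n_heads,
                                                 batch_first=True)
        self.encoder = torch.nn.TransformerEncoder(layer, n_layers)
        self.head = torch.nn.Linear(d_model, 1)

    def forward(self, xs, ys, x_query):
        # Demonstrations become (x_i, y_i) tokens; the query gets a zero label slot.
        demos = torch.cat([xs, ys.unsqueeze(-1)], dim=-1)
        query = torch.cat([x_query, torch.zeros(1)], dim=-1).unsqueeze(0)
        tokens = torch.cat([demos, query], dim=0).unsqueeze(0)  # (1, n+1, d+1)
        h = self.embed(tokens) + self.shared_pos  # identical PE at every position
        return self.head(self.encoder(h)[0, -1])  # read out at the query token
```

In eval() mode (dropout disabled), permuting the demonstration tokens leaves the prediction at the final query token unchanged, so ICL invariance is preserved while keeping the transformer architecture.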
Related papers
- Exact Conversion of In-Context Learning to Model Weights in Linearized-Attention Transformers [30.145669421100965]
In-Context Learning is a powerful emergent property of large language models.
We show that, for linearized transformer networks, ICL can be made explicit and permanent through the inclusion of bias terms.
Our algorithm (ICLCA) allows for exact conversion in an inexpensive manner.
arXiv Detail & Related papers (2024-06-05T01:47:40Z)
- Implicit In-context Learning [37.0562059811099]
In-context Learning (ICL) empowers large language models to adapt to unseen tasks during inference by prefixing a few demonstration examples prior to test queries.
We introduce Implicit In-context Learning (I2CL), an innovative paradigm that addresses the challenges associated with traditional ICL by absorbing demonstration examples within the activation space.
I2CL achieves few-shot performance with zero-shot cost and exhibits robustness against the variation of demonstration examples.
arXiv Detail & Related papers (2024-05-23T14:57:52Z)
- ParaICL: Towards Robust Parallel In-Context Learning [74.38022919598443]
Large language models (LLMs) have become the norm in natural language processing.
Few-shot in-context learning (ICL) relies on the choice of few-shot demonstration examples.
We propose a novel method named parallel in-context learning (ParaICL).
arXiv Detail & Related papers (2024-03-31T05:56:15Z)
- Batch-ICL: Effective, Efficient, and Order-Agnostic In-Context Learning [27.729189318779603]
Batch-ICL is an effective, efficient, and order-agnostic inference algorithm for in-context learning.
We show that Batch-ICL consistently outperforms most permutations of ICL examples.
We also develop a novel variant of Batch-ICL featuring multiple "epochs" of meta-optimization.
arXiv Detail & Related papers (2024-01-12T09:31:17Z)
- The Transient Nature of Emergent In-Context Learning in Transformers [28.256651019346023]
Transformer networks can exhibit a surprising capacity for in-context learning (ICL) despite not being explicitly trained for it.
We show that the emergence of ICL during transformer training is, in fact, often transient.
We find that ICL first emerges, then disappears and gives way to in-weights learning (IWL), all while the training loss decreases.
arXiv Detail & Related papers (2023-11-14T18:03:20Z)
- How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations [98.7450564309923]
This paper takes initial steps on understanding in-context learning (ICL) in more complex scenarios, by studying learning with representations.
We construct synthetic in-context learning problems with a compositional structure, where the label depends on the input through a possibly complex but fixed representation function.
We show theoretically the existence of transformers that approximately implement such algorithms with mild depth and size.
arXiv Detail & Related papers (2023-10-16T17:40:49Z)
- Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection [88.23337313766353]
This work first provides a comprehensive statistical theory for transformers to perform ICL.
We show that transformers can implement a broad class of standard machine learning algorithms in context.
A single transformer can adaptively select different base ICL algorithms.
arXiv Detail & Related papers (2023-06-07T17:59:31Z)
- What and How does In-Context Learning Learn? Bayesian Model Averaging, Parameterization, and Generalization [111.55277952086155]
We study In-Context Learning (ICL) by addressing several open questions.
We show that, without updating the neural network parameters, ICL implicitly implements the Bayesian model averaging algorithm.
We prove that the error of the pretrained model is bounded by the sum of an approximation error and a generalization error.
arXiv Detail & Related papers (2023-05-30T21:23:47Z)
- Explaining Emergent In-Context Learning as Kernel Regression [61.57151500616111]
Large language models (LLMs) have initiated a paradigm shift in transfer learning.
In this paper, we investigate the reason why a transformer-based language model can accomplish in-context learning after pre-training.
We find that during ICL, the attention and hidden features in LLMs match the behaviors of a kernel regression.
arXiv Detail & Related papers (2023-05-22T06:45:02Z)