Positional Information Matters for Invariant In-Context Learning: A Case
Study of Simple Function Classes
- URL: http://arxiv.org/abs/2311.18194v1
- Date: Thu, 30 Nov 2023 02:26:55 GMT
- Authors: Yongqiang Chen, Binghui Xie, Kaiwen Zhou, Bo Han, Yatao Bian, James
Cheng
- Abstract summary: In-context learning (ICL) refers to the ability of a model to condition on a few in-context demonstrations to generate the answer for a new query input.
Despite the impressive ICL ability of LLMs, ICL is sensitive to the input demonstrations and is limited to short context lengths.
- Score: 39.08988313527199
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In-context learning (ICL) refers to the ability of a model to condition on a
few in-context demonstrations (input-output examples of the underlying task) to
generate the answer for a new query input, without updating parameters. Despite
the impressive ICL ability of LLMs, it has also been found that ICL in LLMs is
sensitive to input demonstrations and limited to short context lengths. To
understand the limitations of and principles behind successful ICL, we investigate in-context linear regression with transformers. We characterize
several Out-of-Distribution (OOD) cases for ICL inspired by realistic LLM ICL
failures and compare transformers with DeepSet, a simple yet powerful
architecture for ICL. Surprisingly, DeepSet outperforms transformers across a
variety of distribution shifts, implying that preserving permutation invariance
symmetry to input demonstrations is crucial for OOD ICL. This phenomenon points to a fundamental requirement of ICL, which we term ICL invariance. However, the positional encodings used in LLMs break ICL invariance. To address this, we further evaluate transformers with identical positional encodings and find that preserving ICL invariance in transformers achieves state-of-the-art performance across various ICL distribution shifts.
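To make the experimental setup concrete, here is a minimal sketch in PyTorch of the in-context linear regression task and a DeepSet-style predictor. The names (sample_linear_regression_prompt, DeepSetRegressor) and the network widths are illustrative assumptions, not the authors' code; the closing assertion checks the permutation-invariance property the abstract identifies as crucial.

```python
import torch

def sample_linear_regression_prompt(n_points=40, n_dims=20):
    """Sample one ICL prompt: n_points (x, y) demonstrations from y = w^T x."""
    w = torch.randn(n_dims)              # task vector, drawn fresh per prompt
    xs = torch.randn(n_points, n_dims)   # demonstration inputs
    ys = xs @ w                          # noiseless linear targets
    return xs, ys

class DeepSetRegressor(torch.nn.Module):
    """Permutation-invariant predictor: encode each (x_i, y_i) pair with phi,
    pool with an order-free mean, then decode with rho given the query.
    Hypothetical sketch, not the paper's exact architecture."""
    def __init__(self, n_dims, hidden=256):
        super().__init__()
        self.phi = torch.nn.Sequential(
            torch.nn.Linear(n_dims + 1, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, hidden))
        self.rho = torch.nn.Sequential(
            torch.nn.Linear(hidden + n_dims, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, 1))

    def forward(self, xs, ys, x_query):
        pairs = torch.cat([xs, ys.unsqueeze(-1)], dim=-1)  # (n_points, d+1)
        pooled = self.phi(pairs).mean(dim=0)               # order-free pooling
        return self.rho(torch.cat([pooled, x_query], dim=-1))

# Shuffling the demonstrations must not change the prediction.
xs, ys = sample_linear_regression_prompt()
model = DeepSetRegressor(n_dims=20)
perm = torch.randperm(xs.shape[0])
assert torch.allclose(model(xs, ys, xs[0]),
                      model(xs[perm], ys[perm], xs[0]), atol=1e-5)
```

Because the pooling is a mean over per-pair embeddings, any reordering of the demonstrations yields the same prediction by construction.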
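The fix the abstract evaluates, identical positional encodings, can be sketched the same way. The class below is a hypothetical illustration (the name InvariantICLTransformer and the token layout are assumptions): every position receives the same learned positional vector, so self-attention has no signal for demonstration order.

```python
import torch

class InvariantICLTransformer(torch.nn.Module):
    """Transformer for ICL regression in which every token shares one
    positional vector, so attention cannot exploit demonstration order."""
    def __init__(self, n_dims, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = torch.nn.Linear(n_dims + 1, d_model)
        # One shared positional embedding replaces the usual per-position table.
        self.shared_pos = torch.nn.Parameter(torch.zeros(1, 1, d_model))
        layer = torch.nn.TransformerEncoderLayer(d_model, n_heads,
                                                 batch_first=True)
        self.encoder = torch.nn.TransformerEncoder(layer, n_layers)
        self.head = torch.nn.Linear(d_model, 1)

    def forward(self, xs, ys, x_query):
        # Demonstrations become (x_i, y_i) tokens; the query gets a zero label slot.
        demos = torch.cat([xs, ys.unsqueeze(-1)], dim=-1)
        query = torch.cat([x_query, torch.zeros(1)], dim=-1).unsqueeze(0)
        tokens = torch.cat([demos, query], dim=0).unsqueeze(0)  # (1, n+1, d+1)
        h = self.embed(tokens) + self.shared_pos  # identical PE at every position
        return self.head(self.encoder(h)[0, -1])  # read out at the query token
```

In eval() mode (dropout disabled), permuting the demonstration tokens leaves the prediction at the final query token unchanged, so ICL invariance is preserved while keeping the transformer architecture.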
Related papers
- Exact Conversion of In-Context Learning to Model Weights in Linearized-Attention Transformers [30.145669421100965]
In-Context Learning is a powerful emergent property of large language models.
We show that, for linearized transformer networks, ICL can be made explicit and permanent through the inclusion of bias terms.
Our algorithm (ICLCA) allows for exact conversion in an inexpensive manner.
arXiv Detail & Related papers (2024-06-05T01:47:40Z)
- Implicit In-context Learning [37.0562059811099]
In-context Learning (ICL) empowers large language models to adapt to unseen tasks during inference by prefixing a few demonstration examples prior to test queries.
We introduce Implicit In-context Learning (I2CL), an innovative paradigm that addresses the challenges associated with traditional ICL by absorbing demonstration examples within the activation space.
I2CL achieves few-shot performance with zero-shot cost and exhibits robustness against the variation of demonstration examples.
arXiv Detail & Related papers (2024-05-23T14:57:52Z)
- ParaICL: Towards Robust Parallel In-Context Learning [74.38022919598443]
Large language models (LLMs) have become the norm in natural language processing.
Few-shot in-context learning (ICL) relies on the choice of few-shot demonstration examples.
We propose a novel method named parallel in-context learning (ParaICL).
arXiv Detail & Related papers (2024-03-31T05:56:15Z)
- Batch-ICL: Effective, Efficient, and Order-Agnostic In-Context Learning [27.729189318779603]
Batch-ICL is an effective, efficient, and order-agnostic inference algorithm for in-context learning.
We show that Batch-ICL consistently outperforms most permutations of ICL examples.
We also develop a novel variant of Batch-ICL featuring multiple "epochs" of meta-optimization.
arXiv Detail & Related papers (2024-01-12T09:31:17Z)
- The Transient Nature of Emergent In-Context Learning in Transformers [28.256651019346023]
Transformer networks can exhibit a surprising capacity for in-context learning (ICL) despite not being explicitly trained for it.
We show that the emergence of ICL during transformer training is, in fact, often transient.
We find that ICL first emerges, then disappears and gives way to in-weights learning (IWL), all while the training loss decreases.
arXiv Detail & Related papers (2023-11-14T18:03:20Z)
- How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations [98.7450564309923]
This paper takes initial steps on understanding in-context learning (ICL) in more complex scenarios, by studying learning with representations.
We construct synthetic in-context learning problems with a compositional structure, where the label depends on the input through a possibly complex but fixed representation function.
We show theoretically the existence of transformers that approximately implement such algorithms with mild depth and size.
arXiv Detail & Related papers (2023-10-16T17:40:49Z)
- Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection [88.23337313766353]
This work first provides a comprehensive statistical theory for transformers to perform ICL.
We show that transformers can implement a broad class of standard machine learning algorithms in context.
A single transformer can adaptively select different base ICL algorithms.
arXiv Detail & Related papers (2023-06-07T17:59:31Z)
- What and How does In-Context Learning Learn? Bayesian Model Averaging, Parameterization, and Generalization [111.55277952086155]
We study In-Context Learning (ICL) by addressing several open questions.
We show that, without updating the neural network parameters, ICL implicitly implements the Bayesian model averaging algorithm.
We prove that the error of the pretrained model is bounded by the sum of an approximation error and a generalization error.
arXiv Detail & Related papers (2023-05-30T21:23:47Z)
- Explaining Emergent In-Context Learning as Kernel Regression [61.57151500616111]
Large language models (LLMs) have initiated a paradigm shift in transfer learning.
In this paper, we investigate the reason why a transformer-based language model can accomplish in-context learning after pre-training.
We find that during ICL, the attention and hidden features in LLMs match the behaviors of a kernel regression.
arXiv Detail & Related papers (2023-05-22T06:45:02Z)