A Mechanism for Sample-Efficient In-Context Learning for Sparse
Retrieval Tasks
- URL: http://arxiv.org/abs/2305.17040v1
- Date: Fri, 26 May 2023 15:49:43 GMT
- Title: A Mechanism for Sample-Efficient In-Context Learning for Sparse
Retrieval Tasks
- Authors: Jacob Abernethy, Alekh Agarwal, Teodor V. Marinov, Manfred K. Warmuth
- Abstract summary: We show how a transformer model is able to perform ICL under reasonable assumptions on the pre-training process and the downstream tasks.
We establish that this entire procedure is implementable using the transformer mechanism.
- Score: 29.764014766305174
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the phenomenon of \textit{in-context learning} (ICL) exhibited by
large language models, where they can adapt to a new learning task, given a
handful of labeled examples, without any explicit parameter optimization. Our
goal is to explain how a pre-trained transformer model is able to perform ICL
under reasonable assumptions on the pre-training process and the downstream
tasks. We posit a mechanism whereby a transformer can achieve the following:
(a) receive an i.i.d. sequence of examples which have been converted into a
prompt using potentially-ambiguous delimiters, (b) correctly segment the prompt
into examples and labels, (c) infer from the data a \textit{sparse linear
regressor} hypothesis, and finally (d) apply this hypothesis on the given test
example and return a predicted label. We establish that this entire procedure
is implementable using the transformer mechanism, and we give sample complexity
guarantees for this learning framework. Our empirical findings validate the
challenge of segmentation, and we show a correspondence between our posited
mechanisms and observed attention maps for step (c).
Related papers
- Strengthening Structural Inductive Biases by Pre-training to Perform Syntactic Transformations [75.14793516745374]
We propose to strengthen the structural inductive bias of a Transformer by intermediate pre-training.
Our experiments confirm that this helps with few-shot learning of syntactic tasks such as chunking.
Our analysis shows that the intermediate pre-training leads to attention heads that keep track of which syntactic transformation needs to be applied to which token.
arXiv Detail & Related papers (2024-07-05T14:29:44Z) - Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars [66.823588073584]
Large language models (LLMs) have shown impressive capabilities in real-world applications.
The quality of these exemplars in the prompt greatly impacts performance.
Existing methods fail to adequately account for the impact of exemplar ordering on the performance.
arXiv Detail & Related papers (2024-05-25T08:23:05Z) - 'One size doesn't fit all': Learning how many Examples to use for
In-Context Learning for Improved Text Classification [18.167541508658417]
In-context learning (ICL) uses a small number of labelled data instances as examples in the prompt.
We propose a novel methodology of dynamically adapting the number of examples as per the data.
Our experiments show that our AICL method results in improvement in text classification task on several standard datasets.
arXiv Detail & Related papers (2024-03-11T03:28:13Z) - In-Context Learning for MIMO Equalization Using Transformer-Based
Sequence Models [44.161789477821536]
Large pre-trained sequence models have the capacity to carry out in-context learning (ICL)
In ICL, a decision on a new input is made via a direct mapping of the input and of a few examples from the given task.
We demonstrate via numerical results that transformer-based ICL has a threshold behavior.
arXiv Detail & Related papers (2023-11-10T15:09:04Z) - How Do Transformers Learn In-Context Beyond Simple Functions? A Case
Study on Learning with Representations [98.7450564309923]
This paper takes initial steps on understanding in-context learning (ICL) in more complex scenarios, by studying learning with representations.
We construct synthetic in-context learning problems with a compositional structure, where the label depends on the input through a possibly complex but fixed representation function.
We show theoretically the existence of transformers that approximately implement such algorithms with mild depth and size.
arXiv Detail & Related papers (2023-10-16T17:40:49Z) - Trained Transformers Learn Linear Models In-Context [39.56636898650966]
Attention-based neural networks as transformers have demonstrated a remarkable ability to exhibit inattention learning (ICL)
We show that when transformer training over random instances of linear regression problems, these models' predictions mimic nonlinear of ordinary squares.
arXiv Detail & Related papers (2023-06-16T15:50:03Z) - Transformers as Statisticians: Provable In-Context Learning with
In-Context Algorithm Selection [88.23337313766353]
This work first provides a comprehensive statistical theory for transformers to perform ICL.
We show that transformers can implement a broad class of standard machine learning algorithms in context.
A emphsingle transformer can adaptively select different base ICL algorithms.
arXiv Detail & Related papers (2023-06-07T17:59:31Z) - What and How does In-Context Learning Learn? Bayesian Model Averaging,
Parameterization, and Generalization [111.55277952086155]
We study In-Context Learning (ICL) by addressing several open questions.
We show that, without updating the neural network parameters, ICL implicitly implements the Bayesian model averaging algorithm.
We prove that the error of pretrained model is bounded by a sum of an approximation error and a generalization error.
arXiv Detail & Related papers (2023-05-30T21:23:47Z) - Explaining Emergent In-Context Learning as Kernel Regression [61.57151500616111]
Large language models (LLMs) have initiated a paradigm shift in transfer learning.
In this paper, we investigate the reason why a transformer-based language model can accomplish in-context learning after pre-training.
We find that during ICL, the attention and hidden features in LLMs match the behaviors of a kernel regression.
arXiv Detail & Related papers (2023-05-22T06:45:02Z) - A CTC Alignment-based Non-autoregressive Transformer for End-to-end
Automatic Speech Recognition [26.79184118279807]
We present a CTC Alignment-based Single-Step Non-Autoregressive Transformer (CASS-NAT) for end-to-end ASR.
word embeddings in the autoregressive transformer (AT) are substituted with token-level acoustic embeddings (TAE) that are extracted from encoder outputs.
We find that CASS-NAT has a WER that is close to AT on various ASR tasks, while providing a 24x inference speedup.
arXiv Detail & Related papers (2023-04-15T18:34:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.