The Learnability of In-Context Learning
- URL: http://arxiv.org/abs/2303.07895v1
- Date: Tue, 14 Mar 2023 13:28:39 GMT
- Title: The Learnability of In-Context Learning
- Authors: Noam Wies, Yoav Levine, Amnon Shashua
- Abstract summary: We propose a first-of-its-kind PAC based framework for in-context learnability.
Our framework includes an initial pretraining phase, which fits a function to the pretraining distribution.
We show that in-context learning is more about identifying the task than about learning it.
- Score: 16.182561312622315
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In-context learning is a surprising and important phenomenon that emerged
when modern language models were scaled to billions of learned parameters.
Without modifying a large language model's weights, it can be tuned to perform
various downstream natural language tasks simply by including concatenated
training examples of these tasks in its input. Though disruptive for many
practical applications of large language models, this emergent learning
paradigm is not well understood from a theoretical perspective. In this paper,
we propose a first-of-its-kind PAC based framework for in-context learnability,
and use it to provide the first finite sample complexity results for the
in-context learning setup. Our framework includes an initial pretraining phase,
which fits a function to the pretraining distribution, and then a second
in-context learning phase, which keeps this function constant and concatenates
training examples of the downstream task in its input. We use our framework in
order to prove that, under mild assumptions, when the pretraining distribution
is a mixture of latent tasks (a model often considered for natural language
pretraining), these tasks can be efficiently learned via in-context learning,
even though the model's weights are unchanged and the input significantly
diverges from the pretraining distribution. Our theoretical analysis reveals
that in this setting, in-context learning is more about identifying the task
than about learning it, a result which is in line with a series of recent
empirical findings. We hope that the in-context learnability framework
presented in this paper will facilitate future progress towards a deeper
understanding of this important new learning paradigm.
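The abstract's claim that in-context learning is "more about identifying the task than about learning it" can be sketched as Bayesian inference over a mixture of latent tasks: pretraining fixes the mixture components, and concatenated in-context examples only sharpen a posterior over which component generated them. The toy tasks, mixture weights, and Bernoulli likelihood below are illustrative assumptions, not the paper's actual construction.

```python
import math

# Hypothetical mixture of latent tasks: each task is a biased coin.
# Pretraining is assumed to have fixed these components; in-context
# examples merely update the posterior over which task produced them.
tasks = {"task_a": 0.9, "task_b": 0.1}   # P(label = 1 | task)
prior = {"task_a": 0.5, "task_b": 0.5}   # mixture weights from "pretraining"

def posterior(examples, prior, tasks):
    """Posterior over latent tasks after observing in-context labels."""
    log_post = {t: math.log(p) for t, p in prior.items()}
    for label in examples:
        for t, bias in tasks.items():
            log_post[t] += math.log(bias if label == 1 else 1.0 - bias)
    # Normalize in a numerically stable way.
    m = max(log_post.values())
    unnorm = {t: math.exp(lp - m) for t, lp in log_post.items()}
    z = sum(unnorm.values())
    return {t: v / z for t, v in unnorm.items()}

# A handful of in-context examples suffices to identify the task:
print(posterior([1, 1, 1, 1], prior, tasks))
```

With four positive labels the posterior concentrates almost entirely on `task_a` (the likelihood ratio is (0.9/0.1)^4), illustrating why few in-context examples can suffice when the task only needs to be identified, not learned from scratch.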
Related papers
- In-context Learning in Presence of Spurious Correlations [8.055478206164105]
We study the possibility of training an in-context learner for classification tasks involving spurious features.
We find that the conventional approach of training in-context learners is susceptible to spurious features.
We propose a novel technique to train such a learner for a given classification task.
arXiv Detail & Related papers (2024-10-04T04:26:36Z)
- RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models [57.12888828853409]
RAVEN is a model that combines retrieval-augmented masked language modeling and prefix language modeling.
Fusion-in-Context Learning enables the model to leverage more in-context examples without requiring additional training.
Our work underscores the potential of retrieval-augmented encoder-decoder language models for in-context learning.
arXiv Detail & Related papers (2023-08-15T17:59:18Z)
- SINC: Self-Supervised In-Context Learning for Vision-Language Tasks [64.44336003123102]
We propose a framework to enable in-context learning in large language models.
A meta-model can learn on self-supervised prompts consisting of tailored demonstrations.
Experiments show that SINC outperforms gradient-based methods in various vision-language tasks.
arXiv Detail & Related papers (2023-07-15T08:33:08Z)
- Pre-Training to Learn in Context [138.0745138788142]
The ability of in-context learning is not fully exploited because language models are not explicitly trained to learn in context.
We propose PICL (Pre-training for In-Context Learning), a framework to enhance the language models' in-context learning ability.
Our experiments show that PICL is more effective and task-generalizable than a range of baselines, outperforming larger language models with nearly 4x parameters.
arXiv Detail & Related papers (2023-05-16T03:38:06Z)
- A Theory of Emergent In-Context Learning as Implicit Structure Induction [8.17811111226145]
Scaling large language models leads to an emergent capacity to learn in-context from example demonstrations.
We argue that in-context learning relies on recombination of compositional operations found in natural language data.
We show how in-context learning is supported by a representation of the input's compositional structure.
arXiv Detail & Related papers (2023-03-14T15:24:05Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- Comparison and Analysis of New Curriculum Criteria for End-to-End ASR [10.698093106994804]
Curriculum Learning is built on the observation that organized and structured assimilation of knowledge has the ability to enable faster training and better comprehension.
We employ Curriculum Learning in the context of Automatic Speech Recognition.
To impose structure on the training set, we explored multiple scoring functions that either use feedback from an external neural network or incorporate feedback from the model itself.
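The scoring-function idea above can be sketched minimally: assign each training utterance a difficulty score and present the set in easy-first order before batching. The utterances and scores below are hypothetical stand-ins for feedback from the model itself or from an external network.

```python
# Minimal curriculum-ordering sketch. The difficulty scores are hypothetical
# placeholders for per-utterance losses reported by a scoring model.
utterances = [
    "navigate to the nearest charging station",
    "yes",
    "turn left at the next intersection",
]
difficulty = {
    "navigate to the nearest charging station": 2.7,
    "yes": 0.2,
    "turn left at the next intersection": 0.9,
}

def curriculum_order(data, score):
    """Easy-first ordering: sort training examples by ascending difficulty."""
    return sorted(data, key=lambda x: score[x])

print(curriculum_order(utterances, difficulty))
```

In practice the score would be recomputed as training progresses (e.g. from the model's own loss), so the ordering adapts over epochs rather than being fixed once up front.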
arXiv Detail & Related papers (2022-08-10T06:56:58Z)
- An Explanation of In-context Learning as Implicit Bayesian Inference [117.19809377740188]
We study the role of the pretraining distribution on the emergence of in-context learning.
We prove that in-context learning occurs implicitly via Bayesian inference of the latent concept.
We empirically find that scaling model size improves in-context accuracy even when the pretraining loss is the same.
arXiv Detail & Related papers (2021-11-03T09:12:33Z)
- Importance Weighted Policy Learning and Adaptation [89.46467771037054]
We study a complementary approach which is conceptually simple, general, modular and built on top of recent improvements in off-policy learning.
The framework is inspired by ideas from the probabilistic inference literature and combines robust off-policy learning with a behavior prior.
Our approach achieves competitive adaptation performance on hold-out tasks compared to meta reinforcement learning baselines and can scale to complex sparse-reward scenarios.
arXiv Detail & Related papers (2020-09-10T14:16:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.