Understanding In-Context Learning with a Pelican Soup Framework
- URL: http://arxiv.org/abs/2402.10424v1
- Date: Fri, 16 Feb 2024 03:20:14 GMT
- Title: Understanding In-Context Learning with a Pelican Soup Framework
- Authors: Ting-Rui Chiang, Dani Yogatama
- Abstract summary: We propose a theoretical framework to explain in-context learning for natural language processing.
Our results demonstrate the efficacy of our framework in explaining in-context learning.
- Score: 27.144616560712493
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Many existing theoretical analyses of in-context learning for natural language processing are based on latent variable models that leave gaps between theory and practice. We aim to close these gaps by proposing a theoretical framework, the Pelican Soup Framework. In this framework, we introduce (1) the notion of a common sense knowledge base, (2) a general formalism for natural language classification tasks, and (3) the notion of meaning association. Under this framework, we establish an $\mathcal{O}(1/T)$ loss bound for in-context learning, where $T$ is the number of example-label pairs in the demonstration. Compared with previous work, our bound reflects the effects of both the choice of verbalizers and instruction tuning. An additional notion of \textit{atom concepts} enables our framework to explain generalization to tasks unseen in the language model's training data. Finally, we propose a toy setup, Calcutec, and a digit addition task that mimic the types of distribution shift a model needs to overcome to perform in-context learning. We also experiment with GPT2-Large on real-world NLP tasks. Our empirical results demonstrate the efficacy of our framework in explaining in-context learning.
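To make the quantities in the bound concrete, below is a minimal sketch of verbalizer-based in-context classification with GPT2-Large, in the spirit of the paper's real-world NLP experiments. The prompt template, label set, and verbalizer mapping are illustrative assumptions, not the paper's exact setup.

```python
# A minimal sketch of verbalizer-based in-context classification with
# GPT2-Large. The template and verbalizer below are illustrative
# assumptions, not the paper's exact experimental setup.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-large")
model = GPT2LMHeadModel.from_pretrained("gpt2-large").eval()

# The verbalizer maps each class to a surface token; the paper's bound
# says this choice affects in-context learning performance.
verbalizer = {"positive": " good", "negative": " bad"}

# T example-label pairs form the demonstration; the loss bound decays as O(1/T).
demonstration = [
    ("The movie was a delight.", "positive"),
    ("A dull, lifeless script.", "negative"),
]

def predict(query: str) -> str:
    # Concatenate the T demonstrations and the query, then compare the
    # next-token logits of the verbalizer tokens.
    prompt = "".join(
        f"Review: {x}\nSentiment:{verbalizer[y]}\n\n" for x, y in demonstration
    )
    prompt += f"Review: {query}\nSentiment:"
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    scores = {
        label: logits[tokenizer.encode(token)[0]].item()
        for label, token in verbalizer.items()
    }
    return max(scores, key=scores.get)

print(predict("I loved every minute of it."))  # expected: positive
```

Growing the demonstration list corresponds to increasing $T$ in the $\mathcal{O}(1/T)$ bound, and swapping the verbalizer tokens (e.g. replacing " good"/" bad" with arbitrary symbols) probes the verbalizer effect the bound captures.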
Related papers
- Federated Learning from Vision-Language Foundation Models: Theoretical Analysis and Method [7.261306002808739]
We construct a theoretical analysis framework for prompt-based federated learning via feature learning theory.
Specifically, we monitor the evolution of signal learning and noise memorization in prompt-based federated learning.
We show that performance can be assessed by the ratio of task-relevant to task-irrelevant coefficients.
arXiv Detail & Related papers (2024-09-29T08:31:26Z)
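As a rough illustration of the diagnostic described in this entry, the toy below decomposes a synthetic prompt vector into a task-relevant coefficient and a task-irrelevant residual and tracks their ratio; the fixed signal direction and Gaussian noise are stand-in assumptions, not the paper's feature-learning setup.

```python
# Toy stand-in for the signal/noise diagnostic: project a prompt vector
# onto a fixed "task-relevant" direction and compare that coefficient to
# the residual norm. The direction and noise model are assumptions.
import numpy as np

rng = np.random.default_rng(0)
signal_dir = rng.standard_normal(64)
signal_dir /= np.linalg.norm(signal_dir)

def signal_noise_ratio(prompt: np.ndarray) -> float:
    # Coefficient along the task-relevant direction vs. the residual norm.
    signal_coef = abs(prompt @ signal_dir)
    noise = prompt - (prompt @ signal_dir) * signal_dir
    return signal_coef / (np.linalg.norm(noise) + 1e-12)

# Simulate prompts whose signal component grows over federated rounds.
for t in [1, 5, 20]:
    prompt = t * 0.1 * signal_dir + rng.standard_normal(64) * 0.1
    print(t, round(signal_noise_ratio(prompt), 2))
```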
- What Do Language Models Learn in Context? The Structured Task Hypothesis [89.65045443150889]
Large language models (LLMs) learn a novel task from in-context examples presented in a demonstration, termed in-context learning (ICL).
One popular hypothesis explains ICL by task selection.
Another popular hypothesis is that ICL is a form of meta-learning, i.e., the models learn a learning algorithm at pre-training time and apply it to the demonstration.
arXiv Detail & Related papers (2024-06-06T16:15:34Z)
- Ontology Completion with Natural Language Inference and Concept Embeddings: An Analysis [26.918368764004796]
We consider the problem of finding plausible knowledge that is missing from a given ontology, as a generalisation of the well-studied taxonomy expansion task.
One line of work treats this task as a Natural Language Inference (NLI) problem, relying on the knowledge captured by language models to identify the missing knowledge.
Another line of work uses concept embeddings to identify what different concepts have in common, taking inspiration from cognitive models for category based induction.
arXiv Detail & Related papers (2024-03-25T21:46:35Z)
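A minimal sketch of the NLI line of work described in this entry: verbalize a candidate subsumption and let an off-the-shelf entailment model score it. The model choice and the hypothesis template are illustrative assumptions.

```python
# Score candidate parents for a concept with a zero-shot NLI pipeline.
# The model and the template are assumptions, not the paper's exact setup.
from transformers import pipeline

nli = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def rank_parents(concept: str, candidates: list[str]) -> list[tuple[str, float]]:
    # Verbalize "concept is a kind of candidate" and let the entailment
    # model rank the candidates by plausibility.
    result = nli(
        f"a {concept}",
        candidate_labels=candidates,
        hypothesis_template="This is a kind of {}.",
    )
    return list(zip(result["labels"], result["scores"]))

print(rank_parents("beagle", ["dog", "fish", "vehicle"]))  # "dog" should rank first
```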
- Can Large Language Models Understand Context? [17.196362853457412]
This paper introduces a context understanding benchmark by adapting existing datasets to suit the evaluation of generative models.
Experimental results indicate that pre-trained dense models struggle with understanding more nuanced contextual features when compared to state-of-the-art fine-tuned models.
As LLM compression holds growing significance in both research and real-world applications, we assess the context understanding of quantized models under in-context-learning settings.
arXiv Detail & Related papers (2024-02-01T18:55:29Z)
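One plausible way to reproduce the quantized-model evaluation this entry mentions: load a quantized model and probe it with an in-context prompt. The 8-bit bitsandbytes route and the model name below are assumptions about a common setup, not the benchmark's exact protocol.

```python
# Probe a quantized model in an in-context-learning setting. Requires the
# bitsandbytes and accelerate packages; model choice is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

name = "facebook/opt-1.3b"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

# A one-shot coreference probe of contextual understanding (illustrative).
prompt = (
    "Q: In 'The cup fell off the table because it was wobbly', what does "
    "'it' refer to?\nA: the table\n\n"
    "Q: In 'Ann thanked May because she had helped her', what does "
    "'she' refer to?\nA:"
)
ids = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**ids, max_new_tokens=5, do_sample=False)
print(tokenizer.decode(out[0, ids["input_ids"].shape[1]:]))
```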
- Explanation-aware Soft Ensemble Empowers Large Language Model In-context Learning [50.00090601424348]
Large language models (LLMs) have shown remarkable capabilities in various natural language understanding tasks.
We propose EASE, an Explanation-Aware Soft Ensemble framework to empower in-context learning with LLMs.
arXiv Detail & Related papers (2023-11-13T06:13:38Z)
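The entry leaves the mechanism open, so the following is one plausible reading of a soft ensemble over explanations, not necessarily the authors' method: average explanation-conditioned label distributions instead of majority-voting over hard labels.

```python
# One plausible reading of an explanation-aware soft ensemble (a sketch,
# not the paper's confirmed method): average the label distributions
# obtained under several sampled explanations, then take the argmax.
import numpy as np

def soft_ensemble(label_dists: list[dict[str, float]]) -> str:
    # Average per-explanation label distributions, then pick the argmax.
    labels = sorted(label_dists[0])
    mean = np.mean([[d[l] for l in labels] for d in label_dists], axis=0)
    return labels[int(np.argmax(mean))]

# Distributions from three explanation-conditioned predictions (made up):
dists = [
    {"yes": 0.55, "no": 0.45},  # explanation 1 weakly supports "yes"
    {"yes": 0.10, "no": 0.90},  # explanation 2 strongly supports "no"
    {"yes": 0.60, "no": 0.40},  # explanation 3 weakly supports "yes"
]
# A hard majority vote would pick "yes"; the soft ensemble picks "no".
print(soft_ensemble(dists))
```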
- Pre-Training to Learn in Context [138.0745138788142]
The ability of in-context learning is not fully exploited because language models are not explicitly trained to learn in context.
We propose PICL (Pre-training for In-Context Learning), a framework to enhance the language models' in-context learning ability.
Our experiments show that PICL is more effective and task-generalizable than a range of baselines, outperforming larger language models with nearly 4x parameters.
arXiv Detail & Related papers (2023-05-16T03:38:06Z)
- A Theory of Emergent In-Context Learning as Implicit Structure Induction [8.17811111226145]
Scaling large language models leads to an emergent capacity to learn in-context from example demonstrations.
We argue that in-context learning relies on recombination of compositional operations found in natural language data.
We show how in-context learning is supported by a representation of the input's compositional structure.
arXiv Detail & Related papers (2023-03-14T15:24:05Z)
- The Learnability of In-Context Learning [16.182561312622315]
We propose a first-of-its-kind PAC-based framework for in-context learnability.
Our framework includes an initial pretraining phase, which fits a function to the pretraining distribution.
We show that in-context learning is more about identifying the task than about learning it.
arXiv Detail & Related papers (2023-03-14T13:28:39Z)
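A toy illustration of "identifying the task rather than learning it": with a finite task family fixed at pretraining time, a Bayesian posterior over tasks concentrates on the right one as in-context examples arrive. The Bernoulli-threshold task family below is an illustrative assumption.

```python
# Toy task identification: the posterior over a fixed family of tasks
# concentrates as demonstrations accumulate. The task family is assumed.
import numpy as np

rng = np.random.default_rng(0)
# Each task labels inputs x in [0, 1) by thresholding at a different point.
thresholds = [0.25, 0.5, 0.75]
true_task = 1  # threshold 0.5

posterior = np.ones(len(thresholds)) / len(thresholds)
for t in range(1, 11):
    x = rng.random()
    y = int(x > thresholds[true_task])
    # Likelihood of (x, y) under each task, with a little label noise.
    lik = np.array([0.95 if int(x > th) == y else 0.05 for th in thresholds])
    posterior = posterior * lik
    posterior /= posterior.sum()
    print(t, np.round(posterior, 3))
# The mass moves to the true task: the model need only identify which
# pretraining task is active, not learn a new one from scratch.
```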
- Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning [79.83792914684985]
We prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations.
Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem.
arXiv Detail & Related papers (2022-11-26T21:02:09Z)
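A toy version of the sparsity idea in this entry, assuming a synthetic shared representation: fitting sparse (Lasso) per-task predictors recovers supports aligned with the latent factors each task actually uses.

```python
# Toy sparse multi-task setup: each task depends on a few latent factors
# of a shared representation; Lasso recovers those supports. The synthetic
# data and the Lasso stand-in for the bi-level problem are assumptions.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
Z = rng.standard_normal((500, 6))        # 6 shared latent factors
y_task1 = 2.0 * Z[:, 0] - 1.5 * Z[:, 1]  # task 1 uses factors 0 and 1
y_task2 = 1.0 * Z[:, 3]                  # task 2 uses factor 3 only

for name, y in [("task1", y_task1), ("task2", y_task2)]:
    coef = Lasso(alpha=0.1).fit(Z, y).coef_
    print(name, np.round(coef, 2))  # nonzeros align with the true factors
```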
- Unifying Language Learning Paradigms [96.35981503087567]
We present a unified framework for pre-training models that are universally effective across datasets and setups.
We show how different pre-training objectives can be cast as one another and how interpolating between different objectives can be effective.
Our model also achieves strong results on in-context learning, outperforming 175B GPT-3 on zero-shot SuperGLUE and tripling the performance of T5-XXL on one-shot summarization.
arXiv Detail & Related papers (2022-05-10T19:32:20Z)
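A sketch of "casting objectives as one another", assuming a single span-corruption routine: varying span length and corruption rate moves between a T5-style short-span denoising regime and a long-span regime closer to prefix language modeling. The parameter values are illustrative, not the paper's exact mixture.

```python
# One span-corruption routine whose (span length, corruption rate)
# settings yield different denoising regimes. Parameters are illustrative.
import random

def span_corrupt(tokens, span_len, rate, seed=0):
    # Mask contiguous spans of length span_len until roughly `rate` of the
    # sequence is corrupted; the input keeps sentinels, the target keeps spans.
    rng = random.Random(seed)
    masked = [False] * len(tokens)
    while sum(masked) < rate * len(tokens):
        i = rng.randrange(0, len(tokens) - span_len + 1)
        for j in range(i, i + span_len):
            masked[j] = True
    inp, tgt, sent, k = [], [], 0, 0
    while k < len(tokens):
        if masked[k]:
            inp.append(f"<extra_id_{sent}>")
            tgt.append(f"<extra_id_{sent}>")
            while k < len(tokens) and masked[k]:
                tgt.append(tokens[k])
                k += 1
            sent += 1
        else:
            inp.append(tokens[k])
            k += 1
    return " ".join(inp), " ".join(tgt)

toks = "the quick brown fox jumps over the lazy dog".split()
# Short spans, low rate: a T5-style regular-denoising regime.
print(span_corrupt(toks, span_len=2, rate=0.15))
# One long span, high rate: closer to prefix-LM / extreme denoising.
print(span_corrupt(toks, span_len=5, rate=0.5))
```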
- elBERto: Self-supervised Commonsense Learning for Question Answering [131.51059870970616]
We propose a Self-supervised Bidirectional Representation Learning of Commonsense framework, which is compatible with off-the-shelf QA model architectures.
The framework comprises five self-supervised tasks to force the model to fully exploit the additional training signals from contexts containing rich commonsense.
elBERto achieves substantial improvements on out-of-paragraph and no-effect questions where simple lexical similarity comparison does not help.
arXiv Detail & Related papers (2022-03-17T16:23:45Z)