Schema-learning and rebinding as mechanisms of in-context learning and
emergence
- URL: http://arxiv.org/abs/2307.01201v1
- Date: Fri, 16 Jun 2023 00:29:19 GMT
- Title: Schema-learning and rebinding as mechanisms of in-context learning and
emergence
- Authors: Sivaramakrishnan Swaminathan, Antoine Dedieu, Rajkumar Vasudeva Raju,
Murray Shanahan, Miguel Lazaro-Gredilla, Dileep George
- Abstract summary: In-context learning (ICL) is one of the most powerful and most unexpected capabilities to emerge in recent transformer-based large language models (LLMs).
We demonstrate that comparable ICL capabilities can be acquired by an alternative sequence prediction learning method using clone-structured causal graphs (CSCGs).
- Score: 10.370506005311091
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In-context learning (ICL) is one of the most powerful and most unexpected
capabilities to emerge in recent transformer-based large language models
(LLMs). Yet the mechanisms that underlie it are poorly understood. In this
paper, we demonstrate that comparable ICL capabilities can be acquired by an
alternative sequence prediction learning method using clone-structured causal
graphs (CSCGs). Moreover, a key property of CSCGs is that, unlike
transformer-based LLMs, they are interpretable, which considerably
simplifies the task of explaining how ICL works. Specifically, we show that it
uses a combination of (a) learning template (schema) circuits for pattern
completion, (b) retrieving relevant templates in a context-sensitive manner,
and (c) rebinding of novel tokens to appropriate slots in the templates. We go
on to marshal evidence for the hypothesis that similar mechanisms underlie ICL
in LLMs. For example, we find that, with CSCGs as with LLMs, different
capabilities emerge at different levels of overparameterization, suggesting
that overparameterization helps in learning more complex template (schema)
circuits. By showing how ICL can be achieved with small models and datasets, we
open up a path to novel architectures, and take a vital step towards a more
general understanding of the mechanics behind this important capability.
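The abstract's three ingredients can be made concrete with a small toy sketch. The Python snippet below is only an illustration, not the authors' CSCG implementation: the hand-written templates, slot names, matching rule, and example tokens are assumptions chosen to show how a stored template (schema) is retrieved in a context-sensitive way and how novel tokens are rebound to its slots before the pattern is completed.

```python
# Toy illustration of the three ICL ingredients from the abstract:
# (a) stored template (schema) circuits, (b) context-sensitive retrieval,
# (c) rebinding of novel tokens to template slots.
# This is NOT the paper's CSCG algorithm; templates and tokens are made up.
from typing import Optional

# (a) Templates: token sequences with abstract slot variables A and B.
TEMPLATES = [
    ["A", "plus", "B", "equals", "B", "plus", "A"],    # commutativity-like pattern
    ["A", "maps_to", "B", ",", "A", "maps_to", "B"],   # copy/association pattern
]
SLOTS = {"A", "B"}

def try_bind(template: list, context: list) -> Optional[dict]:
    """(c) Rebinding: align the context with the template prefix and bind
    novel tokens to slots; return None if the structure does not match."""
    binding = {}
    for slot_or_literal, token in zip(template, context):
        if slot_or_literal in SLOTS:
            if binding.get(slot_or_literal, token) != token:
                return None        # slot already bound to a different token
            binding[slot_or_literal] = token
        elif slot_or_literal != token:
            return None            # literal token mismatch
    return binding

def complete(context: list) -> list:
    """(b) Retrieval: pick the template whose structure matches the context,
    then emit the remainder of the template with its slots filled in."""
    for template in TEMPLATES:
        binding = try_bind(template, context)
        if binding is not None:
            remainder = template[len(context):]
            return [binding.get(tok, tok) for tok in remainder]
    return []

# Tokens "zork" and "blick" never appear in any template, yet they are rebound
# to slots A and B and the retrieved schema completes the pattern.
print(complete(["zork", "plus", "blick", "equals"]))  # -> ['blick', 'plus', 'zork']
```

In the paper's setting, the analogue of a template is a circuit learned over the clone (hidden) states of a CSCG, and rebinding operates on how novel tokens attach to those states rather than on hand-written slots; the toy above only mirrors the functional roles of the three mechanisms.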
Related papers
- Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for adapting LLMs to downstream tasks.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z)
- Enhancing LLM's Cognition via Structurization [41.13997892843677]
Large language models (LLMs) process input contexts through a causal and sequential perspective.
This paper presents a novel concept of context structurization.
Specifically, we transform the plain, unordered contextual sentences into well-ordered and hierarchically structurized elements.
arXiv Detail & Related papers (2024-07-23T12:33:58Z)
- Identifiable Causal Representation Learning: Unsupervised, Multi-View, and Multi-Environment [10.814585613336778]
Causal representation learning (CRL) aims to combine the core strengths of machine learning and causality.
This thesis investigates what is possible for CRL without direct supervision, and thus contributes to its theoretical foundations.
arXiv Detail & Related papers (2024-06-19T09:14:40Z)
- From Unstructured Data to In-Context Learning: Exploring What Tasks Can Be Learned and When [19.841163050181194]
Large language models (LLMs) like transformers demonstrate impressive in-context learning (ICL) capabilities.
We examine what enables ICL in models trained on unstructured data, focusing on critical sequence model requirements and training data structure.
We find that many ICL capabilities can emerge simply from co-occurrence of semantically related word pairs in unstructured data.
We identify two cases where ICL fails: one in logic reasoning tasks that require generalizing to new, unseen patterns, and another in analogy completion where relevant word pairs appear only in fixed training positions.
arXiv Detail & Related papers (2024-05-31T18:46:06Z)
- Towards More Unified In-context Visual Understanding [74.55332581979292]
We present a new ICL framework for visual understanding with multi-modal output enabled.
First, we quantize and embed both text and visual prompts into a unified representational space.
Then a decoder-only sparse transformer architecture is employed to perform generative modeling on them.
arXiv Detail & Related papers (2023-12-05T06:02:21Z)
- In-Context Exemplars as Clues to Retrieving from Large Associative Memory [1.2952137350423816]
In-context learning (ICL) enables large language models (LLMs) to learn patterns from in-context exemplars without training.
How to choose exemplars remains unclear due to the lack of understanding of how in-context learning works.
Our study sheds new light on the mechanism of ICL by connecting it to memory retrieval.
arXiv Detail & Related papers (2023-11-06T20:13:29Z)
- How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations [98.7450564309923]
This paper takes initial steps on understanding in-context learning (ICL) in more complex scenarios, by studying learning with representations.
We construct synthetic in-context learning problems with a compositional structure, where the label depends on the input through a possibly complex but fixed representation function.
We show theoretically the existence of transformers that approximately implement such algorithms with mild depth and size.
arXiv Detail & Related papers (2023-10-16T17:40:49Z)
- Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection [88.23337313766353]
This work first provides a comprehensive statistical theory for transformers to perform ICL.
We show that transformers can implement a broad class of standard machine learning algorithms in context.
A single transformer can adaptively select different base ICL algorithms.
arXiv Detail & Related papers (2023-06-07T17:59:31Z)
- Explaining Emergent In-Context Learning as Kernel Regression [61.57151500616111]
Large language models (LLMs) have initiated a paradigm shift in transfer learning.
In this paper, we investigate the reason why a transformer-based language model can accomplish in-context learning after pre-training.
We find that during ICL, the attention and hidden features in LLMs match the behaviors of a kernel regression (a minimal sketch of this view appears after this list).
arXiv Detail & Related papers (2023-05-22T06:45:02Z)
- Prototypical Contrastive Learning of Unsupervised Representations [171.3046900127166]
Prototypical Contrastive Learning (PCL) is an unsupervised representation learning method.
PCL implicitly encodes semantic structures of the data into the learned embedding space.
PCL outperforms state-of-the-art instance-wise contrastive learning methods on multiple benchmarks.
arXiv Detail & Related papers (2020-05-11T09:53:36Z)
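As referenced in the kernel-regression entry above, the claimed correspondence between in-context prediction and kernel regression can be checked numerically in a toy setting. The snippet below is an assumed setup, not the cited paper's experiments: it only shows that softmax attention over in-context (input, label) pairs, with the query taken from the test input, is algebraically a Nadaraya-Watson kernel-regression estimate with an exponential kernel.

```python
# Toy check: softmax attention over in-context exemplars == kernel regression.
# The data, dimensions, and kernel choice are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

d, n = 4, 32
X = rng.normal(size=(n, d))                            # in-context inputs (keys)
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)  # in-context labels (values)
x_test = rng.normal(size=d)                            # test input (query)

def softmax(z):
    z = z - z.max()                 # numerical stability; cancels after normalizing
    e = np.exp(z)
    return e / e.sum()

# Attention view: weights = softmax(q . k_i / sqrt(d)); prediction = sum_i w_i * y_i.
attention_pred = softmax(X @ x_test / np.sqrt(d)) @ y

# Nadaraya-Watson view with the matching kernel K(q, k) = exp(q . k / sqrt(d)).
K = np.exp(X @ x_test / np.sqrt(d))
kernel_pred = (K @ y) / K.sum()

print(attention_pred, kernel_pred)  # identical up to floating-point error
```

The equality is exact for this single attention head because the softmax normalization is precisely the kernel-weight normalization; the cited paper's contribution is evidence that attention and hidden features in trained LLMs actually behave this way during ICL.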