Schema-learning and rebinding as mechanisms of in-context learning and
emergence
- URL: http://arxiv.org/abs/2307.01201v1
- Date: Fri, 16 Jun 2023 00:29:19 GMT
- Title: Schema-learning and rebinding as mechanisms of in-context learning and
emergence
- Authors: Sivaramakrishnan Swaminathan, Antoine Dedieu, Rajkumar Vasudeva Raju,
Murray Shanahan, Miguel Lazaro-Gredilla, Dileep George
- Abstract summary: In-context learning (ICL) is one of the most powerful and most unexpected capabilities to emerge in recent transformer-based large language models (LLMs)
We demonstrate that comparable ICL capabilities can be acquired by an alternative sequence prediction learning method using clone-structured causal graphs (CSCGs)
- Score: 10.370506005311091
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In-context learning (ICL) is one of the most powerful and most unexpected
capabilities to emerge in recent transformer-based large language models
(LLMs). Yet the mechanisms that underlie it are poorly understood. In this
paper, we demonstrate that comparable ICL capabilities can be acquired by an
alternative sequence prediction learning method using clone-structured causal
graphs (CSCGs). Moreover, a key property of CSCGs is that, unlike
transformer-based LLMs, they are {\em interpretable}, which considerably
simplifies the task of explaining how ICL works. Specifically, we show that it
uses a combination of (a) learning template (schema) circuits for pattern
completion, (b) retrieving relevant templates in a context-sensitive manner,
and (c) rebinding of novel tokens to appropriate slots in the templates. We go
on to marshall evidence for the hypothesis that similar mechanisms underlie ICL
in LLMs. For example, we find that, with CSCGs as with LLMs, different
capabilities emerge at different levels of overparameterization, suggesting
that overparameterization helps in learning more complex template (schema)
circuits. By showing how ICL can be achieved with small models and datasets, we
open up a path to novel architectures, and take a vital step towards a more
general understanding of the mechanics behind this important capability.
Related papers
- Enhancing LLM's Cognition via Structurization [41.13997892843677]
Large language models (LLMs) process input contexts through a causal and sequential perspective.
This paper presents a novel concept of context structurization.
Specifically, we transform the plain, unordered contextual sentences into well-ordered and hierarchically structurized elements.
arXiv Detail & Related papers (2024-07-23T12:33:58Z) - Identifiable Causal Representation Learning: Unsupervised, Multi-View, and Multi-Environment [10.814585613336778]
Causal representation learning aims to combine the core strengths of machine learning and causality.
This thesis investigates what is possible for CRL without direct supervision, and thus contributes to its theoretical foundations.
arXiv Detail & Related papers (2024-06-19T09:14:40Z) - Towards More Unified In-context Visual Understanding [74.55332581979292]
We present a new ICL framework for visual understanding with multi-modal output enabled.
First, we quantize and embed both text and visual prompt into a unified representational space.
Then a decoder-only sparse transformer architecture is employed to perform generative modeling on them.
arXiv Detail & Related papers (2023-12-05T06:02:21Z) - In-Context Exemplars as Clues to Retrieving from Large Associative
Memory [1.2952137350423816]
In-context learning (ICL) enables large language models (LLMs) to learn patterns from in-context exemplars without training.
How to choose exemplars remains unclear due to the lack of understanding of how in-context learning works.
Our study sheds new light on the mechanism of ICL by connecting it to memory retrieval.
arXiv Detail & Related papers (2023-11-06T20:13:29Z) - In-context Learning with Transformer Is Really Equivalent to a
Contrastive Learning Pattern [11.329953476499712]
In this paper, we interpret the inference process of ICL as a gradient descent process in a contrastive learning pattern.
To the best of our knowledge, our work is the first to provide the understanding of ICL from the perspective of contrastive learning.
arXiv Detail & Related papers (2023-10-20T01:55:34Z) - How Do Transformers Learn In-Context Beyond Simple Functions? A Case
Study on Learning with Representations [98.7450564309923]
This paper takes initial steps on understanding in-context learning (ICL) in more complex scenarios, by studying learning with representations.
We construct synthetic in-context learning problems with a compositional structure, where the label depends on the input through a possibly complex but fixed representation function.
We show theoretically the existence of transformers that approximately implement such algorithms with mild depth and size.
arXiv Detail & Related papers (2023-10-16T17:40:49Z) - Transformers as Statisticians: Provable In-Context Learning with
In-Context Algorithm Selection [88.23337313766353]
This work first provides a comprehensive statistical theory for transformers to perform ICL.
We show that transformers can implement a broad class of standard machine learning algorithms in context.
A emphsingle transformer can adaptively select different base ICL algorithms.
arXiv Detail & Related papers (2023-06-07T17:59:31Z) - Explaining Emergent In-Context Learning as Kernel Regression [61.57151500616111]
Large language models (LLMs) have initiated a paradigm shift in transfer learning.
In this paper, we investigate the reason why a transformer-based language model can accomplish in-context learning after pre-training.
We find that during ICL, the attention and hidden features in LLMs match the behaviors of a kernel regression.
arXiv Detail & Related papers (2023-05-22T06:45:02Z) - Learning Multi-Objective Curricula for Deep Reinforcement Learning [55.27879754113767]
Various automatic curriculum learning (ACL) methods have been proposed to improve the sample efficiency and final performance of deep reinforcement learning (DRL)
In this paper, we propose a unified automatic curriculum learning framework to create multi-objective but coherent curricula.
In addition to existing hand-designed curricula paradigms, we further design a flexible memory mechanism to learn an abstract curriculum.
arXiv Detail & Related papers (2021-10-06T19:30:25Z) - Prototypical Contrastive Learning of Unsupervised Representations [171.3046900127166]
Prototypical Contrastive Learning (PCL) is an unsupervised representation learning method.
PCL implicitly encodes semantic structures of the data into the learned embedding space.
PCL outperforms state-of-the-art instance-wise contrastive learning methods on multiple benchmarks.
arXiv Detail & Related papers (2020-05-11T09:53:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.