Linking In-context Learning in Transformers to Human Episodic Memory
- URL: http://arxiv.org/abs/2405.14992v2
- Date: Thu, 31 Oct 2024 04:15:40 GMT
- Title: Linking In-context Learning in Transformers to Human Episodic Memory
- Authors: Li Ji-An, Corey Y. Zhou, Marcus K. Benna, Marcelo G. Mattar
- Abstract summary: We focus on induction heads, which contribute to in-context learning in Transformer-based large language models.
We demonstrate that induction heads are behaviorally, functionally, and mechanistically similar to the contextual maintenance and retrieval model of human episodic memory.
- Score: 1.124958340749622
- Abstract: Understanding connections between artificial and biological intelligent systems can reveal fundamental principles of general intelligence. While many artificial intelligence models have a neuroscience counterpart, such connections are largely missing in Transformer models and the self-attention mechanism. Here, we examine the relationship between interacting attention heads and human episodic memory. We focus on induction heads, which contribute to in-context learning in Transformer-based large language models (LLMs). We demonstrate that induction heads are behaviorally, functionally, and mechanistically similar to the contextual maintenance and retrieval (CMR) model of human episodic memory. Our analyses of LLMs pre-trained on extensive text data show that CMR-like heads often emerge in the intermediate and late layers, qualitatively mirroring human memory biases. The ablation of CMR-like heads suggests their causal role in in-context learning. Our findings uncover a parallel between the computational mechanisms of LLMs and human memory, offering valuable insights into both research fields.
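For readers unfamiliar with how induction heads are typically identified, the sketch below computes a prefix-matching score for a single attention head: the average attention that each query token places on the token immediately following an earlier occurrence of the same token, evaluated on a random sequence repeated twice. This is a minimal illustration of the standard diagnostic, not the paper's exact procedure; the function name, the synthetic attention pattern, and the use of NumPy are illustrative assumptions.
```python
import numpy as np


def prefix_matching_score(attn: np.ndarray, tokens: np.ndarray) -> float:
    """Average attention each query places on the token that immediately
    follows an earlier occurrence of the same token (the position an
    induction head is expected to copy from).

    attn:   (T, T) causal attention pattern for one head; row q is a
            distribution over key positions k <= q.
    tokens: (T,) token ids, ideally a random sequence repeated twice so
            that every token in the second half has an earlier match.
    """
    T = len(tokens)
    total, queries = 0.0, 0
    for q in range(1, T):
        # Key positions j whose *preceding* token matches the current query token.
        targets = [j for j in range(1, q + 1) if tokens[j - 1] == tokens[q]]
        if targets:
            total += attn[q, targets].sum()
            queries += 1
    return total / max(queries, 1)


# Toy check: a random half-sequence repeated twice, paired with a synthetic
# "ideal" induction pattern that always attends to the matched positions.
rng = np.random.default_rng(0)
half = rng.integers(0, 50, size=16)
tokens = np.concatenate([half, half])
T = len(tokens)
attn = np.zeros((T, T))
for q in range(T):
    targets = [j for j in range(1, q + 1) if tokens[j - 1] == tokens[q]]
    if targets:
        attn[q, targets] = 1.0 / len(targets)
    else:
        attn[q, q] = 1.0  # no earlier match: fall back to attending to itself
print(prefix_matching_score(attn, tokens))  # ~1.0 for this idealized head
```
In practice such scores would be computed from the attention matrices of a pre-trained LLM; the paper goes further by relating how a head's attention varies with positional lag to CMR-style analyses of human recall, but that comparison is beyond this small sketch.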
Related papers
- Brain-like Functional Organization within Large Language Models [58.93629121400745]
The human brain has long inspired the pursuit of artificial intelligence (AI).
Recent neuroimaging studies provide compelling evidence of alignment between the computational representation of artificial neural networks (ANNs) and the neural responses of the human brain to stimuli.
In this study, we bridge this gap by directly coupling sub-groups of artificial neurons with functional brain networks (FBNs).
This framework links the artificial-neuron sub-groups to FBNs, enabling the delineation of brain-like functional organization within large language models (LLMs).
arXiv Detail & Related papers (2024-10-25T13:15:17Z) - Contextual Feature Extraction Hierarchies Converge in Large Language Models and the Brain [12.92793034617015]
We show that as large language models (LLMs) achieve higher performance on benchmark tasks, they become more brain-like.
We also show the importance of contextual information in improving model performance and brain similarity.
arXiv Detail & Related papers (2024-01-31T08:48:35Z) - A Language Model with Limited Memory Capacity Captures Interference in Human Sentence Processing [25.916625483405802]
We develop a recurrent neural language model with a single self-attention head.
We show that our model's single attention head captures semantic and syntactic interference effects observed in human experiments.
arXiv Detail & Related papers (2023-10-24T19:33:27Z) - Unveiling Theory of Mind in Large Language Models: A Parallel to Single Neurons in the Human Brain [2.5350521110810056]
Large language models (LLMs) have been found to exhibit a certain level of Theory of Mind (ToM).
The precise processes underlying LLMs' capacity for ToM, and their similarity to those of humans, remain largely unknown.
arXiv Detail & Related papers (2023-09-04T15:26:15Z) - Language Knowledge-Assisted Representation Learning for Skeleton-Based Action Recognition [71.35205097460124]
How humans understand and recognize the actions of others is a complex neuroscientific problem.
LA-GCN is a graph convolution network that uses knowledge assistance from large-scale language models (LLMs).
arXiv Detail & Related papers (2023-05-21T08:29:16Z) - Machine Psychology [54.287802134327485]
We argue that a fruitful direction for research is engaging large language models in behavioral experiments inspired by psychology.
We highlight theoretical perspectives, experimental paradigms, and computational analysis techniques that this approach brings to the table.
It paves the way for a "machine psychology" for generative artificial intelligence (AI) that goes beyond performance benchmarks.
arXiv Detail & Related papers (2023-03-24T13:24:41Z) - Learning Theory of Mind via Dynamic Traits Attribution [59.9781556714202]
We propose a new neural ToM architecture that learns to generate a latent trait vector of an actor from the past trajectories.
This trait vector then multiplicatively modulates the prediction mechanism via a fast-weights scheme in the prediction neural network.
We empirically show that the fast weights provide a good inductive bias for modeling the character traits of agents and hence improve mindreading ability.
arXiv Detail & Related papers (2022-04-17T11:21:18Z) - CogNGen: Constructing the Kernel of a Hyperdimensional Predictive Processing Cognitive Architecture [79.07468367923619]
We present a new cognitive architecture that combines two neurobiologically plausible computational models.
We aim to develop a cognitive architecture that has the power of modern machine learning techniques.
arXiv Detail & Related papers (2022-03-31T04:44:28Z) - From internal models toward metacognitive AI [0.0]
In the prefrontal cortex, a distributed executive network called the "cognitive reality monitoring network" orchestrates conscious involvement of generative-inverse model pairs.
A high responsibility signal is given to the pairs that best capture the external world.
Consciousness is determined by the entropy of responsibility signals across all pairs.
arXiv Detail & Related papers (2021-09-27T05:00:56Z) - Towards a Neural Model for Serial Order in Frontal Cortex: a Brain Theory from Memory Development to Higher-Level Cognition [53.816853325427424]
We propose that the immature prefrontal cortex (PFC) uses its primary functionality of detecting hierarchical patterns in temporal signals.
Our hypothesis is that the PFC detects the hierarchical structure of temporal sequences in the form of ordinal patterns and uses them to index information hierarchically in different parts of the brain.
By doing so, it provides the language-ready brain with the tools for manipulating abstract knowledge and planning temporally ordered information.
arXiv Detail & Related papers (2020-05-22T14:29:51Z) - Brain-inspired self-organization with cellular neuromorphic computing for multimodal unsupervised learning [0.0]
We propose a brain-inspired neural system based on the reentry theory using Self-Organizing Maps and Hebbian-like learning.
We show the gain of so-called hardware plasticity induced by the ReSOM, where the system's topology is not fixed by the user but learned through the system's experience via self-organization.
arXiv Detail & Related papers (2020-04-11T21:02:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.