Linking In-context Learning in Transformers to Human Episodic Memory
- URL: http://arxiv.org/abs/2405.14992v1
- Date: Thu, 23 May 2024 18:51:47 GMT
- Title: Linking In-context Learning in Transformers to Human Episodic Memory
- Authors: Li Ji-An, Corey Y. Zhou, Marcus K. Benna, Marcelo G. Mattar,
- Abstract summary: We focus on the induction heads, which contribute to the in-context learning capabilities of Transformer-based large language models.
We demonstrate that induction heads are behaviorally, functionally, and mechanistically similar to the contextual maintenance and retrieval model of human episodic memory.
- Score: 1.124958340749622
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding the connections between artificial and biological intelligent systems can reveal fundamental principles underlying general intelligence. While many artificial intelligence (AI) models have a neuroscience counterpart, such connections are largely missing in Transformer models and the self-attention mechanism. Here, we examine the relationship between attention heads and human episodic memory. We focus on the induction heads, which contribute to the in-context learning capabilities of Transformer-based large language models (LLMs). We demonstrate that induction heads are behaviorally, functionally, and mechanistically similar to the contextual maintenance and retrieval (CMR) model of human episodic memory. Our analyses of LLMs pre-trained on extensive text data show that CMR-like heads often emerge in the intermediate model layers and that their behavior qualitatively mirrors the memory biases seen in humans. Our findings uncover a parallel between the computational mechanisms of LLMs and human memory, offering valuable insights into both research fields.
Related papers
- DSAM: A Deep Learning Framework for Analyzing Temporal and Spatial Dynamics in Brain Networks [4.041732967881764]
Most rs-fMRI studies compute a single static functional connectivity matrix across brain regions of interest.
These approaches are at risk of oversimplifying brain dynamics and lack proper consideration of the goal at hand.
We propose a novel interpretable deep learning framework that learns goal-specific functional connectivity matrix directly from time series.
arXiv Detail & Related papers (2024-05-19T23:35:06Z) - Contextual Feature Extraction Hierarchies Converge in Large Language
Models and the Brain [12.92793034617015]
We show that as large language models (LLMs) achieve higher performance on benchmark tasks, they become more brain-like.
We also show the importance of contextual information in improving model performance and brain similarity.
arXiv Detail & Related papers (2024-01-31T08:48:35Z) - A Language Model with Limited Memory Capacity Captures Interference in
Human Sentence Processing [25.916625483405802]
We develop a recurrent neural language model with a single self-attention head.
We show that our model's single attention head captures semantic and syntactic interference effects observed in human experiments.
arXiv Detail & Related papers (2023-10-24T19:33:27Z) - Language Knowledge-Assisted Representation Learning for Skeleton-Based
Action Recognition [71.35205097460124]
How humans understand and recognize the actions of others is a complex neuroscientific problem.
LA-GCN proposes a graph convolution network using large-scale language models (LLM) knowledge assistance.
arXiv Detail & Related papers (2023-05-21T08:29:16Z) - Multimodal foundation models are better simulators of the human brain [65.10501322822881]
We present a newly-designed multimodal foundation model pre-trained on 15 million image-text pairs.
We find that both visual and lingual encoders trained multimodally are more brain-like compared with unimodal ones.
arXiv Detail & Related papers (2022-08-17T12:36:26Z) - Learning Theory of Mind via Dynamic Traits Attribution [59.9781556714202]
We propose a new neural ToM architecture that learns to generate a latent trait vector of an actor from the past trajectories.
This trait vector then multiplicatively modulates the prediction mechanism via a fast weights' scheme in the prediction neural network.
We empirically show that the fast weights provide a good inductive bias to model the character traits of agents and hence improves mindreading ability.
arXiv Detail & Related papers (2022-04-17T11:21:18Z) - CogNGen: Constructing the Kernel of a Hyperdimensional Predictive
Processing Cognitive Architecture [79.07468367923619]
We present a new cognitive architecture that combines two neurobiologically plausible, computational models.
We aim to develop a cognitive architecture that has the power of modern machine learning techniques.
arXiv Detail & Related papers (2022-03-31T04:44:28Z) - From internal models toward metacognitive AI [0.0]
In the prefrontal cortex, a distributed executive network called the "cognitive reality monitoring network" orchestrates conscious involvement of generative-inverse model pairs.
A high responsibility signal is given to the pairs that best capture the external world.
consciousness is determined by the entropy of responsibility signals across all pairs.
arXiv Detail & Related papers (2021-09-27T05:00:56Z) - DeepRetinotopy: Predicting the Functional Organization of Human Visual
Cortex from Structural MRI Data using Geometric Deep Learning [125.99533416395765]
We developed a deep learning model capable of exploiting the structure of the cortex to learn the complex relationship between brain function and anatomy from structural and functional MRI data.
Our model was able to predict the functional organization of human visual cortex from anatomical properties alone, and it was also able to predict nuanced variations across individuals.
arXiv Detail & Related papers (2020-05-26T04:54:31Z) - Towards a Neural Model for Serial Order in Frontal Cortex: a Brain
Theory from Memory Development to Higher-Level Cognition [53.816853325427424]
We propose that the immature prefrontal cortex (PFC) use its primary functionality of detecting hierarchical patterns in temporal signals.
Our hypothesis is that the PFC detects the hierarchical structure in temporal sequences in the form of ordinal patterns and use them to index information hierarchically in different parts of the brain.
By doing so, it gives the tools to the language-ready brain for manipulating abstract knowledge and planning temporally ordered information.
arXiv Detail & Related papers (2020-05-22T14:29:51Z) - Brain-inspired self-organization with cellular neuromorphic computing
for multimodal unsupervised learning [0.0]
We propose a brain-inspired neural system based on the reentry theory using Self-Organizing Maps and Hebbian-like learning.
We show the gain of the so-called hardware plasticity induced by the ReSOM, where the system's topology is not fixed by the user but learned along the system's experience through self-organization.
arXiv Detail & Related papers (2020-04-11T21:02:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.