Related papers: Linking In-context Learning in Transformers to Human Episodic Memory

URL: http://arxiv.org/abs/2405.14992v1
Date: Thu, 23 May 2024 18:51:47 GMT
Title: Linking In-context Learning in Transformers to Human Episodic Memory
Authors: Li Ji-An, Corey Y. Zhou, Marcus K. Benna, Marcelo G. Mattar
Abstract summary: We focus on the induction heads, which contribute to the in-context learning capabilities of Transformer-based large language models. We demonstrate that induction heads are behaviorally, functionally, and mechanistically similar to the contextual maintenance and retrieval model of human episodic memory.
Score: 1.124958340749622
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Understanding the connections between artificial and biological intelligent systems can reveal fundamental principles underlying general intelligence. While many artificial intelligence (AI) models have a neuroscience counterpart, such connections are largely missing in Transformer models and the self-attention mechanism. Here, we examine the relationship between attention heads and human episodic memory. We focus on the induction heads, which contribute to the in-context learning capabilities of Transformer-based large language models (LLMs). We demonstrate that induction heads are behaviorally, functionally, and mechanistically similar to the contextual maintenance and retrieval (CMR) model of human episodic memory. Our analyses of LLMs pre-trained on extensive text data show that CMR-like heads often emerge in the intermediate model layers and that their behavior qualitatively mirrors the memory biases seen in humans. Our findings uncover a parallel between the computational mechanisms of LLMs and human memory, offering valuable insights into both research fields.

Related papers

Lilith: Developmental Modular LLMs with Chemical Signaling [49.1574468325115]
Current paradigms in Artificial Intelligence rely on layers of feedforward networks which model brain activity at the neuronal level. We propose LILITH, a novel architecture that combines developmental training of modular language models with brain-inspired token-based communication protocols.
arXiv Detail & Related papers (2025-07-06T23:18:51Z)
Sequence-to-Sequence Models with Attention Mechanistically Map to the Architecture of Human Memory Search [13.961239165301315]
We show that foundational architectures in neural machine translation exhibit mechanisms that directly correspond to those specified in the Context Maintenance and Retrieval model of human memory. We implement a neural machine translation model as a cognitive model of human memory search that is both interpretable and capable of capturing complex dynamics of learning.
arXiv Detail & Related papers (2025-06-20T18:43:15Z)
BrainMAP: Learning Multiple Activation Pathways in Brain Networks [77.15180533984947]
We introduce a novel framework BrainMAP to learn Multiple Activation Pathways in Brain networks. Our framework enables explanatory analyses of crucial brain regions involved in tasks.
arXiv Detail & Related papers (2024-12-23T09:13:35Z)
Deep reinforcement learning with time-scale invariant memory [1.338174941551702]
We integrate a computational neuroscience model of scale invariant memory into deep reinforcement learning (RL) agents. We show that such agents can learn robustly across a wide range of temporal scales. This result illustrates that incorporating computational principles from neuroscience and cognitive science into deep neural networks can enhance adaptability to complex temporal dynamics.
arXiv Detail & Related papers (2024-12-19T07:20:03Z)
Brain-like Functional Organization within Large Language Models [58.93629121400745]
The human brain has long inspired the pursuit of artificial intelligence (AI). Recent neuroimaging studies provide compelling evidence of alignment between the computational representation of artificial neural networks (ANNs) and the neural responses of the human brain to stimuli. In this study, we bridge this gap by directly coupling sub-groups of artificial neurons with functional brain networks (FBNs). This framework links the AN sub-groups to FBNs, enabling the delineation of brain-like functional organization within large language models (LLMs).
arXiv Detail & Related papers (2024-10-25T13:15:17Z)
Contextual Feature Extraction Hierarchies Converge in Large Language Models and the Brain [12.92793034617015]
We show that as large language models (LLMs) achieve higher performance on benchmark tasks, they become more brain-like. We also show the importance of contextual information in improving model performance and brain similarity.
arXiv Detail & Related papers (2024-01-31T08:48:35Z)
A Language Model with Limited Memory Capacity Captures Interference in Human Sentence Processing [25.916625483405802]
We develop a recurrent neural language model with a single self-attention head. We show that our model's single attention head captures semantic and syntactic interference effects observed in human experiments.
arXiv Detail & Related papers (2023-10-24T19:33:27Z)
Unveiling Theory of Mind in Large Language Models: A Parallel to Single Neurons in the Human Brain [2.5350521110810056]
Large language models (LLMs) have been found to exhibit a certain level of Theory of Mind (ToM). The precise processes underlying LLMs' capacity for ToM, and their similarities with those of humans, remain largely unknown.
arXiv Detail & Related papers (2023-09-04T15:26:15Z)
Language Knowledge-Assisted Representation Learning for Skeleton-Based Action Recognition [71.35205097460124]
How humans understand and recognize the actions of others is a complex neuroscientific problem. LA-GCN is a graph convolution network that uses knowledge from large-scale language models (LLMs) as assistance.
arXiv Detail & Related papers (2023-05-21T08:29:16Z)
Machine Psychology [54.287802134327485]
We argue that a fruitful direction for research is engaging large language models in behavioral experiments inspired by psychology. We highlight theoretical perspectives, experimental paradigms, and computational analysis techniques that this approach brings to the table. It paves the way for a "machine psychology" for generative artificial intelligence (AI) that goes beyond performance benchmarks.
arXiv Detail & Related papers (2023-03-24T13:24:41Z)
Learning Theory of Mind via Dynamic Traits Attribution [59.9781556714202]
We propose a new neural ToM architecture that learns to generate a latent trait vector of an actor from the past trajectories. This trait vector then multiplicatively modulates the prediction mechanism via a fast-weights scheme in the prediction neural network. We empirically show that the fast weights provide a good inductive bias to model the character traits of agents and hence improve mindreading ability.
arXiv Detail & Related papers (2022-04-17T11:21:18Z)
CogNGen: Constructing the Kernel of a Hyperdimensional Predictive Processing Cognitive Architecture [79.07468367923619]
We present a new cognitive architecture that combines two neurobiologically plausible, computational models. We aim to develop a cognitive architecture that has the power of modern machine learning techniques.
arXiv Detail & Related papers (2022-03-31T04:44:28Z)
From internal models toward metacognitive AI [0.0]
In the prefrontal cortex, a distributed executive network called the "cognitive reality monitoring network" orchestrates conscious involvement of generative-inverse model pairs. A high responsibility signal is given to the pairs that best capture the external world. Consciousness is determined by the entropy of responsibility signals across all pairs.
arXiv Detail & Related papers (2021-09-27T05:00:56Z)
Towards a Neural Model for Serial Order in Frontal Cortex: a Brain Theory from Memory Development to Higher-Level Cognition [53.816853325427424]
We propose that the immature prefrontal cortex (PFC) uses its primary functionality of detecting hierarchical patterns in temporal signals. Our hypothesis is that the PFC detects the hierarchical structure in temporal sequences in the form of ordinal patterns and uses them to index information hierarchically in different parts of the brain. By doing so, it gives the tools to the language-ready brain for manipulating abstract knowledge and planning temporally ordered information.
arXiv Detail & Related papers (2020-05-22T14:29:51Z)
Brain-inspired self-organization with cellular neuromorphic computing for multimodal unsupervised learning [0.0]
We propose a brain-inspired neural system based on the reentry theory using Self-Organizing Maps and Hebbian-like learning. We show the gain of the so-called hardware plasticity induced by the ReSOM, where the system's topology is not fixed by the user but learned along the system's experience through self-organization.
arXiv Detail & Related papers (2020-04-11T21:02:45Z)

This list is automatically generated from the titles and abstracts of the papers in this site.