Transformer Mechanisms Mimic Frontostriatal Gating Operations When
Trained on Human Working Memory Tasks
- URL: http://arxiv.org/abs/2402.08211v1
- Date: Tue, 13 Feb 2024 04:28:43 GMT
- Title: Transformer Mechanisms Mimic Frontostriatal Gating Operations When
Trained on Human Working Memory Tasks
- Authors: Aaron Traylor, Jack Merullo, Michael J. Frank, Ellie Pavlick
- Abstract summary: We analyze the mechanisms that emerge within a vanilla attention-only Transformer trained on a simple sequence modeling task.
We find that, as a result of training, the self-attention mechanism within the Transformer specializes in a way that mirrors the input and output gating mechanisms built into earlier, more biologically inspired architectures.
- Score: 19.574270595733502
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Models based on the Transformer neural network architecture have seen success
on a wide variety of tasks that appear to require complex "cognitive branching"
-- or the ability to maintain pursuit of one goal while accomplishing others.
In cognitive neuroscience, success on such tasks is thought to rely on
sophisticated frontostriatal mechanisms for selective gating, which
enable role-addressable updating -- and later readout -- of information to and
from distinct "addresses" of memory, in the form of clusters of neurons.
However, Transformer models have no such mechanisms intentionally built-in. It
is thus an open question how Transformers solve such tasks, and whether the
mechanisms that emerge to help them to do so bear any resemblance to the gating
mechanisms in the human brain. In this work, we analyze the mechanisms that
emerge within a vanilla attention-only Transformer trained on a simple sequence
modeling task inspired by a task explicitly designed to study working memory
gating in computational cognitive neuroscience. We find that, as a result of
training, the self-attention mechanism within the Transformer specializes in a
way that mirrors the input and output gating mechanisms which were explicitly
incorporated into earlier, more biologically-inspired architectures. These
results suggest opportunities for future research on computational similarities
between modern AI architectures and models of the human brain.
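For readers who want a concrete starting point, below is a minimal sketch of the kind of setup the abstract describes: an attention-only Transformer trained on a toy store/query register task used here as a stand-in for the working-memory gating task. The register and symbol counts, the task encoding, and the model hyperparameters are illustrative assumptions, not the authors' actual experimental setup.

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions, not the paper's setup).
N_REG, N_SYM, N_PAIRS = 3, 8, 5
VOCAB = 2 * N_REG + N_SYM            # STORE_r, SYM_s, and QUERY_r tokens
SEQ_LEN = 2 * N_PAIRS + 1

def make_batch(batch_size=128):
    """Trials of (STORE_r, SYM_s) pairs followed by one QUERY_r token.
    The target is the symbol most recently stored in the queried register."""
    seq = torch.zeros(batch_size, SEQ_LEN, dtype=torch.long)
    target = torch.zeros(batch_size, dtype=torch.long)
    for b in range(batch_size):
        last = {}
        for i in range(N_PAIRS):
            r = torch.randint(0, N_REG, ()).item()
            s = torch.randint(0, N_SYM, ()).item()
            seq[b, 2 * i] = r                      # STORE_r token id: 0..N_REG-1
            seq[b, 2 * i + 1] = N_REG + s          # SYM_s token id: N_REG..N_REG+N_SYM-1
            last[r] = s
        q = torch.randint(0, N_REG, ()).item()
        q = q if q in last else next(iter(last))   # query a register that was written
        seq[b, -1] = N_REG + N_SYM + q             # QUERY_r token
        target[b] = last[q]
    return seq, target

class AttentionOnlyTransformer(nn.Module):
    """Embeddings plus residual self-attention layers, with no MLP blocks."""
    def __init__(self, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d_model)
        self.pos = nn.Embedding(SEQ_LEN, d_model)
        self.layers = nn.ModuleList(
            nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            for _ in range(n_layers))
        self.readout = nn.Linear(d_model, N_SYM)

    def forward(self, seq):
        pos = torch.arange(seq.size(1), device=seq.device)
        x = self.embed(seq) + self.pos(pos)
        # Causal mask so each position only attends to itself and earlier positions.
        mask = torch.triu(torch.full((seq.size(1), seq.size(1)), float("-inf"),
                                     device=seq.device), diagonal=1)
        for attn in self.layers:
            out, _ = attn(x, x, x, attn_mask=mask)
            x = x + out                            # residual connection, no feed-forward
        return self.readout(x[:, -1])              # predict the symbol at the query position

model = AttentionOnlyTransformer()
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
for step in range(2000):
    seq, target = make_batch()
    loss = nn.functional.cross_entropy(model(seq), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

With a setup along these lines, one could then inspect the trained attention patterns, e.g., whether particular heads attend from the query token back to the symbol stored in the queried register, which is the kind of input/output-gating-like specialization the paper reports.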
Related papers
- Modularity in Transformers: Investigating Neuron Separability & Specialization [0.0]
Transformer models are increasingly prevalent in various applications, yet our understanding of their internal workings remains limited.
This paper investigates the modularity and task specialization of neurons within transformer architectures, focusing on both vision (ViT) and language (Mistral 7B) models.
Using a combination of selective pruning and MoEfication clustering techniques, we analyze the overlap and specialization of neurons across different tasks and data subsets.
arXiv Detail & Related papers (2024-08-30T14:35:01Z)
- Synergistic pathways of modulation enable robust task packing within neural dynamics [0.0]
We use recurrent network models to probe the distinctions between two forms of contextual modulation of neural dynamics.
We demonstrate a distinction between these mechanisms at the level of the neuronal dynamics they induce.
These characterizations indicate complementarity and synergy in how these mechanisms act, potentially over multiple time-scales.
arXiv Detail & Related papers (2024-08-02T15:12:01Z)
- Brain-Inspired Machine Intelligence: A Survey of Neurobiologically-Plausible Credit Assignment [65.268245109828]
We examine algorithms for conducting credit assignment in artificial neural networks that are inspired or motivated by neurobiology.
We organize the ever-growing set of brain-inspired learning schemes into six general families and consider these in the context of backpropagation of errors.
The results of this review are meant to encourage future developments in neuro-mimetic systems and their constituent learning processes.
arXiv Detail & Related papers (2023-12-01T05:20:57Z)
- A Neuro-mimetic Realization of the Common Model of Cognition via Hebbian Learning and Free Energy Minimization [55.11642177631929]
Large neural generative models are capable of synthesizing semantically rich passages of text or producing complex images.
We discuss the COGnitive Neural GENerative system, one such architecture, which casts the Common Model of Cognition in terms of Hebbian learning and free energy minimization.
arXiv Detail & Related papers (2023-10-14T23:28:48Z)
- Incremental procedural and sensorimotor learning in cognitive humanoid robots [52.77024349608834]
This work presents a cognitive agent that can learn procedures incrementally.
We show the cognitive functions required in each substage and how adding new functions helps address tasks previously unsolved by the agent.
Results show that this approach is capable of solving complex tasks incrementally.
arXiv Detail & Related papers (2023-04-30T22:51:31Z)
- Contrastive-Signal-Dependent Plasticity: Self-Supervised Learning in Spiking Neural Circuits [61.94533459151743]
This work addresses the challenge of designing neurobiologically-motivated schemes for adjusting the synapses of spiking networks.
Our experimental simulations demonstrate a consistent advantage over other biologically-plausible approaches when training recurrent spiking networks.
arXiv Detail & Related papers (2023-03-30T02:40:28Z)
- From Biological Synapses to Intelligent Robots [0.0]
Hebbian synaptic learning is discussed as a functionally relevant model for machine learning and intelligence.
The potential for adaptive learning and control without supervision is highlighted.
The insights collected here point toward the Hebbian model as a solution of choice for intelligent robotics and sensor systems.
arXiv Detail & Related papers (2022-02-25T12:39:22Z)
- Cognitive architecture aided by working-memory for self-supervised multi-modal humans recognition [54.749127627191655]
The ability to recognize human partners is an important social skill for building personalized and long-term human-robot interactions.
Deep learning networks have achieved state-of-the-art results and have been shown to be suitable tools for this task.
One solution is to make robots learn from their first-hand sensory data with self-supervision.
arXiv Detail & Related papers (2021-03-16T13:50:24Z)
- Transformers with Competitive Ensembles of Independent Mechanisms [97.93090139318294]
We propose a new Transformer layer which divides the hidden representation and parameters into multiple mechanisms that exchange information only through attention.
We study TIM on a large-scale BERT model, on the Image Transformer, and on speech enhancement, and find evidence for semantically meaningful specialization as well as improved performance.
arXiv Detail & Related papers (2021-02-27T21:48:46Z)
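To give a rough sense of the mechanism-splitting idea in the TIM entry above, the following is a speculative minimal sketch, not the authors' implementation: the hidden state is partitioned into groups with private feed-forward weights that communicate only through attention across groups. It omits the per-mechanism sequence attention and the competition step described in that paper; all class and parameter names are hypothetical.

```python
import torch
import torch.nn as nn

class MechanismSplitLayer(nn.Module):
    """Speculative sketch: partition the hidden state into n_mech groups with
    private feed-forward weights; groups communicate only via attention."""
    def __init__(self, d_model=256, n_mech=4, n_heads=2):
        super().__init__()
        assert d_model % n_mech == 0
        self.n_mech, self.d_mech = n_mech, d_model // n_mech
        # Attention across mechanisms is the only channel of communication.
        self.cross_mech_attn = nn.MultiheadAttention(self.d_mech, n_heads, batch_first=True)
        # Each mechanism keeps its own feed-forward parameters.
        self.private_ffn = nn.ModuleList(
            nn.Sequential(nn.Linear(self.d_mech, 2 * self.d_mech), nn.GELU(),
                          nn.Linear(2 * self.d_mech, self.d_mech))
            for _ in range(n_mech))

    def forward(self, x):                                  # x: (batch, seq, d_model)
        b, s, _ = x.shape
        m = x.reshape(b * s, self.n_mech, self.d_mech)     # one "token" per mechanism
        comm, _ = self.cross_mech_attn(m, m, m)            # mechanisms exchange information
        m = m + comm
        m = torch.stack([self.private_ffn[i](m[:, i])      # private per-mechanism update
                         for i in range(self.n_mech)], dim=1)
        return m.reshape(b, s, self.n_mech * self.d_mech)

# Usage: shape-preserving layer over a (batch, seq, d_model) activation tensor.
layer = MechanismSplitLayer()
x = torch.randn(2, 10, 256)
print(layer(x).shape)    # torch.Size([2, 10, 256])
```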