Transformer Mechanisms Mimic Frontostriatal Gating Operations When
Trained on Human Working Memory Tasks
- URL: http://arxiv.org/abs/2402.08211v1
- Date: Tue, 13 Feb 2024 04:28:43 GMT
- Title: Transformer Mechanisms Mimic Frontostriatal Gating Operations When
Trained on Human Working Memory Tasks
- Authors: Aaron Traylor, Jack Merullo, Michael J. Frank, Ellie Pavlick
- Abstract summary: We analyze the mechanisms that emerge within a vanilla attention-only Transformer trained on a simple sequence modeling task.
We find that, as a result of training, the self-attention mechanism within the Transformer specializes in a way that mirrors the input and output gating mechanisms.
- Score: 19.574270595733502
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Models based on the Transformer neural network architecture have seen success
on a wide variety of tasks that appear to require complex "cognitive branching"
-- or the ability to maintain pursuit of one goal while accomplishing others.
In cognitive neuroscience, success on such tasks is thought to rely on
sophisticated frontostriatal mechanisms for selective gating, which
enable role-addressable updating -- and later readout -- of information to and
from distinct "addresses" of memory, in the form of clusters of neurons.
However, Transformer models have no such mechanisms intentionally built-in. It
is thus an open question how Transformers solve such tasks, and whether the
mechanisms that emerge to help them to do so bear any resemblance to the gating
mechanisms in the human brain. In this work, we analyze the mechanisms that
emerge within a vanilla attention-only Transformer trained on a simple sequence
modeling task inspired by a task explicitly designed to study working memory
gating in computational cognitive neuroscience. We find that, as a result of
training, the self-attention mechanism within the Transformer specializes in a
way that mirrors the input and output gating mechanisms which were explicitly
incorporated into earlier, more biologically-inspired architectures. These
results suggest opportunities for future research on computational similarities
between modern AI architectures and models of the human brain.
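To make the abstract's analogy concrete, here is a minimal toy sketch (illustrative only, not the paper's model or code) of role-addressable memory with an input gate that controls writing and an attention-style output gate that controls readout; all names, dimensions, and values are assumptions for illustration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy "role-addressable" memory: each slot has a role key and stored content.
d = 4
role_keys = np.eye(d)        # one orthogonal key per memory "address"
memory = np.zeros((d, d))    # content stored at each address

def input_gate(role, value, gate_open):
    """Input gating: write `value` to the slot addressed by `role`
    only if the gate is open; otherwise the slot is protected."""
    if gate_open:
        memory[role] = value

def output_gate(query):
    """Output gating via attention-style readout: match the query against
    role keys by dot product (as in self-attention), then mix contents."""
    scores = role_keys @ query
    weights = softmax(scores)
    return weights @ memory   # weighted readout of stored contents

# Write to address 2 with the gate open, then read it back with a sharp query.
input_gate(role=2, value=np.array([1.0, 0.0, 0.0, 0.0]), gate_open=True)
out = output_gate(query=role_keys[2] * 10)  # near-exact recall of slot 2
```

With a sharp (high-magnitude) query the softmax concentrates almost all weight on the matching address, which is the sense in which attention can behave like a selective output gate.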
Related papers
- Towards Understanding How Transformer Perform Multi-step Reasoning with Matching Operation [52.77133661679439]
Investigating internal reasoning mechanisms of large language models can help us design better model architectures and training strategies.
We investigate the matching mechanism employed by Transformer for multi-step reasoning on a constructed dataset.
We propose a conjecture on the upper bound of the model's reasoning ability based on this phenomenon.
arXiv Detail & Related papers (2024-05-24T07:41:26Z)
- Brain-Inspired Machine Intelligence: A Survey of Neurobiologically-Plausible Credit Assignment [65.268245109828]
We examine algorithms for conducting credit assignment in artificial neural networks that are inspired or motivated by neurobiology.
We organize the ever-growing set of brain-inspired learning schemes into six general families and consider these in the context of backpropagation of errors.
The results of this review are meant to encourage future developments in neuro-mimetic systems and their constituent learning processes.
arXiv Detail & Related papers (2023-12-01T05:20:57Z)
- A Neuro-mimetic Realization of the Common Model of Cognition via Hebbian Learning and Free Energy Minimization [55.11642177631929]
Large neural generative models are capable of synthesizing semantically rich passages of text or producing complex images.
We discuss the COGnitive Neural GENerative system, an architecture that casts the Common Model of Cognition in terms of Hebbian learning and free-energy minimization.
arXiv Detail & Related papers (2023-10-14T23:28:48Z)
- Incremental procedural and sensorimotor learning in cognitive humanoid robots [52.77024349608834]
This work presents a cognitive agent that can learn procedures incrementally.
We show the cognitive functions required in each substage and how adding new functions helps address tasks previously unsolved by the agent.
Results show that this approach is capable of solving complex tasks incrementally.
arXiv Detail & Related papers (2023-04-30T22:51:31Z)
- From Biological Synapses to Intelligent Robots [0.0]
Hebbian synaptic learning is discussed as a functionally relevant model for machine learning and intelligence.
The potential for adaptive learning and control without supervision is brought forward.
The insights collected here point toward the Hebbian model as a choice solution for intelligent robotics and sensor systems.
arXiv Detail & Related papers (2022-02-25T12:39:22Z)
- Bottom-up and top-down approaches for the design of neuromorphic processing systems: Tradeoffs and synergies between natural and artificial intelligence [3.874729481138221]
While Moore's law has driven exponential expectations of computing power, its nearing end calls for new avenues for improving overall system performance.
One of these avenues is the exploration of alternative brain-inspired computing architectures that aim at achieving the flexibility and computational efficiency of biological neural processing systems.
We provide a comprehensive overview of the field, highlighting the different levels of granularity at which this paradigm shift is realized.
arXiv Detail & Related papers (2021-06-02T16:51:45Z)
- Cognitive architecture aided by working-memory for self-supervised multi-modal humans recognition [54.749127627191655]
The ability to recognize human partners is an important social skill to build personalized and long-term human-robot interactions.
Deep learning networks have achieved state-of-the-art results and have proven to be suitable tools for this task.
One solution is to make robots learn from their first-hand sensory data with self-supervision.
arXiv Detail & Related papers (2021-03-16T13:50:24Z)
- Transformers with Competitive Ensembles of Independent Mechanisms [97.93090139318294]
We propose a new Transformer layer which divides the hidden representation and parameters into multiple mechanisms, which only exchange information through attention.
We study TIM on a large-scale BERT model, on the Image Transformer, and on speech enhancement and find evidence for semantically meaningful specialization as well as improved performance.
arXiv Detail & Related papers (2021-02-27T21:48:46Z)
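The TIM entry above describes splitting the hidden representation and parameters into multiple mechanisms that exchange information only through attention. A minimal toy sketch of that idea (an assumption-laden illustration, not the paper's implementation) might look like:

```python
import numpy as np

rng = np.random.default_rng(1)
k, d = 4, 8  # number of mechanisms, per-mechanism hidden width (illustrative)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Per-mechanism parameters: each mechanism has its own independent weights.
W = rng.standard_normal((k, d, d)) / np.sqrt(d)
# Shared projections for the single inter-mechanism attention step.
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

def tim_layer(h):
    """h: (k, d) hidden state, one row per mechanism."""
    # 1) Independent processing: no parameter sharing across mechanisms.
    h = np.tanh(np.einsum('kij,kj->ki', W, h))
    # 2) Communication: mechanisms exchange information only via attention.
    q, keys, v = h @ Wq, h @ Wk, h @ Wv
    attn = softmax(q @ keys.T / np.sqrt(d))
    return h + attn @ v  # residual mixing through the attention step alone

h = rng.standard_normal((k, d))
out = tim_layer(h)
```

The design point the sketch isolates is that the only path by which one mechanism's state can influence another's is the attention mixing in step 2.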
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences of its use.