Transformer Mechanisms Mimic Frontostriatal Gating Operations When
Trained on Human Working Memory Tasks
- URL: http://arxiv.org/abs/2402.08211v1
- Date: Tue, 13 Feb 2024 04:28:43 GMT
- Title: Transformer Mechanisms Mimic Frontostriatal Gating Operations When
Trained on Human Working Memory Tasks
- Authors: Aaron Traylor, Jack Merullo, Michael J. Frank, Ellie Pavlick
- Abstract summary: We analyze the mechanisms that emerge within a vanilla attention-only Transformer trained on a simple sequence modeling task.
We find that, as a result of training, the self-attention mechanism within the Transformer specializes in a way that mirrors the input and output gating mechanisms built into earlier, more biologically inspired architectures.
- Score: 19.574270595733502
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Models based on the Transformer neural network architecture have seen success
on a wide variety of tasks that appear to require complex "cognitive branching"
-- or the ability to maintain pursuit of one goal while accomplishing others.
In cognitive neuroscience, success on such tasks is thought to rely on
sophisticated frontostriatal mechanisms for selective gating, which
enable role-addressable updating -- and later readout -- of information to and
from distinct "addresses" of memory, in the form of clusters of neurons.
However, Transformer models have no such mechanisms intentionally built-in. It
is thus an open question how Transformers solve such tasks, and whether the
mechanisms that emerge to help them to do so bear any resemblance to the gating
mechanisms in the human brain. In this work, we analyze the mechanisms that
emerge within a vanilla attention-only Transformer trained on a simple sequence
modeling task inspired by a task explicitly designed to study working memory
gating in computational cognitive neuroscience. We find that, as a result of
training, the self-attention mechanism within the Transformer specializes in a
way that mirrors the input and output gating mechanisms which were explicitly
incorporated into earlier, more biologically-inspired architectures. These
results suggest opportunities for future research on computational similarities
between modern AI architectures and models of the human brain.
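For readers who want a concrete starting point, below is a minimal sketch of the kind of setup the abstract describes: an attention-only Transformer trained on a toy store/query register task used here as a stand-in for the working-memory gating task. The register and symbol counts, the task encoding, and the model hyperparameters are illustrative assumptions, not the authors' actual experimental setup.

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions, not the paper's setup).
N_REG, N_SYM, N_PAIRS = 3, 8, 5
VOCAB = 2 * N_REG + N_SYM            # STORE_r, SYM_s, and QUERY_r tokens
SEQ_LEN = 2 * N_PAIRS + 1

def make_batch(batch_size=128):
    """Trials of (STORE_r, SYM_s) pairs followed by one QUERY_r token.
    The target is the symbol most recently stored in the queried register."""
    seq = torch.zeros(batch_size, SEQ_LEN, dtype=torch.long)
    target = torch.zeros(batch_size, dtype=torch.long)
    for b in range(batch_size):
        last = {}
        for i in range(N_PAIRS):
            r = torch.randint(0, N_REG, ()).item()
            s = torch.randint(0, N_SYM, ()).item()
            seq[b, 2 * i] = r                      # STORE_r token id: 0..N_REG-1
            seq[b, 2 * i + 1] = N_REG + s          # SYM_s token id: N_REG..N_REG+N_SYM-1
            last[r] = s
        q = torch.randint(0, N_REG, ()).item()
        q = q if q in last else next(iter(last))   # query a register that was written
        seq[b, -1] = N_REG + N_SYM + q             # QUERY_r token
        target[b] = last[q]
    return seq, target

class AttentionOnlyTransformer(nn.Module):
    """Embeddings plus residual self-attention layers, with no MLP blocks."""
    def __init__(self, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d_model)
        self.pos = nn.Embedding(SEQ_LEN, d_model)
        self.layers = nn.ModuleList(
            nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            for _ in range(n_layers))
        self.readout = nn.Linear(d_model, N_SYM)

    def forward(self, seq):
        pos = torch.arange(seq.size(1), device=seq.device)
        x = self.embed(seq) + self.pos(pos)
        # Causal mask so each position only attends to itself and earlier positions.
        mask = torch.triu(torch.full((seq.size(1), seq.size(1)), float("-inf"),
                                     device=seq.device), diagonal=1)
        for attn in self.layers:
            out, _ = attn(x, x, x, attn_mask=mask)
            x = x + out                            # residual connection, no feed-forward
        return self.readout(x[:, -1])              # predict the symbol at the query position

model = AttentionOnlyTransformer()
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
for step in range(2000):
    seq, target = make_batch()
    loss = nn.functional.cross_entropy(model(seq), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

With a setup along these lines, one could then inspect the trained attention patterns, e.g., whether particular heads attend from the query token back to the symbol stored in the queried register, which is the kind of input/output-gating-like specialization the paper reports.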
Related papers
- Modularity in Transformers: Investigating Neuron Separability & Specialization [0.0]
Transformer models are increasingly prevalent in various applications, yet our understanding of their internal workings remains limited.
This paper investigates the modularity and task specialization of neurons within transformer architectures, focusing on both vision (ViT) and language (Mistral 7B) models.
Using a combination of selective pruning and MoEfication clustering techniques, we analyze the overlap and specialization of neurons across different tasks and data subsets.
arXiv Detail & Related papers (2024-08-30T14:35:01Z)
- Synergistic pathways of modulation enable robust task packing within neural dynamics [0.0]
We use recurrent network models to probe the distinctions between two forms of contextual modulation of neural dynamics.
We demonstrate a distinction between these mechanisms at the level of the neuronal dynamics they induce.
These characterizations indicate complementarity and synergy in how these mechanisms act, potentially over multiple time-scales.
arXiv Detail & Related papers (2024-08-02T15:12:01Z)
- Brain-Inspired Machine Intelligence: A Survey of Neurobiologically-Plausible Credit Assignment [65.268245109828]
We examine algorithms for conducting credit assignment in artificial neural networks that are inspired or motivated by neurobiology.
We organize the ever-growing set of brain-inspired learning schemes into six general families and consider these in the context of backpropagation of errors.
The results of this review are meant to encourage future developments in neuro-mimetic systems and their constituent learning processes.
arXiv Detail & Related papers (2023-12-01T05:20:57Z)
- A Neuro-mimetic Realization of the Common Model of Cognition via Hebbian Learning and Free Energy Minimization [55.11642177631929]
Large neural generative models are capable of synthesizing semantically rich passages of text or producing complex images.
We discuss the COGnitive Neural GENerative system, one such architecture, which casts the Common Model of Cognition in terms of Hebbian learning and free energy minimization.
arXiv Detail & Related papers (2023-10-14T23:28:48Z)
- Incremental procedural and sensorimotor learning in cognitive humanoid robots [52.77024349608834]
This work presents a cognitive agent that can learn procedures incrementally.
We show the cognitive functions required in each substage and how adding new functions helps address tasks previously unsolved by the agent.
Results show that this approach is capable of solving complex tasks incrementally.
arXiv Detail & Related papers (2023-04-30T22:51:31Z)
- Contrastive-Signal-Dependent Plasticity: Self-Supervised Learning in Spiking Neural Circuits [61.94533459151743]
This work addresses the challenge of designing neurobiologically-motivated schemes for adjusting the synapses of spiking networks.
Our experimental simulations demonstrate a consistent advantage over other biologically-plausible approaches when training recurrent spiking networks.
arXiv Detail & Related papers (2023-03-30T02:40:28Z)
- From Biological Synapses to Intelligent Robots [0.0]
Hebbian synaptic learning is discussed as a functionally relevant model for machine learning and intelligence.
The potential for adaptive learning and control without supervision is highlighted.
The insights collected here point toward the Hebbian model as a solution of choice for intelligent robotics and sensor systems.
arXiv Detail & Related papers (2022-02-25T12:39:22Z)
- Cognitive architecture aided by working-memory for self-supervised multi-modal humans recognition [54.749127627191655]
The ability to recognize human partners is an important social skill for building personalized and long-term human-robot interactions.
Deep learning networks have achieved state-of-the-art results and have been shown to be suitable tools for this task.
One solution is to make robots learn from their first-hand sensory data with self-supervision.
arXiv Detail & Related papers (2021-03-16T13:50:24Z)
- Transformers with Competitive Ensembles of Independent Mechanisms [97.93090139318294]
We propose a new Transformer layer which divides the hidden representation and parameters into multiple mechanisms that exchange information only through attention.
We study TIM on a large-scale BERT model, on the Image Transformer, and on speech enhancement, and find evidence for semantically meaningful specialization as well as improved performance.
arXiv Detail & Related papers (2021-02-27T21:48:46Z)
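To give a rough sense of the mechanism-splitting idea in the TIM entry above, the following is a speculative minimal sketch, not the authors' implementation: the hidden state is partitioned into groups with private feed-forward weights that communicate only through attention across groups. It omits the per-mechanism sequence attention and the competition step described in that paper; all class and parameter names are hypothetical.

```python
import torch
import torch.nn as nn

class MechanismSplitLayer(nn.Module):
    """Speculative sketch: partition the hidden state into n_mech groups with
    private feed-forward weights; groups communicate only via attention."""
    def __init__(self, d_model=256, n_mech=4, n_heads=2):
        super().__init__()
        assert d_model % n_mech == 0
        self.n_mech, self.d_mech = n_mech, d_model // n_mech
        # Attention across mechanisms is the only channel of communication.
        self.cross_mech_attn = nn.MultiheadAttention(self.d_mech, n_heads, batch_first=True)
        # Each mechanism keeps its own feed-forward parameters.
        self.private_ffn = nn.ModuleList(
            nn.Sequential(nn.Linear(self.d_mech, 2 * self.d_mech), nn.GELU(),
                          nn.Linear(2 * self.d_mech, self.d_mech))
            for _ in range(n_mech))

    def forward(self, x):                                  # x: (batch, seq, d_model)
        b, s, _ = x.shape
        m = x.reshape(b * s, self.n_mech, self.d_mech)     # one "token" per mechanism
        comm, _ = self.cross_mech_attn(m, m, m)            # mechanisms exchange information
        m = m + comm
        m = torch.stack([self.private_ffn[i](m[:, i])      # private per-mechanism update
                         for i in range(self.n_mech)], dim=1)
        return m.reshape(b, s, self.n_mech * self.d_mech)

# Usage: shape-preserving layer over a (batch, seq, d_model) activation tensor.
layer = MechanismSplitLayer()
x = torch.randn(2, 10, 256)
print(layer(x).shape)    # torch.Size([2, 10, 256])
```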