Modularity in Transformers: Investigating Neuron Separability & Specialization
- URL: http://arxiv.org/abs/2408.17324v1
- Date: Fri, 30 Aug 2024 14:35:01 GMT
- Title: Modularity in Transformers: Investigating Neuron Separability & Specialization
- Authors: Nicholas Pochinkov, Thomas Jones, Mohammed Rashidur Rahman,
- Abstract summary: Transformer models are increasingly prevalent in various applications, yet our understanding of their internal workings remains limited.
This paper investigates the modularity and task specialization of neurons within transformer architectures, focusing on both vision (ViT) and language (Mistral 7B) models.
Using a combination of selective pruning and MoEfication clustering techniques, we analyze the overlap and specialization of neurons across different tasks and data subsets.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer models are increasingly prevalent in various applications, yet our understanding of their internal workings remains limited. This paper investigates the modularity and task specialization of neurons within transformer architectures, focusing on both vision (ViT) and language (Mistral 7B) models. Using a combination of selective pruning and MoEfication clustering techniques, we analyze the overlap and specialization of neurons across different tasks and data subsets. Our findings reveal evidence of task-specific neuron clusters, with varying degrees of overlap between related tasks. We observe that neuron importance patterns persist to some extent even in randomly initialized models, suggesting an inherent structure that training refines. Additionally, we find that neuron clusters identified through MoEfication correspond more strongly to task-specific neurons in earlier and later layers of the models. This work contributes to a more nuanced understanding of transformer internals and offers insights into potential avenues for improving model interpretability and efficiency.
Related papers
- Disentangling Task Interference within Neurons: Model Merging in Alignment with Neuronal Mechanisms [9.230323472193918]
We present the first study on the impact of neuronal alignment in model merging.
We introduce NeuroMerging, a novel merging framework developed to mitigate task interference within neuronal subspaces.
arXiv Detail & Related papers (2025-03-07T11:00:24Z) - Neural Models of Task Adaptation: A Tutorial on Spiking Networks for Executive Control [0.0]
This tutorial presents a step-by-step approach to constructing a spiking neural network (SNN) that simulates task-switching dynamics.
The model incorporates biologically realistic features, including lateral inhibition, adaptive synaptic weights, and precise parameterization within physiologically relevant ranges.
By following this tutorial, researchers can develop and extend biologically inspired SNN models for studying cognitive processes and neural adaptation.
arXiv Detail & Related papers (2025-03-05T00:44:34Z) - Neural ODE Transformers: Analyzing Internal Dynamics and Adaptive Fine-tuning [30.781578037476347]
We introduce a novel approach to modeling transformer architectures using highly flexible non-autonomous neural ordinary differential equations (ODEs)
Our proposed model parameterizes all weights of attention and feed-forward blocks through neural networks, expressing these weights as functions of a continuous layer index.
Our neural ODE transformer demonstrates performance comparable to or better than vanilla transformers across various configurations and datasets.
arXiv Detail & Related papers (2025-03-03T09:12:14Z) - Range, not Independence, Drives Modularity in Biologically Inspired Representations [52.48094670415497]
We develop a theory of when biologically inspired networks modularise their representation of source variables (sources)
We derive necessary and sufficient conditions on a sample of sources that determine whether the neurons in an optimal linear autoencoder modularise.
Our theory applies to any dataset, extending far beyond the case of statistical independence studied in previous work.
arXiv Detail & Related papers (2024-10-08T17:41:37Z) - NeuSemSlice: Towards Effective DNN Model Maintenance via Neuron-level Semantic Slicing [10.909463767558023]
NeuSemSlice is a novel framework that introduces the semantic slicing technique for semantic-aware model maintenance tasks.
NeuSemSlice identifies, categorizes and merges critical neurons across different categories and layers according to their semantic similarity.
A thorough evaluation has demonstrated that NeuSemSlice significantly outperforms baselines in all three tasks.
arXiv Detail & Related papers (2024-07-26T03:19:13Z) - MindBridge: A Cross-Subject Brain Decoding Framework [60.58552697067837]
Brain decoding aims to reconstruct stimuli from acquired brain signals.
Currently, brain decoding is confined to a per-subject-per-model paradigm.
We present MindBridge, that achieves cross-subject brain decoding by employing only one model.
arXiv Detail & Related papers (2024-04-11T15:46:42Z) - The Expressive Leaky Memory Neuron: an Efficient and Expressive Phenomenological Neuron Model Can Solve Long-Horizon Tasks [64.08042492426992]
We introduce the Expressive Memory (ELM) neuron model, a biologically inspired model of a cortical neuron.
Our ELM neuron can accurately match the aforementioned input-output relationship with under ten thousand trainable parameters.
We evaluate it on various tasks with demanding temporal structures, including the Long Range Arena (LRA) datasets.
arXiv Detail & Related papers (2023-06-14T13:34:13Z) - Permutation Equivariant Neural Functionals [92.0667671999604]
This work studies the design of neural networks that can process the weights or gradients of other neural networks.
We focus on the permutation symmetries that arise in the weights of deep feedforward networks because hidden layer neurons have no inherent order.
In our experiments, we find that permutation equivariant neural functionals are effective on a diverse set of tasks.
arXiv Detail & Related papers (2023-02-27T18:52:38Z) - Understanding Neural Coding on Latent Manifolds by Sharing Features and
Dividing Ensembles [3.625425081454343]
Systems neuroscience relies on two complementary views of neural data, characterized by single neuron tuning curves and analysis of population activity.
These two perspectives combine elegantly in neural latent variable models that constrain the relationship between latent variables and neural activity.
We propose feature sharing across neural tuning curves, which significantly improves performance and leads to better-behaved optimization.
arXiv Detail & Related papers (2022-10-06T18:37:49Z) - EINNs: Epidemiologically-Informed Neural Networks [75.34199997857341]
We introduce a new class of physics-informed neural networks-EINN-crafted for epidemic forecasting.
We investigate how to leverage both the theoretical flexibility provided by mechanistic models as well as the data-driven expressability afforded by AI models.
arXiv Detail & Related papers (2022-02-21T18:59:03Z) - Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z) - Evolving spiking neuron cellular automata and networks to emulate in
vitro neuronal activity [0.0]
We produce spiking neural systems that emulate the patterns of behavior of biological neurons in vitro.
Our models were able to produce a level of network-wide synchrony.
The genomes of the top-performing models indicate the excitability and density of connections in the model play an important role in determining the complexity of the produced activity.
arXiv Detail & Related papers (2021-10-15T17:55:04Z) - Modelling Neuronal Behaviour with Time Series Regression: Recurrent
Neural Networks on C. Elegans Data [0.0]
We show how the nervous system of C. Elegans can be modelled and simulated with data-driven models using different neural network architectures.
We show that GRU models with a hidden layer size of 4 units are able to accurately reproduce with high accuracy the system's response to very different stimuli.
arXiv Detail & Related papers (2021-07-01T10:39:30Z) - The Neural Coding Framework for Learning Generative Models [91.0357317238509]
We propose a novel neural generative model inspired by the theory of predictive processing in the brain.
In a similar way, artificial neurons in our generative model predict what neighboring neurons will do, and adjust their parameters based on how well the predictions matched reality.
arXiv Detail & Related papers (2020-12-07T01:20:38Z) - Learning identifiable and interpretable latent models of
high-dimensional neural activity using pi-VAE [10.529943544385585]
We propose a method that integrates key ingredients from latent models and traditional neural encoding models.
Our method, pi-VAE, is inspired by recent progress on identifiable variational auto-encoder.
We validate pi-VAE using synthetic data, and apply it to analyze neurophysiological datasets from rat hippocampus and macaque motor cortex.
arXiv Detail & Related papers (2020-11-09T22:00:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.