Modularity in Transformers: Investigating Neuron Separability & Specialization
- URL: http://arxiv.org/abs/2408.17324v1
- Date: Fri, 30 Aug 2024 14:35:01 GMT
- Title: Modularity in Transformers: Investigating Neuron Separability & Specialization
- Authors: Nicholas Pochinkov, Thomas Jones, Mohammed Rashidur Rahman
- Abstract summary: Transformer models are increasingly prevalent in various applications, yet our understanding of their internal workings remains limited.
This paper investigates the modularity and task specialization of neurons within transformer architectures, focusing on both vision (ViT) and language (Mistral 7B) models.
Using a combination of selective pruning and MoEfication clustering techniques, we analyze the overlap and specialization of neurons across different tasks and data subsets.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer models are increasingly prevalent in various applications, yet our understanding of their internal workings remains limited. This paper investigates the modularity and task specialization of neurons within transformer architectures, focusing on both vision (ViT) and language (Mistral 7B) models. Using a combination of selective pruning and MoEfication clustering techniques, we analyze the overlap and specialization of neurons across different tasks and data subsets. Our findings reveal evidence of task-specific neuron clusters, with varying degrees of overlap between related tasks. We observe that neuron importance patterns persist to some extent even in randomly initialized models, suggesting an inherent structure that training refines. Additionally, we find that neuron clusters identified through MoEfication correspond more strongly to task-specific neurons in earlier and later layers of the models. This work contributes to a more nuanced understanding of transformer internals and offers insights into potential avenues for improving model interpretability and efficiency.
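The abstract does not spell out how neuron importance or cross-task overlap is measured. As a minimal sketch of that kind of analysis (assuming a mean-absolute-activation importance proxy and a Jaccard overlap score, both illustrative stand-ins rather than the authors' exact procedure):
```python
# Minimal sketch (not the paper's code): score per-task neuron importance from
# mean absolute activations, then check how much two tasks' top neurons overlap.
import numpy as np

def important_neurons(activations: np.ndarray, top_frac: float = 0.02) -> set:
    """activations: (num_samples, num_neurons) MLP activations recorded on one task."""
    importance = np.abs(activations).mean(axis=0)       # simple importance proxy
    k = max(1, int(top_frac * importance.size))
    return set(np.argsort(importance)[-k:].tolist())    # indices of the top-k neurons

def jaccard_overlap(neurons_a: set, neurons_b: set) -> float:
    """Fraction of important neurons shared between two tasks."""
    return len(neurons_a & neurons_b) / len(neurons_a | neurons_b)

# Toy usage: random activations stand in for two tasks' data subsets.
rng = np.random.default_rng(0)
acts_a, acts_b = rng.normal(size=(512, 4096)), rng.normal(size=(512, 4096))
print(jaccard_overlap(important_neurons(acts_a), important_neurons(acts_b)))
```
In practice the same comparison could be run layer by layer, which is what would expose the stronger correspondence in earlier and later layers that the abstract reports.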
Related papers
- NeuSemSlice: Towards Effective DNN Model Maintenance via Neuron-level Semantic Slicing [10.909463767558023]
NeuSemSlice is a novel framework that introduces the semantic slicing technique for semantic-aware model maintenance tasks.
NeuSemSlice identifies, categorizes and merges critical neurons across different categories and layers according to their semantic similarity.
A thorough evaluation has demonstrated that NeuSemSlice significantly outperforms baselines in all three tasks.
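The summary above mentions grouping critical neurons by semantic similarity; a hedged sketch of that general idea (not the NeuSemSlice implementation) might cluster neurons on their per-class activation profiles:
```python
# Illustrative sketch only (not NeuSemSlice): group neurons by the similarity
# of their mean per-class activation profiles.
import numpy as np
from sklearn.cluster import KMeans

def neuron_class_profiles(acts: np.ndarray, labels: np.ndarray, num_classes: int) -> np.ndarray:
    """acts: (num_samples, num_neurons) -> profiles: (num_neurons, num_classes)."""
    profiles = np.stack([acts[labels == c].mean(axis=0) for c in range(num_classes)], axis=1)
    norms = np.linalg.norm(profiles, axis=1, keepdims=True) + 1e-8
    return profiles / norms                      # normalize so clustering compares profile shapes

rng = np.random.default_rng(0)
acts = rng.normal(size=(1000, 256))              # toy activations
labels = rng.integers(0, 10, size=1000)          # toy class labels
profiles = neuron_class_profiles(acts, labels, num_classes=10)
clusters = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(profiles)
print(np.bincount(clusters))                     # neurons per candidate "semantic" group
```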
arXiv Detail & Related papers (2024-07-26T03:19:13Z)
- MindBridge: A Cross-Subject Brain Decoding Framework [60.58552697067837]
Brain decoding aims to reconstruct stimuli from acquired brain signals.
Currently, brain decoding is confined to a per-subject-per-model paradigm.
We present MindBridge, which achieves cross-subject brain decoding by employing only one model.
arXiv Detail & Related papers (2024-04-11T15:46:42Z)
- The Expressive Leaky Memory Neuron: an Efficient and Expressive Phenomenological Neuron Model Can Solve Long-Horizon Tasks [64.08042492426992]
We introduce the Expressive Leaky Memory (ELM) neuron model, a biologically inspired model of a cortical neuron.
Our ELM neuron can accurately match the input-output relationship of a cortical neuron with under ten thousand trainable parameters.
We evaluate it on various tasks with demanding temporal structures, including the Long Range Arena (LRA) datasets.
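The summary gives only the model's name and parameter budget. As a loose, hedged illustration of a "leaky memory" recurrence (a generic construction, not the paper's actual ELM parameterization):
```python
# Hedged sketch of a generic leaky-memory recurrent cell (illustrative only).
import torch
import torch.nn as nn

class LeakyMemoryCell(nn.Module):
    def __init__(self, input_dim: int, memory_dim: int):
        super().__init__()
        self.decay_logit = nn.Parameter(torch.zeros(memory_dim))        # learned per-unit decay
        self.update = nn.Sequential(nn.Linear(input_dim + memory_dim, memory_dim), nn.Tanh())

    def forward(self, x):                         # x: (batch, time, input_dim)
        batch, time, _ = x.shape
        m = torch.zeros(batch, self.decay_logit.numel(), device=x.device)
        decay = torch.sigmoid(self.decay_logit)                         # decay in (0, 1)
        for t in range(time):
            proposal = self.update(torch.cat([x[:, t], m], dim=-1))
            m = decay * m + (1.0 - decay) * proposal                    # leaky integration
        return m                                                        # final memory state

out = LeakyMemoryCell(8, 32)(torch.randn(4, 20, 8))
print(out.shape)   # torch.Size([4, 32])
```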
arXiv Detail & Related papers (2023-06-14T13:34:13Z)
- Permutation Equivariant Neural Functionals [92.0667671999604]
This work studies the design of neural networks that can process the weights or gradients of other neural networks.
We focus on the permutation symmetries that arise in the weights of deep feedforward networks because hidden layer neurons have no inherent order.
In our experiments, we find that permutation equivariant neural functionals are effective on a diverse set of tasks.
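To make the symmetry concrete, here is a small self-contained check (an illustration of the property that work builds on, not code from the paper): relabeling hidden neurons does not change what a two-layer MLP computes.
```python
# Permuting hidden neurons (rows of W1/b1 and columns of W2) leaves the
# network's function unchanged, which is the symmetry weight-space models must respect.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(64, 16)), rng.normal(size=64)
W2, b2 = rng.normal(size=(10, 64)), rng.normal(size=10)
x = rng.normal(size=16)

def mlp(W1, b1, W2, b2, x):
    return W2 @ np.tanh(W1 @ x + b1) + b2

perm = rng.permutation(64)                        # arbitrary reordering of hidden units
print(np.allclose(mlp(W1, b1, W2, b2, x),
                  mlp(W1[perm], b1[perm], W2[:, perm], b2, x)))   # True
```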
arXiv Detail & Related papers (2023-02-27T18:52:38Z)
- Understanding Neural Coding on Latent Manifolds by Sharing Features and Dividing Ensembles [3.625425081454343]
Systems neuroscience relies on two complementary views of neural data, characterized by single neuron tuning curves and analysis of population activity.
These two perspectives combine elegantly in neural latent variable models that constrain the relationship between latent variables and neural activity.
We propose feature sharing across neural tuning curves, which significantly improves performance and leads to better-behaved optimization.
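As a hedged illustration of what feature sharing across tuning curves could look like (an assumption-laden toy, not the paper's model), each neuron's tuning curve can be built from a common set of basis features over the latent variable:
```python
# Toy sketch: all neurons share one set of basis features over a 1-D latent,
# and each neuron's tuning curve is its own weighting of those shared features.
import numpy as np

def shared_basis(z, centers, width=0.5):
    """Gaussian bump features of the latent variable, shared by all neurons."""
    return np.exp(-0.5 * ((z[:, None] - centers[None, :]) / width) ** 2)

rng = np.random.default_rng(0)
centers = np.linspace(-2, 2, 12)                  # shared feature centers
weights = rng.gamma(1.0, 1.0, size=(12, 30))      # per-neuron mixing weights (12 features, 30 neurons)
z = np.linspace(-2, 2, 200)                       # latent trajectory
firing_rates = shared_basis(z, centers) @ weights # (200, 30) tuning curves
print(firing_rates.shape)
```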
arXiv Detail & Related papers (2022-10-06T18:37:49Z)
- EINNs: Epidemiologically-Informed Neural Networks [75.34199997857341]
We introduce EINNs, a new class of physics-informed neural networks crafted for epidemic forecasting.
We investigate how to leverage both the theoretical flexibility provided by mechanistic models and the data-driven expressibility afforded by AI models.
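For context on the physics-informed ingredient, here is a generic sketch (not the EINN architecture, and with assumed SIR parameters chosen for illustration) of penalizing a network for violating epidemic dynamics:
```python
# Generic physics-informed-loss sketch: a network predicts S(t), I(t), R(t) and
# is penalized for violating SIR dynamics at randomly sampled time points.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 3), nn.Softplus())
beta, gamma = 0.3, 0.1                            # assumed SIR parameters (illustrative)

def physics_residual(t):
    t = t.requires_grad_(True)
    S, I, R = net(t).unbind(dim=-1)
    dS, dI, dR = (torch.autograd.grad(v.sum(), t, create_graph=True)[0].squeeze(-1)
                  for v in (S, I, R))
    return ((dS + beta * S * I) ** 2 +            # dS/dt = -beta * S * I
            (dI - beta * S * I + gamma * I) ** 2 +  # dI/dt = beta * S * I - gamma * I
            (dR - gamma * I) ** 2).mean()           # dR/dt = gamma * I

t = torch.rand(128, 1)                            # collocation points in normalized time
loss = physics_residual(t)                        # would be added to a data-fitting loss
loss.backward()
print(float(loss))
```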
arXiv Detail & Related papers (2022-02-21T18:59:03Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Evolving spiking neuron cellular automata and networks to emulate in vitro neuronal activity [0.0]
We produce spiking neural systems that emulate the patterns of behavior of biological neurons in vitro.
Our models were able to produce network-wide synchrony.
The genomes of the top-performing models indicate that the excitability and density of connections in the model play an important role in determining the complexity of the produced activity.
arXiv Detail & Related papers (2021-10-15T17:55:04Z)
- Modelling Neuronal Behaviour with Time Series Regression: Recurrent Neural Networks on C. Elegans Data [0.0]
We show how the nervous system of C. Elegans can be modelled and simulated with data-driven models using different neural network architectures.
We show that GRU models with a hidden layer size of 4 units are able to reproduce the system's response to very different stimuli with high accuracy.
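A hedged sketch of the kind of model described, a GRU regressor with a 4-unit hidden state trained for next-step prediction (not the authors' exact data pipeline or training setup):
```python
# Tiny GRU regressor sketch (illustrative only): predict each neuron's next-step
# activity from the multivariate trace so far.
import torch
import torch.nn as nn

class TinyGRURegressor(nn.Module):
    def __init__(self, num_neurons: int, hidden_size: int = 4):
        super().__init__()
        self.gru = nn.GRU(num_neurons, hidden_size, batch_first=True)
        self.readout = nn.Linear(hidden_size, num_neurons)

    def forward(self, x):                          # x: (batch, time, num_neurons)
        h, _ = self.gru(x)
        return self.readout(h)                     # predicted next-step activity

model = TinyGRURegressor(num_neurons=10)
x = torch.randn(8, 100, 10)                       # toy stimulus/response traces
target = torch.roll(x, shifts=-1, dims=1)         # toy next-step prediction target
loss = nn.functional.mse_loss(model(x), target)
loss.backward()
print(float(loss))
```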
arXiv Detail & Related papers (2021-07-01T10:39:30Z)
- The Neural Coding Framework for Learning Generative Models [91.0357317238509]
We propose a novel neural generative model inspired by the theory of predictive processing in the brain.
In a similar way, artificial neurons in our generative model predict what neighboring neurons will do, and adjust their parameters based on how well the predictions match reality.
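As a loose illustration of the predictive-coding idea in that summary (a textbook-style toy, not the paper's framework), prediction errors can drive both state inference and local weight updates:
```python
# Minimal predictive-coding toy: a latent layer predicts the observed layer,
# and the prediction error updates both the latent state and the weights locally.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=16)                           # "reality": observed activity
W = rng.normal(scale=0.1, size=(16, 8))           # generative (top-down) weights
z = np.zeros(8)                                   # latent cause / higher-layer state
lr = 0.05

for _ in range(200):
    prediction = W @ np.tanh(z)                   # neurons predict their neighbors
    error = x - prediction                        # mismatch with reality
    z += lr * (W.T @ error) * (1 - np.tanh(z) ** 2)   # infer the latent state
    W += lr * np.outer(error, np.tanh(z))         # local, error-driven weight update

print(np.abs(x - W @ np.tanh(z)).mean())          # residual error shrinks over the loop
```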
arXiv Detail & Related papers (2020-12-07T01:20:38Z)
- Learning identifiable and interpretable latent models of high-dimensional neural activity using pi-VAE [10.529943544385585]
We propose a method that integrates key ingredients from latent models and traditional neural encoding models.
Our method, pi-VAE, is inspired by recent progress on identifiable variational auto-encoders.
We validate pi-VAE using synthetic data, and apply it to analyze neurophysiological datasets from rat hippocampus and macaque motor cortex.
arXiv Detail & Related papers (2020-11-09T22:00:38Z)