Universal Neurons in GPT2 Language Models
- URL: http://arxiv.org/abs/2401.12181v1
- Date: Mon, 22 Jan 2024 18:11:01 GMT
- Title: Universal Neurons in GPT2 Language Models
- Authors: Wes Gurnee, Theo Horsley, Zifan Carl Guo, Tara Rezaei Kheirkhah, Qinyi
Sun, Will Hathaway, Neel Nanda, Dimitris Bertsimas
- Abstract summary: We study the universality of individual neurons across GPT2 models trained from different initial random seeds.
We find that 1-5% of neurons are universal, that is, pairs of neurons which consistently activate on the same inputs.
- Score: 4.9892471449871305
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A basic question within the emerging field of mechanistic interpretability is
the degree to which neural networks learn the same underlying mechanisms. In
other words, are neural mechanisms universal across different models? In this
work, we study the universality of individual neurons across GPT2 models
trained from different initial random seeds, motivated by the hypothesis that
universal neurons are likely to be interpretable. In particular, we compute
pairwise correlations of neuron activations over 100 million tokens for every
neuron pair across five different seeds and find that 1-5\% of neurons are
universal, that is, pairs of neurons which consistently activate on the same
inputs. We then study these universal neurons in detail, finding that they
usually have clear interpretations and taxonomize them into a small number of
neuron families. We conclude by studying patterns in neuron weights to
establish several universal functional roles of neurons in simple circuits:
deactivating attention heads, changing the entropy of the next token
distribution, and predicting the next token to (not) be within a particular
set.
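The core measurement described in the abstract is a cross-seed activation correlation. Below is a minimal sketch of that comparison for a single pair of seeds, assuming access to cached MLP activations from both models over the same token stream; the numpy implementation, array shapes, and the 0.5 correlation threshold are illustrative assumptions rather than the authors' released code (the paper runs the computation over 100 million tokens and five seeds).

```python
import numpy as np

def pearson_corr_matrix(acts_a: np.ndarray, acts_b: np.ndarray) -> np.ndarray:
    """Correlate every neuron in model A with every neuron in model B.

    acts_a: (n_tokens, n_neurons_a) activations from seed A
    acts_b: (n_tokens, n_neurons_b) activations from seed B
    returns: (n_neurons_a, n_neurons_b) Pearson correlation matrix
    """
    a = (acts_a - acts_a.mean(axis=0)) / (acts_a.std(axis=0) + 1e-8)
    b = (acts_b - acts_b.mean(axis=0)) / (acts_b.std(axis=0) + 1e-8)
    return (a.T @ b) / a.shape[0]

def candidate_universal_neurons(acts_a, acts_b, threshold=0.5):
    """Return indices of seed-A neurons whose best-matching seed-B neuron
    exceeds the correlation threshold (the 0.5 cutoff is an assumption)."""
    corr = pearson_corr_matrix(acts_a, acts_b)
    best = corr.max(axis=1)                          # best match for each A-neuron
    return np.flatnonzero(best > threshold), best

# Example with random stand-in activations; in practice these would be cached
# MLP activations from two GPT2 seeds run over the same token stream.
rng = np.random.default_rng(0)
acts_seed_a = rng.standard_normal((2_000, 3072))     # (tokens, neurons in one layer)
acts_seed_b = rng.standard_normal((2_000, 3072))
idx, best = candidate_universal_neurons(acts_seed_a, acts_seed_b)
print(f"{len(idx)} of {acts_seed_a.shape[1]} neurons exceed the threshold")
```

With random stand-in activations no neuron passes the threshold, which is the expected null result; the paper's 1-5% figure comes from real model activations compared across all seed pairs.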
Related papers
- Confidence Regulation Neurons in Language Models [91.90337752432075]
This study investigates the mechanisms by which large language models represent and regulate uncertainty in next-token predictions.
Entropy neurons are characterized by an unusually high weight norm and influence the final layer normalization (LayerNorm) scale to effectively scale down the logits.
Token frequency neurons, which we describe here for the first time, boost or suppress each token's logit proportionally to its log frequency, thereby shifting the output distribution towards or away from the unigram distribution (a weight-based sketch of this idea appears after this list).
arXiv Detail & Related papers (2024-06-24T01:31:03Z)
- Interpreting the Second-Order Effects of Neurons in CLIP [73.54377859089801]
We interpret the function of individual neurons in CLIP by automatically describing them using text.
We present the "second-order lens", analyzing the effect flowing from a neuron through the later attention heads, directly to the output.
Our results indicate that a scalable understanding of neurons can be used for model deception and for introducing new model capabilities.
arXiv Detail & Related papers (2024-06-06T17:59:52Z)
- Identifying Interpretable Visual Features in Artificial and Biological Neural Systems [3.604033202771937]
Single neurons in neural networks are often interpretable in that they represent individual, intuitively meaningful features.
Many neurons exhibit mixed selectivity, i.e., they represent multiple unrelated features.
We propose an automated method for quantifying visual interpretability and an approach for finding meaningful directions in network activation space.
arXiv Detail & Related papers (2023-10-17T17:41:28Z)
- Single Biological Neurons as Temporally Precise Spatio-Temporal Pattern Recognizers [0.0]
The thesis is focused on the central idea that single neurons in the brain should be regarded as temporally precise and highly complex spatio-temporal pattern recognizers.
In chapter 2 we demonstrate that single neurons can generate temporally precise output patterns in response to specific spatio-temporal input patterns.
In chapter 3, we use a differentiable deep network model of a realistic cortical neuron as a tool to study the implications of the neuron's output.
arXiv Detail & Related papers (2023-09-26T17:32:08Z)
- Learning to Act through Evolution of Neural Diversity in Random Neural Networks [9.387749254963595]
In most artificial neural networks (ANNs), neural computation is abstracted to an activation function that is usually shared between all neurons.
We propose the optimization of neuro-centric parameters to attain a set of diverse neurons that can perform complex computations.
arXiv Detail & Related papers (2023-05-25T11:33:04Z)
- Constraints on the design of neuromorphic circuits set by the properties of neural population codes [61.15277741147157]
In the brain, information is encoded, transmitted and used to inform behaviour.
Neuromorphic circuits need to encode information in a way compatible with that used by populations of neurons in the brain.
arXiv Detail & Related papers (2022-12-08T15:16:04Z)
- Deep Learning in Random Neural Fields: Numerical Experiments via Neural Tangent Kernel [10.578941575914516]
A biological neural network in the cortex forms a neural field.
Neurons in the field have their own receptive fields, and connection weights between two neurons are random but highly correlated when they are in close proximity in receptive fields.
We show that such a multilayer neural field is more robust than conventional models when input patterns are deformed by noise disturbances.
arXiv Detail & Related papers (2022-02-10T18:57:10Z)
- Continuous Learning and Adaptation with Membrane Potential and Activation Threshold Homeostasis [91.3755431537592]
This paper presents the Membrane Potential and Activation Threshold Homeostasis (MPATH) neuron model.
The model allows neurons to maintain a form of dynamic equilibrium by automatically regulating their activity when presented with input.
Experiments demonstrate the model's ability to adapt to and continually learn from its input.
arXiv Detail & Related papers (2021-04-22T04:01:32Z)
- The Neural Coding Framework for Learning Generative Models [91.0357317238509]
We propose a novel neural generative model inspired by the theory of predictive processing in the brain.
In a similar way, artificial neurons in our generative model predict what neighboring neurons will do, and adjust their parameters based on how well the predictions matched reality.
arXiv Detail & Related papers (2020-12-07T01:20:38Z)
- Continual Learning with Deep Artificial Neurons [0.0]
We introduce Deep Artificial Neurons (DANs), which are themselves realized as deep neural networks.
We demonstrate that it is possible to meta-learn a single parameter vector, which we dub a neuronal phenotype, shared by all DANs in the network.
We show that a suitable neuronal phenotype can endow a single network with an innate ability to update its synapses with minimal forgetting.
arXiv Detail & Related papers (2020-11-13T17:50:10Z)
- Compositional Explanations of Neurons [52.71742655312625]
We describe a procedure for explaining neurons in deep representations by identifying compositional logical concepts.
We use this procedure to answer several questions on interpretability in models for vision and natural language processing.
arXiv Detail & Related papers (2020-06-24T20:37:05Z)
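The entropy and token frequency neurons summarized in the first related paper suggest a simple weight-based diagnostic: project a neuron's output weights through the unembedding and ask how well the resulting per-token logit effect tracks log unigram frequency. The sketch below is an illustrative assumption rather than the method of either paper; all names, shapes, and data are synthetic stand-ins for values that would normally come from a trained GPT2 checkpoint and corpus frequency counts.

```python
import numpy as np

def token_frequency_score(w_out: np.ndarray, W_U: np.ndarray, counts: np.ndarray) -> float:
    """Pearson correlation between a neuron's per-token logit effect and log frequency.

    w_out:  (d_model,)            the neuron's output weight vector
    W_U:    (d_model, vocab_size) unembedding matrix
    counts: (vocab_size,)         raw unigram token counts from a corpus
    """
    logit_effect = w_out @ W_U                       # direct contribution to each token's logit
    log_freq = np.log(counts + 1.0)                  # smoothed log frequency
    le = (logit_effect - logit_effect.mean()) / (logit_effect.std() + 1e-8)
    lf = (log_freq - log_freq.mean()) / (log_freq.std() + 1e-8)
    return float((le * lf).mean())

# Toy demo on synthetic data: plant a frequency-aligned direction in the
# unembedding, then confirm a neuron writing along it scores ~1 while a random
# neuron stays near the noise floor. (GPT2 small: d_model=768, vocab=50257.)
rng = np.random.default_rng(0)
d_model, vocab = 64, 5_000
counts = rng.integers(1, 10_000, size=vocab).astype(float)
W_U = rng.standard_normal((d_model, vocab)) * 0.02
lf = np.log(counts + 1.0)
W_U[0] = 0.02 * (lf - lf.mean()) / lf.std()          # planted log-frequency direction
freq_neuron = np.eye(d_model)[0]                     # writes only along the planted direction
rand_neuron = rng.standard_normal(d_model)
print(f"frequency neuron: {token_frequency_score(freq_neuron, W_U, counts):+.2f}")  # ~ +1.00
print(f"random neuron:    {token_frequency_score(rand_neuron, W_U, counts):+.2f}")  # near 0
```

A score near +1 or -1 would indicate the neuron boosts or suppresses tokens roughly in proportion to their log frequency; scores near 0 indicate no systematic alignment with the unigram distribution.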
This list is automatically generated from the titles and abstracts of the papers in this site.