Universal Neurons in GPT2 Language Models
- URL: http://arxiv.org/abs/2401.12181v1
- Date: Mon, 22 Jan 2024 18:11:01 GMT
- Title: Universal Neurons in GPT2 Language Models
- Authors: Wes Gurnee, Theo Horsley, Zifan Carl Guo, Tara Rezaei Kheirkhah, Qinyi
Sun, Will Hathaway, Neel Nanda, Dimitris Bertsimas
- Abstract summary: We study the universality of individual neurons across GPT2 models trained from different initial random seeds.
We find that 1-5% of neurons are universal, that is, pairs of neurons which consistently activate on the same inputs.
- Score: 4.9892471449871305
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A basic question within the emerging field of mechanistic interpretability is
the degree to which neural networks learn the same underlying mechanisms. In
other words, are neural mechanisms universal across different models? In this
work, we study the universality of individual neurons across GPT2 models
trained from different initial random seeds, motivated by the hypothesis that
universal neurons are likely to be interpretable. In particular, we compute
pairwise correlations of neuron activations over 100 million tokens for every
neuron pair across five different seeds and find that 1-5\% of neurons are
universal, that is, pairs of neurons which consistently activate on the same
inputs. We then study these universal neurons in detail, finding that they
usually have clear interpretations and taxonomize them into a small number of
neuron families. We conclude by studying patterns in neuron weights to
establish several universal functional roles of neurons in simple circuits:
deactivating attention heads, changing the entropy of the next token
distribution, and predicting the next token to (not) be within a particular
set.
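The core measurement described in the abstract is a cross-seed activation correlation. Below is a minimal sketch of that comparison for a single pair of seeds, assuming access to cached MLP activations from both models over the same token stream; the numpy implementation, array shapes, and the 0.5 correlation threshold are illustrative assumptions rather than the authors' released code (the paper runs the computation over 100 million tokens and five seeds).

```python
import numpy as np

def pearson_corr_matrix(acts_a: np.ndarray, acts_b: np.ndarray) -> np.ndarray:
    """Correlate every neuron in model A with every neuron in model B.

    acts_a: (n_tokens, n_neurons_a) activations from seed A
    acts_b: (n_tokens, n_neurons_b) activations from seed B
    returns: (n_neurons_a, n_neurons_b) Pearson correlation matrix
    """
    a = (acts_a - acts_a.mean(axis=0)) / (acts_a.std(axis=0) + 1e-8)
    b = (acts_b - acts_b.mean(axis=0)) / (acts_b.std(axis=0) + 1e-8)
    return (a.T @ b) / a.shape[0]

def candidate_universal_neurons(acts_a, acts_b, threshold=0.5):
    """Return indices of seed-A neurons whose best-matching seed-B neuron
    exceeds the correlation threshold (the 0.5 cutoff is an assumption)."""
    corr = pearson_corr_matrix(acts_a, acts_b)
    best = corr.max(axis=1)                          # best match for each A-neuron
    return np.flatnonzero(best > threshold), best

# Example with random stand-in activations; in practice these would be cached
# MLP activations from two GPT2 seeds run over the same token stream.
rng = np.random.default_rng(0)
acts_seed_a = rng.standard_normal((2_000, 3072))     # (tokens, neurons in one layer)
acts_seed_b = rng.standard_normal((2_000, 3072))
idx, best = candidate_universal_neurons(acts_seed_a, acts_seed_b)
print(f"{len(idx)} of {acts_seed_a.shape[1]} neurons exceed the threshold")
```

With random stand-in activations no neuron passes the threshold, which is the expected null result; the paper's 1-5% figure comes from real model activations compared across all seed pairs.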
Related papers
- Confidence Regulation Neurons in Language Models [91.90337752432075]
This study investigates the mechanisms by which large language models represent and regulate uncertainty in next-token predictions.
Entropy neurons are characterized by an unusually high weight norm and influence the final layer normalization (LayerNorm) scale to effectively scale down the logits.
Token frequency neurons, which we describe here for the first time, boost or suppress each token's logit proportionally to its log frequency, thereby shifting the output distribution towards or away from the unigram distribution (a weight-based sketch of this idea appears after this list).
arXiv Detail & Related papers (2024-06-24T01:31:03Z)
- Interpreting the Second-Order Effects of Neurons in CLIP [73.54377859089801]
We interpret the function of individual neurons in CLIP by automatically describing them using text.
We present the "second-order lens", analyzing the effect flowing from a neuron through the later attention heads, directly to the output.
Our results indicate that a scalable understanding of neurons can be used for model deception and for introducing new model capabilities.
arXiv Detail & Related papers (2024-06-06T17:59:52Z)
- Identifying Interpretable Visual Features in Artificial and Biological Neural Systems [3.604033202771937]
Single neurons in neural networks are often interpretable in that they represent individual, intuitively meaningful features.
Many neurons exhibit mixed selectivity, i.e., they represent multiple unrelated features.
We propose an automated method for quantifying visual interpretability and an approach for finding meaningful directions in network activation space.
arXiv Detail & Related papers (2023-10-17T17:41:28Z)
- Single Biological Neurons as Temporally Precise Spatio-Temporal Pattern Recognizers [0.0]
The thesis is focused on the central idea that single neurons in the brain should be regarded as temporally precise and highly complex spatio-temporal pattern recognizers.
In chapter 2 we demonstrate that single neurons can generate temporally precise output patterns in response to specific spatio-temporal input patterns.
In chapter 3, we use a differentiable deep network model of a realistic cortical neuron as a tool to study the implications of the neuron's output.
arXiv Detail & Related papers (2023-09-26T17:32:08Z)
- Learning to Act through Evolution of Neural Diversity in Random Neural Networks [9.387749254963595]
In most artificial neural networks (ANNs), neural computation is abstracted to an activation function that is usually shared between all neurons.
We propose the optimization of neuro-centric parameters to attain a set of diverse neurons that can perform complex computations.
arXiv Detail & Related papers (2023-05-25T11:33:04Z)
- Constraints on the design of neuromorphic circuits set by the properties of neural population codes [61.15277741147157]
In the brain, information is encoded, transmitted and used to inform behaviour.
Neuromorphic circuits need to encode information in a way compatible with that used by populations of neurons in the brain.
arXiv Detail & Related papers (2022-12-08T15:16:04Z)
- Deep Learning in Random Neural Fields: Numerical Experiments via Neural Tangent Kernel [10.578941575914516]
A biological neural network in the cortex forms a neural field.
Neurons in the field have their own receptive fields, and connection weights between two neurons are random but highly correlated when they are in close proximity in receptive fields.
We show that such a multilayer neural field is more robust than conventional models when input patterns are deformed by noise disturbances.
arXiv Detail & Related papers (2022-02-10T18:57:10Z)
- Continuous Learning and Adaptation with Membrane Potential and Activation Threshold Homeostasis [91.3755431537592]
This paper presents the Membrane Potential and Activation Threshold Homeostasis (MPATH) neuron model.
The model allows neurons to maintain a form of dynamic equilibrium by automatically regulating their activity when presented with input.
Experiments demonstrate the model's ability to adapt to and continually learn from its input.
arXiv Detail & Related papers (2021-04-22T04:01:32Z)
- The Neural Coding Framework for Learning Generative Models [91.0357317238509]
We propose a novel neural generative model inspired by the theory of predictive processing in the brain.
In a similar way, artificial neurons in our generative model predict what neighboring neurons will do, and adjust their parameters based on how well the predictions matched reality.
arXiv Detail & Related papers (2020-12-07T01:20:38Z)
- Continual Learning with Deep Artificial Neurons [0.0]
We introduce Deep Artificial Neurons (DANs), which are themselves realized as deep neural networks.
We demonstrate that it is possible to meta-learn a single parameter vector, which we dub a neuronal phenotype, shared by all DANs in the network.
We show that a suitable neuronal phenotype can endow a single network with an innate ability to update its synapses with minimal forgetting.
arXiv Detail & Related papers (2020-11-13T17:50:10Z)
- Compositional Explanations of Neurons [52.71742655312625]
We describe a procedure for explaining neurons in deep representations by identifying compositional logical concepts.
We use this procedure to answer several questions on interpretability in models for vision and natural language processing.
arXiv Detail & Related papers (2020-06-24T20:37:05Z)
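The entropy and token frequency neurons summarized in the first related paper suggest a simple weight-based diagnostic: project a neuron's output weights through the unembedding and ask how well the resulting per-token logit effect tracks log unigram frequency. The sketch below is an illustrative assumption rather than the method of either paper; all names, shapes, and data are synthetic stand-ins for values that would normally come from a trained GPT2 checkpoint and corpus frequency counts.

```python
import numpy as np

def token_frequency_score(w_out: np.ndarray, W_U: np.ndarray, counts: np.ndarray) -> float:
    """Pearson correlation between a neuron's per-token logit effect and log frequency.

    w_out:  (d_model,)            the neuron's output weight vector
    W_U:    (d_model, vocab_size) unembedding matrix
    counts: (vocab_size,)         raw unigram token counts from a corpus
    """
    logit_effect = w_out @ W_U                       # direct contribution to each token's logit
    log_freq = np.log(counts + 1.0)                  # smoothed log frequency
    le = (logit_effect - logit_effect.mean()) / (logit_effect.std() + 1e-8)
    lf = (log_freq - log_freq.mean()) / (log_freq.std() + 1e-8)
    return float((le * lf).mean())

# Toy demo on synthetic data: plant a frequency-aligned direction in the
# unembedding, then confirm a neuron writing along it scores ~1 while a random
# neuron stays near the noise floor. (GPT2 small: d_model=768, vocab=50257.)
rng = np.random.default_rng(0)
d_model, vocab = 64, 5_000
counts = rng.integers(1, 10_000, size=vocab).astype(float)
W_U = rng.standard_normal((d_model, vocab)) * 0.02
lf = np.log(counts + 1.0)
W_U[0] = 0.02 * (lf - lf.mean()) / lf.std()          # planted log-frequency direction
freq_neuron = np.eye(d_model)[0]                     # writes only along the planted direction
rand_neuron = rng.standard_normal(d_model)
print(f"frequency neuron: {token_frequency_score(freq_neuron, W_U, counts):+.2f}")  # ~ +1.00
print(f"random neuron:    {token_frequency_score(rand_neuron, W_U, counts):+.2f}")  # near 0
```

A score near +1 or -1 would indicate the neuron boosts or suppresses tokens roughly in proportion to their log frequency; scores near 0 indicate no systematic alignment with the unigram distribution.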
This list is automatically generated from the titles and abstracts of the papers in this site.