Finding Neurons in a Haystack: Case Studies with Sparse Probing
- URL: http://arxiv.org/abs/2305.01610v2
- Date: Fri, 2 Jun 2023 21:52:17 GMT
- Title: Finding Neurons in a Haystack: Case Studies with Sparse Probing
- Authors: Wes Gurnee, Neel Nanda, Matthew Pauly, Katherine Harvey, Dmitrii
Troitskii, Dimitris Bertsimas
- Abstract summary: Internal computations of large language models (LLMs) remain opaque and poorly understood.
We train $k$-sparse linear classifiers to predict the presence of features in the input.
By varying the value of $k$ we study the sparsity of learned representations and how this varies with model scale.
- Score: 2.278231643598956
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite rapid adoption and deployment of large language models (LLMs), the
internal computations of these models remain opaque and poorly understood. In
this work, we seek to understand how high-level human-interpretable features
are represented within the internal neuron activations of LLMs. We train
$k$-sparse linear classifiers (probes) on these internal activations to predict
the presence of features in the input; by varying the value of $k$ we study the
sparsity of learned representations and how this varies with model scale. With
$k=1$, we localize individual neurons which are highly relevant for a
particular feature, and perform a number of case studies to illustrate general
properties of LLMs. In particular, we show that early layers make use of sparse
combinations of neurons to represent many features in superposition, that
middle layers have seemingly dedicated neurons to represent higher-level
contextual features, and that increasing scale causes representational sparsity
to increase on average, but there are multiple types of scaling dynamics. In
all, we probe for over 100 unique features comprising 10 different categories
in 7 different models spanning 70 million to 6.9 billion parameters.
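As a concrete illustration of the probing setup described in the abstract, the sketch below trains a $k$-sparse linear probe on stored neuron activations. It uses a simple heuristic pipeline (univariate ranking of neurons followed by logistic regression on the top $k$), which is an assumption for illustration and not necessarily the exact sparse-classifier optimization used by the authors; the loader `load_activations_and_labels` is hypothetical.

```python
# Minimal sketch of k-sparse probing on neuron activations X (n_examples x n_neurons)
# from one layer, with binary feature labels y. Heuristic variant: rank neurons by a
# univariate score, then fit a logistic-regression probe on the top-k neurons.
import numpy as np
from sklearn.feature_selection import f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

def sparse_probe(X, y, k, seed=0):
    """Train a k-sparse linear probe: select k neurons, fit a linear classifier, return held-out F1."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=seed)
    scores, _ = f_classif(X_tr, y_tr)           # univariate relevance of each neuron
    top_k = np.argsort(scores)[-k:]              # indices of the k most relevant neurons
    clf = LogisticRegression(max_iter=1000).fit(X_tr[:, top_k], y_tr)
    return top_k, f1_score(y_te, clf.predict(X_te[:, top_k]))

# Sweeping k probes representational sparsity; k=1 localizes a single neuron.
# X, y = load_activations_and_labels(...)       # hypothetical loader, not from the paper
# for k in (1, 2, 4, 8, 16, 64, 256):
#     neurons, f1 = sparse_probe(X, y, k)
#     print(k, f1, neurons if k <= 8 else "...")
```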
Related papers
- Neuron-based Personality Trait Induction in Large Language Models [115.08894603023712]
Large language models (LLMs) have become increasingly proficient at simulating various personality traits.
We present a neuron-based approach for personality trait induction in LLMs.
arXiv Detail & Related papers (2024-10-16T07:47:45Z) - Exploring Behavior-Relevant and Disentangled Neural Dynamics with Generative Diffusion Models [2.600709013150986]
Understanding the neural basis of behavior is a fundamental goal in neuroscience.
Our approach, named BeNeDiff, first identifies a fine-grained and disentangled neural subspace.
It then employs state-of-the-art generative diffusion models to synthesize behavior videos that interpret the neural dynamics of each latent factor.
arXiv Detail & Related papers (2024-10-12T18:28:56Z) - Modularity in Transformers: Investigating Neuron Separability & Specialization [0.0]
Transformer models are increasingly prevalent in various applications, yet our understanding of their internal workings remains limited.
This paper investigates the modularity and task specialization of neurons within transformer architectures, focusing on both vision (ViT) and language (Mistral 7B) models.
Using a combination of selective pruning and MoEfication clustering techniques, we analyze the overlap and specialization of neurons across different tasks and data subsets.
arXiv Detail & Related papers (2024-08-30T14:35:01Z) - SPIN: Sparsifying and Integrating Internal Neurons in Large Language Models for Text Classification [6.227343685358882]
We present SPIN, a model-agnostic framework that sparsifies and integrates internal neurons from the intermediate layers of large language models for text classification.
SPIN significantly improves text classification accuracy, efficiency, and interpretability.
arXiv Detail & Related papers (2023-11-27T16:28:20Z) - Multilayer Multiset Neuronal Networks -- MMNNs [55.2480439325792]
The present work describes multilayer multiset neuronal networks incorporating two or more layers of coincidence similarity neurons.
The work also explores the utilization of counter-prototype points, which are assigned to the image regions to be avoided.
arXiv Detail & Related papers (2023-08-28T12:55:13Z) - The Expressive Leaky Memory Neuron: an Efficient and Expressive Phenomenological Neuron Model Can Solve Long-Horizon Tasks [64.08042492426992]
We introduce the Expressive Leaky Memory (ELM) neuron model, a biologically inspired model of a cortical neuron.
Our ELM neuron can accurately match the input-output relationship of a detailed biophysical cortical neuron model with under ten thousand trainable parameters.
We evaluate it on various tasks with demanding temporal structures, including the Long Range Arena (LRA) datasets.
arXiv Detail & Related papers (2023-06-14T13:34:13Z) - Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z) - Understanding Neural Coding on Latent Manifolds by Sharing Features and Dividing Ensembles [3.625425081454343]
Systems neuroscience relies on two complementary views of neural data, characterized by single neuron tuning curves and analysis of population activity.
These two perspectives combine elegantly in neural latent variable models that constrain the relationship between latent variables and neural activity.
We propose feature sharing across neural tuning curves, which significantly improves performance and leads to better-behaved optimization.
arXiv Detail & Related papers (2022-10-06T18:37:49Z) - Simple and complex spiking neurons: perspectives and analysis in a simple STDP scenario [0.7829352305480283]
Spiking neural networks (SNNs) are inspired by biology and neuroscience to create fast and efficient learning systems.
This work considers various neuron models from the literature and selects computational neuron models that are single-variable, efficient, and exhibit different types of complexity.
We conduct a comparative study of three simple integrate-and-fire (I&F) neuron models, namely the leaky I&F (LIF), the quadratic I&F (QIF), and the exponential I&F (EIF), to understand whether using more complex models increases system performance (standard forms of these models are sketched after this list).
arXiv Detail & Related papers (2022-06-28T10:01:51Z) - The Neural Coding Framework for Learning Generative Models [91.0357317238509]
We propose a novel neural generative model inspired by the theory of predictive processing in the brain.
In a similar way, artificial neurons in our generative model predict what neighboring neurons will do, and adjust their parameters based on how well the predictions match reality.
arXiv Detail & Related papers (2020-12-07T01:20:38Z) - Non-linear Neurons with Human-like Apical Dendrite Activations [81.18416067005538]
We show that a standard neuron followed by our novel apical dendrite activation (ADA) can learn the XOR logical function with 100% accuracy.
We conduct experiments on six benchmark data sets from computer vision, signal processing and natural language processing.
arXiv Detail & Related papers (2020-02-02T21:09:39Z)
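The LIF, QIF, and EIF models compared in the spiking-neuron entry above have standard textbook membrane equations; the minimal Euler-integration sketch below shows those forms. All parameter values are illustrative only and are not taken from that paper.

```python
# Euler-integration sketch of the three integrate-and-fire models (LIF, QIF, EIF).
# Equations are the standard textbook forms; parameters are illustrative, not from the paper.
import numpy as np

def simulate(dvdt, I, dt=0.1, t_max=200.0, v_rest=-65.0, v_th=-50.0, v_reset=-70.0):
    """Integrate a single-variable neuron model and return its spike times (ms)."""
    v, spikes = v_rest, []
    for step in range(int(t_max / dt)):
        v += dt * dvdt(v, I)
        if v >= v_th:                # threshold crossing -> spike and reset
            spikes.append(step * dt)
            v = v_reset
    return spikes

tau, R = 10.0, 1.0                   # membrane time constant (ms), input resistance
v_rest, v_c, delta_T, v_T = -65.0, -55.0, 2.0, -55.0

lif = lambda v, I: (-(v - v_rest) + R * I) / tau                                  # linear leak
qif = lambda v, I: (0.2 * (v - v_rest) * (v - v_c) + R * I) / tau                 # quadratic spike initiation
eif = lambda v, I: (-(v - v_rest) + delta_T * np.exp((v - v_T) / delta_T) + R * I) / tau  # exponential spike initiation

for name, model in (("LIF", lif), ("QIF", qif), ("EIF", eif)):
    print(name, len(simulate(model, I=20.0)), "spikes")
```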
This list is automatically generated from the titles and abstracts of the papers on this site.