Linear Explanations for Individual Neurons
- URL: http://arxiv.org/abs/2405.06855v1
- Date: Fri, 10 May 2024 23:48:37 GMT
- Title: Linear Explanations for Individual Neurons
- Authors: Tuomas Oikarinen, Tsui-Wei Weng
- Abstract summary: We show that the highest activation range is only responsible for a very small percentage of the neuron's causal effect.
In addition, inputs causing lower activations are often very different and can't be reliably predicted by only looking at high activations.
- Score: 12.231741536057378
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years many methods have been developed to understand the internal workings of neural networks, often by describing the function of individual neurons in the model. However, these methods typically only focus on explaining the very highest activations of a neuron. In this paper we show this is not sufficient, and that the highest activation range is only responsible for a very small percentage of the neuron's causal effect. In addition, inputs causing lower activations are often very different and can't be reliably predicted by only looking at high activations. We propose that neurons should instead be understood as a linear combination of concepts, and develop an efficient method for producing these linear explanations. In addition, we show how to automatically evaluate description quality using simulation, i.e. predicting neuron activations on unseen inputs in a vision setting.
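The two ideas in the abstract, describing a neuron as a linear combination of concepts and scoring that description by simulation on unseen inputs, can be illustrated with a minimal sketch. This is not the paper's procedure: the concept scores, the synthetic neuron, the least-squares fit, and the function names `fit_linear_explanation`, `simulate`, and `correlation` are all assumptions made for illustration.

```python
import numpy as np


def fit_linear_explanation(concept_scores, activations):
    """Least-squares weights w such that concept_scores @ w approximates activations."""
    weights, *_ = np.linalg.lstsq(concept_scores, activations, rcond=None)
    return weights


def simulate(concept_scores, weights):
    """Predict neuron activations from concept scores using the linear explanation."""
    return concept_scores @ weights


def correlation(predicted, actual):
    """Pearson correlation between simulated and actual activations."""
    return float(np.corrcoef(predicted, actual)[0, 1])


# Synthetic data (hypothetical): each row is one input, each column is how
# strongly one concept is present in that input.
rng = np.random.default_rng(0)
n_inputs, n_concepts = 400, 5
concept_scores = rng.random((n_inputs, n_concepts))

# Hypothetical neuron: responds strongly to concept 0 and weakly to concept 2.
true_weights = np.array([2.0, 0.0, 0.5, 0.0, 0.0])
activations = concept_scores @ true_weights + 0.05 * rng.standard_normal(n_inputs)

# Fit the explanation on one split, then simulate activations on unseen inputs.
train, test = slice(0, 300), slice(300, None)
weights = fit_linear_explanation(concept_scores[train], activations[train])
score = correlation(simulate(concept_scores[test], weights), activations[test])

print("explanation weights:", np.round(weights, 2))
print("simulation correlation on held-out inputs:", round(score, 3))
```

In this toy setup the held-out correlation plays the role of the simulation score: an explanation that captured only the strongest concept would still predict the top activations, but would do worse across the full activation range.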
Related papers
- A More Accurate Approximation of Activation Function with Few Spikes Neurons [6.306126887439676]
Spiking neural networks (SNNs) have attracted lots of attention as energy-efficient neural networks.
Conventional spiking neurons, such as leaky integrate-and-fire neurons, cannot accurately represent complex non-linear activation functions.
arXiv Detail & Related papers (2024-08-19T02:08:56Z)
- Confidence Regulation Neurons in Language Models [91.90337752432075]
This study investigates the mechanisms by which large language models represent and regulate uncertainty in next-token predictions.
Entropy neurons are characterized by an unusually high weight norm and influence the final layer normalization (LayerNorm) scale to effectively scale down the logits.
Token frequency neurons, which we describe here for the first time, boost or suppress each token's logit proportionally to its log frequency, thereby shifting the output distribution towards or away from the unigram distribution.
arXiv Detail & Related papers (2024-06-24T01:31:03Z)
- Interpreting the Second-Order Effects of Neurons in CLIP [73.54377859089801]
We interpret the function of individual neurons in CLIP by automatically describing them using text.
We present the "second-order lens", analyzing the effect flowing from a neuron through the later attention heads, directly to the output.
Our results indicate that an automated interpretation of neurons can be used for model deception and for introducing new model capabilities.
arXiv Detail & Related papers (2024-06-06T17:59:52Z)
- Fast gradient-free activation maximization for neurons in spiking neural networks [5.805438104063613]
We present a framework with an efficient design for such a loop.
We track changes in the optimal stimuli for artificial neurons during training.
This formation of refined optimal stimuli is associated with an increase in classification accuracy.
arXiv Detail & Related papers (2023-12-28T18:30:13Z)
- Neuron to Graph: Interpreting Language Model Neurons at Scale [8.32093320910416]
This paper introduces a novel automated approach designed to scale interpretability techniques across a vast array of neurons within Large Language Models.
We propose Neuron to Graph (N2G), an innovative tool that automatically extracts a neuron's behaviour from the dataset it was trained on and translates it into an interpretable graph.
arXiv Detail & Related papers (2023-05-31T14:44:33Z)
- Neural network with optimal neuron activation functions based on additive Gaussian process regression [0.0]
More flexible neuron activation functions would allow using fewer neurons and layers and improve expressive power.
We show that additive Gaussian process regression (GPR) can be used to construct optimal neuron activation functions that are individual to each neuron.
An approach is also introduced that avoids non-linear fitting of neural network parameters.
arXiv Detail & Related papers (2023-01-13T14:19:17Z)
- Overcoming the Domain Gap in Contrastive Learning of Neural Action Representations [60.47807856873544]
A fundamental goal in neuroscience is to understand the relationship between neural activity and behavior.
We generated a new multimodal dataset consisting of the spontaneous behaviors generated by fruit flies.
This dataset and our new set of augmentations promise to accelerate the application of self-supervised learning methods in neuroscience.
arXiv Detail & Related papers (2021-11-29T15:27:51Z)
- And/or trade-off in artificial neurons: impact on adversarial robustness [91.3755431537592]
The presence of a sufficient number of OR-like neurons in a network can lead to classification brittleness and increased vulnerability to adversarial attacks.
We define AND-like neurons and propose measures to increase their proportion in the network.
Experimental results on the MNIST dataset suggest that our approach holds promise as a direction for further exploration.
arXiv Detail & Related papers (2021-02-15T08:19:05Z)
- Compositional Explanations of Neurons [52.71742655312625]
We describe a procedure for explaining neurons in deep representations by identifying compositional logical concepts.
We use this procedure to answer several questions on interpretability in models for vision and natural language processing.
arXiv Detail & Related papers (2020-06-24T20:37:05Z)
- Non-linear Neurons with Human-like Apical Dendrite Activations [81.18416067005538]
We show that a standard neuron followed by our novel apical dendrite activation (ADA) can learn the XOR logical function with 100% accuracy.
We conduct experiments on six benchmark data sets from computer vision, signal processing and natural language processing.
arXiv Detail & Related papers (2020-02-02T21:09:39Z)
- Learning Neural Activations [2.842794675894731]
We explore what happens when the activation function of each neuron in an artificial neural network is learned from data alone.
This is achieved by modelling the activation function of each neuron as a small neural network whose weights are shared by all neurons in the original network.
arXiv Detail & Related papers (2019-12-27T15:52:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.