What Causes Polysemanticity? An Alternative Origin Story of Mixed
Selectivity from Incidental Causes
- URL: http://arxiv.org/abs/2312.03096v3
- Date: Tue, 13 Feb 2024 06:26:22 GMT
- Title: What Causes Polysemanticity? An Alternative Origin Story of Mixed
Selectivity from Incidental Causes
- Authors: Victor Lecomte, Kushal Thaman, Rylan Schaeffer, Naomi Bashkansky,
Trevor Chow, Sanmi Koyejo
- Abstract summary: Polysemantic neurons -- neurons that activate for a set of unrelated features -- have been seen as a significant obstacle towards interpretability of task-optimized deep networks.
We show that polysemanticity can arise incidentally, even when there are ample neurons to represent all features in the data.
- Score: 14.623741848860037
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Polysemantic neurons -- neurons that activate for a set of unrelated features
-- have been seen as a significant obstacle towards interpretability of
task-optimized deep networks, with implications for AI safety. The classic
origin story of polysemanticity is that the data contains more "features" than
neurons, such that learning to perform a task forces the network to co-allocate
multiple unrelated features to the same neuron, endangering our ability to
understand networks' internal processing. In this work, we present a second and
non-mutually exclusive origin story of polysemanticity. We show that
polysemanticity can arise incidentally, even when there are ample neurons to
represent all features in the data, a phenomenon we term "incidental
polysemanticity". Using a combination of theory and experiments, we show that
incidental polysemanticity can arise due to multiple reasons including
regularization and neural noise; this incidental polysemanticity occurs because
random initialization can, by chance alone, initially assign multiple features
to the same neuron, and the training dynamics then strengthen such overlap. Our
paper concludes by calling for further research quantifying the
performance-polysemanticity tradeoff in task-optimized deep neural networks to
better understand to what extent polysemanticity is avoidable.
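The mechanism described in the abstract (chance overlap at initialization, then training dynamics reinforcing it) can be poked at with a toy experiment. The following is a minimal sketch under assumed settings, not the authors' code: a tied-weight ReLU autoencoder on one-hot features with an L1 penalty on activations; all sizes and hyperparameters are illustrative.

```python
# Minimal sketch (not the paper's exact construction): a tied-weight ReLU
# autoencoder on n one-hot "features" with 2n hidden neurons and an L1 penalty
# on the hidden activations. Even with ample neurons, some hidden units can end
# up responding to more than one feature, depending on the random init.
import numpy as np

rng = np.random.default_rng(0)
n_features, n_hidden = 16, 32          # more neurons than features
lam, lr, steps = 0.01, 0.02, 5000      # L1 strength, learning rate, SGD steps

W = rng.normal(scale=0.3, size=(n_hidden, n_features))  # encoder; decoder is W.T

for step in range(steps):
    x = np.eye(n_features)[rng.integers(n_features)]     # random one-hot feature
    z = W @ x
    h = np.maximum(z, 0.0)                                # ReLU code
    err = W.T @ h - x                                     # reconstruction error
    # Gradient of ||W.T h - x||^2 + lam * sum(h) w.r.t. W (both weight tyings).
    dh = W @ (2.0 * err) + lam
    dz = dh * (z > 0)
    grad = np.outer(dz, x) + np.outer(h, 2.0 * err)
    W -= lr * grad

# Count hidden neurons whose activation is large for more than one feature.
H = np.maximum(W @ np.eye(n_features), 0.0)               # (hidden, feature) responses
responds = H > 0.5 * H.max(axis=0, keepdims=True)         # "strong" relative threshold
poly = np.sum(responds.sum(axis=1) > 1)
print(f"{poly} of {n_hidden} hidden neurons respond strongly to >1 feature")
```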
Related papers
- PURE: Turning Polysemantic Neurons Into Pure Features by Identifying Relevant Circuits [12.17671779091913]
We present a method for disentangling polysemanticity of any Deep Neural Network by decomposing a polysemantic neuron into multiple monosemantic "virtual" neurons.
We demonstrate how our approach allows us to find and disentangle various polysemantic units of ResNet models trained on ImageNet.
arXiv Detail & Related papers (2024-04-09T16:54:19Z)
- Understanding polysemanticity in neural networks through coding theory [0.8702432681310401]
We propose a novel practical approach to network interpretability and theoretical insights into polysemanticity and the density of codes.
We show how random projections can reveal whether a network exhibits a smooth or non-differentiable code and hence how interpretable the code is.
Our approach advances the pursuit of interpretability in neural networks, providing insights into their underlying structure and suggesting new avenues for circuit-level interpretability.
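As a rough, hedged illustration of the random-projection idea (not the paper's method), one can project a layer's activations onto a random direction along a 1-D input path and compare how smoothly the projection varies for a ReLU code versus a tanh code:

```python
# Informal sketch: sweep a 1-D path of inputs, project the hidden activations
# onto a random direction, and compare smoothness via second differences.
import numpy as np

rng = np.random.default_rng(1)
d_in, d_hidden = 8, 256
W1 = rng.normal(size=(d_hidden, d_in)) / np.sqrt(d_in)
b1 = rng.uniform(-1, 1, size=d_hidden)
proj = rng.normal(size=d_hidden) / np.sqrt(d_hidden)      # random projection direction

t = np.linspace(-2, 2, 401)
path = np.outer(t, rng.normal(size=d_in))                 # straight line in input space

def projected_code(nonlin):
    codes = nonlin(path @ W1.T + b1)                      # (points, hidden) activations
    return codes @ proj                                   # 1-D random projection

for name, f in [("relu", lambda z: np.maximum(z, 0)), ("tanh", np.tanh)]:
    p = projected_code(f)
    roughness = np.abs(np.diff(p, n=2)).mean()            # mean |second difference|
    print(f"{name:4s} code: mean |second difference| = {roughness:.4f}")
```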
arXiv Detail & Related papers (2024-01-31T16:31:54Z)
- Interpreting Neural Networks through the Polytope Lens [0.2359380460160535]
Mechanistic interpretability aims to explain what a neural network has learned at a nuts-and-bolts level.
We study the way that piecewise linear activation functions partition the activation space into numerous discrete polytopes.
The polytope lens makes concrete predictions about the behavior of neural networks.
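A minimal sketch of the underlying observation (not the paper's implementation): in a ReLU network, the on/off pattern of the units identifies the polytope, i.e. the linear region, containing an input.

```python
# The activation sign pattern of a ReLU network acts as a "polytope code":
# inputs in the same linear region share the same pattern.
import numpy as np

rng = np.random.default_rng(2)
sizes = [4, 16, 16]                      # toy 2-hidden-layer ReLU net, random weights
Ws = [rng.normal(size=(m, n)) / np.sqrt(n) for n, m in zip(sizes[:-1], sizes[1:])]
bs = [rng.normal(size=m) * 0.1 for m in sizes[1:]]

def sign_pattern(x):
    """Concatenated on/off pattern of all ReLU units -- the polytope code."""
    bits = []
    for W, b in zip(Ws, bs):
        z = W @ x + b
        bits.append(z > 0)
        x = np.maximum(z, 0)
    return np.concatenate(bits)

x = rng.normal(size=4)
base = sign_pattern(x)
same = sign_pattern(x + 1e-4 * rng.normal(size=4))        # tiny perturbation
far  = sign_pattern(rng.normal(size=4))                   # unrelated input
print("bits flipped by tiny perturbation:", int(np.sum(base != same)))
print("bits flipped by unrelated input:  ", int(np.sum(base != far)))
```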
arXiv Detail & Related papers (2022-11-22T15:03:48Z)
- Synergistic information supports modality integration and flexible learning in neural networks solving multiple tasks [107.8565143456161]
We investigate the information processing strategies adopted by simple artificial neural networks performing a variety of cognitive tasks.
Results show that synergy increases as neural networks learn multiple diverse tasks.
Randomly turning off neurons during training through dropout increases network redundancy, corresponding to an increase in robustness.
arXiv Detail & Related papers (2022-10-06T15:36:27Z)
- Spiking neural network for nonlinear regression [68.8204255655161]
Spiking neural networks carry the potential for a massive reduction in memory and energy consumption.
They introduce temporal and neuronal sparsity, which can be exploited by next-generation neuromorphic hardware.
A framework for regression using spiking neural networks is proposed.
arXiv Detail & Related papers (2022-10-06T13:04:45Z)
- Polysemanticity and Capacity in Neural Networks [1.4174475093445233]
Individual neurons in neural networks often represent a mixture of unrelated features.
This phenomenon, called polysemanticity, can make interpreting neural networks more difficult.
arXiv Detail & Related papers (2022-10-04T20:28:43Z)
- The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can make two classes linearly separable with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
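A hedged sketch of the claim, with illustrative sizes and data (two concentric circles) rather than the paper's setup: a wide random ReLU layer with Gaussian weights and uniform biases tends to make the classes linearly separable.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two concentric circles: not linearly separable in the plane.
n = 400
theta = rng.uniform(0, 2 * np.pi, n)
radius = np.where(np.arange(n) < n // 2, 1.0, 2.0)
X = np.c_[radius * np.cos(theta), radius * np.sin(theta)] + 0.05 * rng.normal(size=(n, 2))
y = np.where(np.arange(n) < n // 2, -1.0, 1.0)

# Wide random ReLU layer: standard Gaussian weights, uniform biases.
width = 1024
W = rng.normal(size=(width, 2))
b = rng.uniform(-2, 2, size=width)
Phi = np.maximum(X @ W.T + b, 0.0)

def perceptron_accuracy(F, y, epochs=200):
    """Train a plain perceptron and report how much of the data it separates."""
    w = np.zeros(F.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            if y[i] * (F[i] @ w) <= 0:
                w += y[i] * F[i]
    return np.mean(np.sign(F @ w) == y)

print("linear accuracy on raw inputs:     ", perceptron_accuracy(X, y))
print("linear accuracy on random features:", perceptron_accuracy(Phi, y))
```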
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
- The Causal Neural Connection: Expressiveness, Learnability, and Inference [125.57815987218756]
An object called a structural causal model (SCM) represents a collection of mechanisms and sources of random variation of the system under investigation.
In this paper, we show that the causal hierarchy theorem (Thm. 1, Bareinboim et al., 2020) still holds for neural models.
We introduce a special type of SCM called a neural causal model (NCM), and formalize a new type of inductive bias to encode structural constraints necessary for performing causal inferences.
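To make the terminology concrete, here is a toy SCM in the standard Pearl sense (independent exogenous noise plus deterministic mechanisms). It is a generic illustration, not the paper's NCM architecture, and shows how an intervention do(X=1) differs from conditioning on X=1:

```python
import numpy as np

rng = np.random.default_rng(4)

def sample(n, do_x=None):
    """Sample from the SCM  Z := U_z,  X := 1[Z + U_x > 0],  Y := 2X + Z + 0.5 U_y.
    Passing do_x replaces X's mechanism with a constant (an intervention)."""
    u_z, u_x, u_y = rng.normal(size=(3, n))               # independent exogenous noise
    z = u_z
    x = (z + u_x > 0).astype(float) if do_x is None else np.full(n, float(do_x))
    y = 2.0 * x + z + 0.5 * u_y
    return x, y

x_obs, y_obs = sample(100_000)
_, y_do1 = sample(100_000, do_x=1)
print("E[Y | X = 1] (observational):", y_obs[x_obs == 1].mean().round(3))
print("E[Y | do(X = 1)]            :", y_do1.mean().round(3))
```

Because Z confounds X and Y, the conditional and interventional expectations differ, which is the gap the causal hierarchy formalizes.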
arXiv Detail & Related papers (2021-07-02T01:55:18Z)
- And/or trade-off in artificial neurons: impact on adversarial robustness [91.3755431537592]
The presence of a sufficient number of OR-like neurons in a network can lead to classification brittleness and increased vulnerability to adversarial attacks.
We define AND-like neurons and propose measures to increase their proportion in the network.
Experimental results on the MNIST dataset suggest that our approach holds promise as a direction for further exploration.
arXiv Detail & Related papers (2021-02-15T08:19:05Z)
- Towards a mathematical framework to inform Neural Network modelling via Polynomial Regression [0.0]
It is shown that almost identical predictions can be made when certain conditions are met locally.
When learning from generated data, the proposed method produces polynomial approximations that correctly fit the data locally.
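A hedged sketch of the local-approximation idea (illustrative model and degree, not the paper's framework): fit a low-degree polynomial to a smooth network's outputs in a small neighbourhood and check that the two agree there but not globally.

```python
import numpy as np

rng = np.random.default_rng(5)

# A small random 1-D tanh network standing in for a trained model.
W1, b1 = rng.normal(size=16), rng.normal(size=16)
w2 = rng.normal(size=16) / 4.0
net = lambda x: np.tanh(np.outer(x, W1) + b1) @ w2

x0, radius = 0.3, 0.1
x_fit = x0 + radius * rng.uniform(-1, 1, 200)             # samples near x0
coeffs = np.polyfit(x_fit, net(x_fit), deg=2)             # local degree-2 polynomial

x_test = x0 + radius * rng.uniform(-1, 1, 50)
err_local = np.max(np.abs(np.polyval(coeffs, x_test) - net(x_test)))
x_far = np.linspace(-3, 3, 50)
err_global = np.max(np.abs(np.polyval(coeffs, x_far) - net(x_far)))
print(f"max |poly - net| near x0:  {err_local:.2e}")
print(f"max |poly - net| globally: {err_global:.2e}")
```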
arXiv Detail & Related papers (2021-02-07T17:56:16Z)
- Artificial Neural Variability for Deep Learning: On Overfitting, Noise Memorization, and Catastrophic Forgetting [135.0863818867184]
Artificial neural variability (ANV) helps artificial neural networks learn some advantages from natural neural networks.
ANV acts as an implicit regularizer of the mutual information between the training data and the learned model.
It can effectively relieve overfitting, label noise memorization, and catastrophic forgetting at negligible costs.
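As a loose, classical illustration of the "implicit regularizer" point (this is the standard noise-injection/L2 equivalence for linear least squares, not the paper's ANV formulation): training with Gaussian input noise yields approximately the same weights as ridge regression with a penalty proportional to the noise variance.

```python
import numpy as np

rng = np.random.default_rng(6)
n, d, s = 2000, 5, 0.5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

# Least squares on many noise-perturbed copies of the inputs.
reps = 50
Xn = np.vstack([X + s * rng.normal(size=X.shape) for _ in range(reps)])
yn = np.tile(y, reps)
w_noise = np.linalg.lstsq(Xn, yn, rcond=None)[0]

# Ridge regression with penalty n * s^2 (matching the injected noise variance).
w_ridge = np.linalg.solve(X.T @ X + n * s**2 * np.eye(d), X.T @ y)

print("noisy-input weights:", w_noise.round(3))
print("ridge weights:      ", w_ridge.round(3))
```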
arXiv Detail & Related papers (2020-11-12T06:06:33Z)