Learning from Emergence: A Study on Proactively Inhibiting the Monosemantic Neurons of Artificial Neural Networks
- URL: http://arxiv.org/abs/2312.11560v3
- Date: Wed, 19 Jun 2024 11:18:00 GMT
- Title: Learning from Emergence: A Study on Proactively Inhibiting the Monosemantic Neurons of Artificial Neural Networks
- Authors: Jiachuan Wang, Shimin Di, Lei Chen, Charles Wang Wai Ng
- Abstract summary: We propose a new metric to measure the monosemanticity of neurons that can be computed efficiently online.
We validate our conjecture that monosemanticity brings about performance changes at different model scales.
- Score: 10.390475063385756
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, emergence has received widespread attention from the research community along with the success of large-scale models. Different from the literature, we hypothesize a key factor that promotes performance as scale increases: the reduction of monosemantic neurons, which can only form one-to-one correlations with specific features. Monosemantic neurons tend to be sparser and to negatively impact performance in large models. Inspired by this insight, we propose an intuitive idea: identify monosemantic neurons and inhibit them. However, achieving this goal is non-trivial, as there is no unified quantitative evaluation metric, and simply banning monosemantic neurons does not promote polysemanticity in neural networks. Therefore, we first propose a new metric that measures the monosemanticity of neurons and can be computed efficiently online, then introduce a theoretically supported method to suppress monosemantic neurons and proactively increase the proportion of polysemantic neurons during training. We validate our conjecture that monosemanticity brings about performance changes at different model scales on a variety of neural networks and benchmark datasets across areas including language, image, and physics-simulation tasks. Further experiments validate our analysis and theory regarding the inhibition of monosemanticity.
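To make the general idea concrete, here is a minimal, hypothetical sketch of what a monosemanticity score and an inhibition penalty could look like. The function names, the choice of Pearson correlation against a set of known features, and the penalty weight are all illustrative assumptions, not the metric or method actually proposed in the paper.

```python
import torch

def monosemanticity_scores(activations: torch.Tensor,
                           features: torch.Tensor) -> torch.Tensor:
    """activations: (batch, n_neurons); features: (batch, n_features).
    Returns one score per neuron: the absolute Pearson correlation with its
    single best-matching feature (1.0 would mean a perfect one-to-one match)."""
    a = (activations - activations.mean(0)) / (activations.std(0) + 1e-8)
    f = (features - features.mean(0)) / (features.std(0) + 1e-8)
    corr = (a.T @ f) / (activations.shape[0] - 1)   # (n_neurons, n_features)
    return corr.abs().max(dim=1).values

def inhibition_penalty(activations: torch.Tensor,
                       features: torch.Tensor,
                       weight: float = 1e-2) -> torch.Tensor:
    # Added to the task loss so that gradient descent discourages neurons
    # from locking onto a single feature (hypothetical penalty form).
    return weight * monosemanticity_scores(activations, features).sum()
```

The paper's own metric is designed for efficient online computation during training and its inhibition method is theoretically motivated; the sketch above only conveys the flavour of scoring one-to-one neuron-feature correlations and penalising them.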
Related papers
- Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness [68.69369585600698]
Deep learning models often suffer from a lack of interpretability due to polysemanticity.
Recent advances in monosemanticity, where neurons correspond to consistent and distinct semantics, have significantly improved interpretability.
We show that monosemantic features not only enhance interpretability but also bring concrete gains in model performance.
arXiv Detail & Related papers (2024-10-27T18:03:20Z) - Don't Cut Corners: Exact Conditions for Modularity in Biologically Inspired Representations [52.48094670415497]
We develop a theory of when biologically inspired representations modularise with respect to source variables (sources).
We derive necessary and sufficient conditions on a sample of sources that determine whether the neurons in an optimal biologically-inspired linear autoencoder modularise.
Our theory applies to any dataset, extending far beyond the case of statistical independence studied in previous work.
arXiv Detail & Related papers (2024-10-08T17:41:37Z) - Encourage or Inhibit Monosemanticity? Revisit Monosemanticity from a Feature Decorrelation Perspective [30.290777756014748]
A monosemantic neuron is dedicated to a single, specific concept, forming a one-to-one correlation between neuron and concept.
Despite extensive research in monosemanticity probing, it remains unclear whether monosemanticity is beneficial or harmful to model capacity.
arXiv Detail & Related papers (2024-06-25T22:51:08Z) - Neuroformer: Multimodal and Multitask Generative Pretraining for Brain Data [3.46029409929709]
State-of-the-art systems neuroscience experiments yield large-scale multimodal data, and these data sets require new tools for analysis.
Inspired by the success of large pretrained models in vision and language domains, we reframe the analysis of large-scale, cellular-resolution neuronal spiking data into an autoregressive generation problem.
We first trained Neuroformer on simulated datasets, and found that it both accurately predicted intrinsically simulated neuronal circuit activity, and also inferred the underlying neural circuit connectivity, including direction.
arXiv Detail & Related papers (2023-10-31T20:17:32Z) - Decorrelating neurons using persistence [29.25969187808722]
We present two regularisation terms computed from the weights of a minimum spanning tree of a clique (a minimal sketch of this minimum-spanning-tree idea appears after this list).
We demonstrate that naive minimisation of all correlations between neurons obtains lower accuracies than our regularisation terms.
We include a proof of differentiability of our regularisers, thus developing the first effective topological persistence-based regularisation terms.
arXiv Detail & Related papers (2023-08-09T11:09:14Z) - Cross-Model Comparative Loss for Enhancing Neuronal Utility in Language
Understanding [82.46024259137823]
We propose a cross-model comparative loss for a broad range of tasks.
We demonstrate the universal effectiveness of comparative loss through extensive experiments on 14 datasets from 3 distinct NLU tasks.
arXiv Detail & Related papers (2023-01-10T03:04:27Z) - Engineering Monosemanticity in Toy Models [0.1474723404975345]
In some neural networks, individual neurons correspond to "natural features" in the input.
We find that models can be made more monosemantic without increasing the loss by just changing which local minimum the training process finds.
We are able to mechanistically interpret these models, including the residual polysemantic neurons, and uncover a simple yet surprising algorithm.
arXiv Detail & Related papers (2022-11-16T19:32:43Z) - Overcoming the Domain Gap in Contrastive Learning of Neural Action Representations [60.47807856873544]
A fundamental goal in neuroscience is to understand the relationship between neural activity and behavior.
We generated a new multimodal dataset consisting of the spontaneous behaviors of fruit flies.
This dataset and our new set of augmentations promise to accelerate the application of self-supervised learning methods in neuroscience.
arXiv Detail & Related papers (2021-11-29T15:27:51Z) - And/or trade-off in artificial neurons: impact on adversarial robustness [91.3755431537592]
The presence of a sufficient number of OR-like neurons in a network can lead to classification brittleness and increased vulnerability to adversarial attacks.
We define AND-like neurons and propose measures to increase their proportion in the network.
Experimental results on the MNIST dataset suggest that our approach holds promise as a direction for further exploration.
arXiv Detail & Related papers (2021-02-15T08:19:05Z) - The Neural Coding Framework for Learning Generative Models [91.0357317238509]
We propose a novel neural generative model inspired by the theory of predictive processing in the brain.
In a similar way, artificial neurons in our generative model predict what neighboring neurons will do, and adjust their parameters based on how well the predictions matched reality.
arXiv Detail & Related papers (2020-12-07T01:20:38Z) - One Neuron to Fool Them All [12.107259467873094]
We evaluate the sensitivity of individual neurons in terms of how robust the model's output is to direct perturbations of that neuron's output.
Attacks using a loss function that targets just a single sensitive neuron find adversarial examples nearly as effectively as ones that target the full model.
arXiv Detail & Related papers (2020-03-20T16:49:38Z)
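As referenced in the "Decorrelating neurons using persistence" entry above, here is a minimal sketch of penalising neuron correlations only along a minimum spanning tree, which is what 0-dimensional topological persistence of a correlation-distance filtration reduces to. The function name, the use of Pearson correlation, scipy's MST routine, and the exact penalty form are assumptions for illustration, not that paper's formulation.

```python
import torch
from scipy.sparse.csgraph import minimum_spanning_tree

def mst_decorrelation_penalty(activations: torch.Tensor) -> torch.Tensor:
    """activations: (batch, n_neurons) from one layer."""
    z = (activations - activations.mean(0)) / (activations.std(0) + 1e-8)
    corr = (z.T @ z) / (activations.shape[0] - 1)   # neuron-neuron Pearson r
    # Edge weight = 1 - |r|, so the MST keeps the most correlated "backbone".
    dist = (1.0 - corr.abs()).detach().cpu().numpy()
    mst = minimum_spanning_tree(dist).tocoo()
    i = torch.as_tensor(mst.row, dtype=torch.long)
    j = torch.as_tensor(mst.col, dtype=torch.long)
    # Sum |r| over MST edges only; gradients flow through corr, while the
    # tree structure is treated as fixed for the current batch.
    return corr.abs()[i, j].sum()
```

Restricting the penalty to the tree edges is what distinguishes this from naively minimising all pairwise correlations, which that paper reports gives lower accuracy.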
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.