Polysemanticity and Capacity in Neural Networks
- URL: http://arxiv.org/abs/2210.01892v4
- Date: Tue, 25 Mar 2025 05:19:03 GMT
- Title: Polysemanticity and Capacity in Neural Networks
- Authors: Adam Scherlis, Kshitij Sachan, Adam S. Jermyn, Joe Benton, Buck Shlegeris
- Abstract summary: Individual neurons in neural networks often represent a mixture of unrelated features. This phenomenon, called polysemanticity, can make interpreting neural networks more difficult.
- Score: 2.9260206957981167
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Individual neurons in neural networks often represent a mixture of unrelated features. This phenomenon, called polysemanticity, can make interpreting neural networks more difficult and so we aim to understand its causes. We propose doing so through the lens of feature \emph{capacity}, which is the fractional dimension each feature consumes in the embedding space. We show that in a toy model the optimal capacity allocation tends to monosemantically represent the most important features, polysemantically represent less important features (in proportion to their impact on the loss), and entirely ignore the least important features. Polysemanticity is more prevalent when the inputs have higher kurtosis or sparsity and more prevalent in some architectures than others. Given an optimal allocation of capacity, we go on to study the geometry of the embedding space. We find a block-semi-orthogonal structure, with differing block sizes in different models, highlighting the impact of model architecture on the interpretability of its neurons.
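To make the capacity notion concrete, here is a minimal numerical sketch. It assumes the formalization C_i = (w_i . w_i)^2 / sum_j (w_i . w_j)^2 for the embedding vector w_i of feature i (treat the exact formula as a paraphrase of the paper, and the toy matrix as invented for illustration): a feature with its own orthogonal direction gets capacity near 1, features sharing a direction split capacity between them, and capacities sum to at most the embedding dimension.
```python
import numpy as np

def feature_capacities(W: np.ndarray) -> np.ndarray:
    """Capacity of each feature for an embedding matrix W of shape
    (n_features, d_embed): C_i = (w_i . w_i)^2 / sum_j (w_i . w_j)^2."""
    G = W @ W.T                 # Gram matrix of feature embedding vectors
    num = np.diag(G) ** 2       # (w_i . w_i)^2
    den = (G ** 2).sum(axis=1)  # sum_j (w_i . w_j)^2
    return num / den

# Toy example: 3 features embedded in 2 dimensions.
W = np.array([[1.0, 0.0],   # feature 0: dedicated direction (monosemantic)
              [0.0, 1.0],   # feature 1: shares a direction with feature 2
              [0.0, 0.7]])  # feature 2: weaker copy of the same direction
print(feature_capacities(W))  # ~[1.0, 0.67, 0.33]; sums to d_embed = 2
```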
Related papers
- Beyond Scaling Curves: Internal Dynamics of Neural Networks Through the NTK Lens [0.5745241788717261]
We empirically analyze how neural networks behave under data and model scaling through the lens of the neural tangent kernel (NTK).
Our findings on standard vision tasks show that similar performance scaling exponents can occur even though the internal model dynamics show opposite behavior.
We also address a previously unresolved issue in neural scaling: how convergence to the infinite-width limit affects scaling behavior in finite-width models.
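For context, the empirical (finite-width) NTK used in this kind of analysis is the Gram matrix of parameter gradients, Theta(x, x') = grad_theta f(x) . grad_theta f(x'). A minimal sketch on a toy one-hidden-layer network (my own example, not the paper's setup or tasks):
```python
import numpy as np

rng = np.random.default_rng(0)

# Toy scalar-output network f(x) = a . tanh(W x).
d, width = 3, 64
W = rng.normal(size=(width, d)) / np.sqrt(d)
a = rng.normal(size=width) / np.sqrt(width)

def param_grad(x: np.ndarray) -> np.ndarray:
    """Flattened gradient of f(x) with respect to all parameters (W, a)."""
    h = np.tanh(W @ x)
    grad_a = h                               # df/da_i = tanh(W_i . x)
    grad_W = np.outer(a * (1 - h ** 2), x)   # df/dW_ij = a_i tanh'(W_i . x) x_j
    return np.concatenate([grad_W.ravel(), grad_a])

def empirical_ntk(x1: np.ndarray, x2: np.ndarray) -> float:
    """One entry of the empirical NTK: inner product of parameter gradients."""
    return float(param_grad(x1) @ param_grad(x2))

x1, x2 = rng.normal(size=d), rng.normal(size=d)
print(empirical_ntk(x1, x2), empirical_ntk(x1, x1))
```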
arXiv Detail & Related papers (2025-07-07T14:17:44Z)
- Probing the Vulnerability of Large Language Models to Polysemantic Interventions [49.64902130083662]
We investigate the polysemantic structure of two small models (Pythia-70M and GPT-2-Small).
Our analysis reveals a consistent polysemantic topology shared across both models.
Strikingly, we demonstrate that this structure can be exploited to mount effective interventions on two larger, black-box instruction-tuned models.
arXiv Detail & Related papers (2025-05-16T18:20:42Z)
- Plastic Arbor: a modern simulation framework for synaptic plasticity – from single synapses to networks of morphological neurons [0.8796261172196743]
In humans and other animals, synaptic plasticity processes play a vital role in cognitive functions, including learning and memory.
Recent studies have shown that intracellular molecular processes in dendrites significantly influence single-neuron dynamics.
We have extended the Arbor library to the Plastic Arbor framework, supporting simulations of a large variety of spike-driven plasticity paradigms.
arXiv Detail & Related papers (2024-11-25T14:51:13Z)
- Semantic Loss Functions for Neuro-Symbolic Structured Prediction [74.18322585177832]
We discuss the semantic loss, which injects symbolically defined knowledge about the structure of the output space into training.
It is agnostic to the arrangement of the symbols, and depends only on the semantics expressed thereby.
It can be combined with both discriminative and generative neural models.
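One common formulation of the semantic loss is the negative log-probability that output variables, sampled independently with the network's predicted marginals, satisfy the symbolic constraint. A hedged sketch for an exactly-one constraint (the constraint choice is illustrative, not taken from this abstract):
```python
import numpy as np

def semantic_loss_exactly_one(p) -> float:
    """Semantic loss for the constraint 'exactly one output variable is true':
    -log( sum_i p_i * prod_{j != i} (1 - p_j) )."""
    p = np.asarray(p, dtype=float)
    sat_prob = sum(p[i] * np.prod(np.delete(1.0 - p, i)) for i in range(len(p)))
    return float(-np.log(sat_prob))

print(semantic_loss_exactly_one([0.98, 0.01, 0.01]))  # ~0.04: constraint likely satisfied
print(semantic_loss_exactly_one([0.50, 0.50, 0.50]))  # ~0.98: constraint unlikely
```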
arXiv Detail & Related papers (2024-05-12T22:18:25Z)
- Towards Explaining Hypercomplex Neural Networks [6.543091030789653]
Hypercomplex neural networks are gaining increasing interest in the deep learning community.
In this paper, we propose inherently interpretable PHNNs and quaternion-like networks.
We draw insights into how this unique branch of neural models operates.
arXiv Detail & Related papers (2024-03-26T17:58:07Z)
- Asymptotics of Learning with Deep Structured (Random) Features [9.366617422860543]
For a large class of feature maps we provide a tight characterisation of the test error associated with learning the readout layer.
In some cases our results can capture feature maps learned by deep, finite-width neural networks trained under gradient descent.
arXiv Detail & Related papers (2024-02-21T18:35:27Z)
- What Causes Polysemanticity? An Alternative Origin Story of Mixed Selectivity from Incidental Causes [14.623741848860037]
Polysemantic neurons (neurons that activate for a set of unrelated features) have been seen as a significant obstacle to the interpretability of task-optimized deep networks.
We show that polysemanticity can arise incidentally, even when there are ample neurons to represent all features in the data.
arXiv Detail & Related papers (2023-12-05T19:29:54Z)
- Heterogeneous Feature Representation for Digital Twin-Oriented Complex Networked Systems [13.28255056212425]
Building models of Complex Networked Systems that can accurately represent reality forms an important research area.
This study aims to improve the expressive power of node features in Digital Twin-Oriented Complex Networked Systems.
arXiv Detail & Related papers (2023-09-23T01:40:56Z)
- Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
- The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can separate two classes of data with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
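A minimal sketch of the kind of setup this result is about, with invented toy data and sizes (not the paper's construction or bounds): lift data that is not linearly separable through a random two-layer ReLU map with Gaussian weights and uniformly distributed biases, then check that a linear readout separates the lifted points.
```python
import numpy as np

rng = np.random.default_rng(0)

# Two classes on concentric circles: not linearly separable in the input space.
n, d = 200, 2
radius = rng.choice([1.0, 3.0], size=n)
angle = rng.uniform(0.0, 2.0 * np.pi, size=n)
X = np.stack([radius * np.cos(angle), radius * np.sin(angle)], axis=1)
y = np.where(radius > 2.0, 1.0, -1.0)

# Random two-layer ReLU lift: standard Gaussian weights, uniform biases.
width = 500
H = np.maximum(X @ rng.normal(size=(d, width)) + rng.uniform(-3, 3, size=width), 0.0)

# A least-squares linear readout on the lifted features separates the classes.
A = np.c_[H, np.ones(n)]
w, *_ = np.linalg.lstsq(A, y, rcond=None)
print("separated:", bool(np.all(np.sign(A @ w) == y)))
```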
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
- The Causal Neural Connection: Expressiveness, Learnability, and Inference [125.57815987218756]
An object called a structural causal model (SCM) represents a collection of mechanisms and sources of random variation of the system under investigation.
In this paper, we show that the causal hierarchy theorem (Thm. 1, Bareinboim et al., 2020) still holds for neural models.
We introduce a special type of SCM called a neural causal model (NCM), and formalize a new type of inductive bias to encode structural constraints necessary for performing causal inferences.
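For readers unfamiliar with SCMs, here is a toy example of "mechanisms plus exogenous noise", including a hard intervention do(X = x); this is an illustrative sketch only, not the paper's neural causal model:
```python
import numpy as np

rng = np.random.default_rng(0)

def scm_sample(n: int, do_x=None) -> dict:
    """Toy SCM with mechanisms X := 1[U_X > 0] and Y := 2X + U_Y.
    Passing do_x replaces X's mechanism with a constant, i.e. do(X = do_x)."""
    u_x = rng.normal(size=n)   # exogenous noise driving X
    u_y = rng.normal(size=n)   # exogenous noise driving Y
    x = (u_x > 0).astype(float) if do_x is None else np.full(n, float(do_x))
    y = 2.0 * x + u_y
    return {"X": x, "Y": y}

obs = scm_sample(100_000)               # observational distribution
intv = scm_sample(100_000, do_x=1.0)    # interventional distribution
print(obs["Y"].mean(), intv["Y"].mean())  # ~1.0 versus ~2.0
```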
arXiv Detail & Related papers (2021-07-02T01:55:18Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges, which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and adapts to larger search spaces and different tasks.
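A rough sketch of this edge-weighting idea, with invented names and a simplified gating scheme: every edge of a complete DAG over computational nodes carries a learnable scalar, and a sigmoid of that scalar gates how strongly one node's output feeds the next, so the connectivity is trained by ordinary backpropagation.
```python
import torch
import torch.nn as nn

class GatedDAGBlock(nn.Module):
    """Nodes form a complete DAG; edge (j -> i) carries a learnable logit whose
    sigmoid gates how much of source j's output feeds node i."""
    def __init__(self, n_nodes: int, dim: int):
        super().__init__()
        self.ops = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_nodes)])
        self.edge_logits = nn.Parameter(torch.zeros(n_nodes, n_nodes))

    def forward(self, x):
        outputs = [x]  # source 0 is the block input
        for i, op in enumerate(self.ops):
            gates = torch.sigmoid(self.edge_logits[i, : i + 1])
            agg = sum(g * h for g, h in zip(gates, outputs))
            outputs.append(torch.relu(op(agg)))
        return outputs[-1]

block = GatedDAGBlock(n_nodes=4, dim=8)
print(block(torch.randn(2, 8)).shape)  # torch.Size([2, 8])
```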
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
- Hyperbolic Neural Networks++ [66.16106727715061]
We generalize the fundamental components of neural networks in a single hyperbolic geometry model, namely, the Poincaré ball model.
Experiments show the superior parameter efficiency of our methods compared to conventional hyperbolic components, as well as greater stability and better performance than their Euclidean counterparts.
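As a small, self-contained illustration of computing in the Poincaré ball, here is Möbius addition, the hyperbolic analogue of vector addition (curvature fixed to 1; the generalized components in the paper go well beyond this sketch):
```python
import numpy as np

def mobius_add(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Mobius addition on the unit Poincare ball:
    x (+) y = ((1 + 2<x,y> + |y|^2) x + (1 - |x|^2) y) / (1 + 2<x,y> + |x|^2 |y|^2)."""
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    return ((1 + 2 * xy + y2) * x + (1 - x2) * y) / (1 + 2 * xy + x2 * y2)

x, y = np.array([0.3, 0.1]), np.array([-0.2, 0.4])
z = mobius_add(x, y)
print(z, np.linalg.norm(z) < 1.0)  # the result stays inside the unit ball
```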
arXiv Detail & Related papers (2020-06-15T08:23:20Z)