Superposition as Lossy Compression: Measure with Sparse Autoencoders and Connect to Adversarial Vulnerability
- URL: http://arxiv.org/abs/2512.13568v1
- Date: Mon, 15 Dec 2025 17:25:39 GMT
- Title: Superposition as Lossy Compression: Measure with Sparse Autoencoders and Connect to Adversarial Vulnerability
- Authors: Leonard Bereska, Zoe Tzifa-Kratira, Reza Samavi, Efstratios Gavves
- Abstract summary: We present an information-theoretic framework measuring a neural representation's effective degrees of freedom. We apply Shannon entropy to sparse autoencoder activations to compute the number of effective features as the minimum neurons needed for interference-free encoding. Our metric strongly correlates with ground truth in toy models, detects minimal superposition in algorithmic tasks, and reveals systematic reduction under dropout.
- Score: 23.64717985754014
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Neural networks achieve remarkable performance through superposition: encoding multiple features as overlapping directions in activation space rather than dedicating individual neurons to each feature. This challenges interpretability, yet we lack principled methods to measure superposition. We present an information-theoretic framework measuring a neural representation's effective degrees of freedom. We apply Shannon entropy to sparse autoencoder activations to compute the number of effective features as the minimum neurons needed for interference-free encoding. Equivalently, this measures how many "virtual neurons" the network simulates through superposition. When networks encode more effective features than actual neurons, they must accept interference as the price of compression. Our metric strongly correlates with ground truth in toy models, detects minimal superposition in algorithmic tasks, and reveals systematic reduction under dropout. Layer-wise patterns mirror intrinsic dimensionality studies on Pythia-70M. The metric also captures developmental dynamics, detecting sharp feature consolidation during grokking. Surprisingly, adversarial training can increase effective features while improving robustness, contradicting the hypothesis that superposition causes vulnerability. Instead, the effect depends on task complexity and network capacity: simple tasks with ample capacity allow feature expansion (abundance regime), while complex tasks or limited capacity force reduction (scarcity regime). By defining superposition as lossy compression, this work enables principled measurement of how neural networks organize information under computational constraints, connecting superposition to adversarial robustness.
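The abstract describes applying Shannon entropy to sparse autoencoder (SAE) activations to count effective features, but it does not give the exact formula. The sketch below is one plausible reading, not the authors' implementation: it normalizes the mean absolute activation of each SAE latent into a probability distribution and takes the exponential of its entropy. The function names, the normalization choice, and the toy activations are all assumptions made for illustration.

```python
import math


def mean_abs_activations(activations):
    """Average absolute activation of each SAE latent over a batch (rows = samples)."""
    n_samples = len(activations)
    n_latents = len(activations[0])
    return [
        sum(abs(row[j]) for row in activations) / n_samples
        for j in range(n_latents)
    ]


def effective_features(weights):
    """exp(Shannon entropy) of the normalized weights: a perplexity-style count.

    Assumption: this normalization is illustrative and may differ from the paper's.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    # Zero-weight latents contribute nothing to the entropy, so skip them.
    probs = [w / total for w in weights if w > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)


# Hypothetical SAE activations: 3 samples x 4 latents, read from a layer with 2 neurons.
sae_acts = [
    [1.0, 0.0, 0.5, 0.0],
    [0.0, 1.0, 0.0, 0.5],
    [1.0, 1.0, 0.5, 0.5],
]
n_neurons = 2

count = effective_features(mean_abs_activations(sae_acts))
print(f"effective features: {count:.2f}, per neuron: {count / n_neurons:.2f}")
```

Under this reading, a per-neuron value above 1 would correspond to the abstract's case of a network encoding more effective features than it has actual neurons. Equally active latents give a count equal to the number of latents, and a single active latent gives a count of 1.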
Related papers
- Weight transport through spike timing for robust local gradients [0.5236468296934584]
Plasticity in functional neural networks is frequently expressed as gradient descent on a cost. This imposes symmetry constraints that are difficult to reconcile with local computation. We introduce spike-based alignment learning, which uses spike timing statistics to extract and correct the asymmetry between effective reciprocal connections.
arXiv Detail & Related papers (2025-03-04T14:05:39Z) - A spiking photonic neural network of 40.000 neurons, trained with rank-order coding for leveraging sparsity [0.2386578882591578]
Spiking neural networks are neuromorphic systems that emulate certain aspects of biological neurons. This demonstration integrates photonic nonlinearity, excitability, and sparse computation, paving the way for efficient large-scale photonic neuromorphic systems.
arXiv Detail & Related papers (2024-11-28T15:28:30Z) - Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z) - Efficient and Flexible Method for Reducing Moderate-size Deep Neural Networks with Condensation [36.41451383422967]
In scientific applications, the scale of neural networks is generally moderate-size, mainly to ensure the speed of inference.
Existing work has found that the powerful capabilities of neural networks are primarily due to their non-linearity.
We propose a condensation reduction algorithm to verify the feasibility of this idea in practical problems.
arXiv Detail & Related papers (2024-05-02T06:53:40Z) - Fully Spiking Actor Network with Intra-layer Connections for Reinforcement Learning [51.386945803485084]
We focus on the task where the agent needs to learn multi-dimensional deterministic policies to control.
Most existing spike-based RL methods take the firing rate as the output of SNNs, and convert it to represent continuous action space (i.e., the deterministic policy) through a fully-connected layer.
To develop a fully spiking actor network without any floating-point matrix operations, we draw inspiration from the non-spiking interneurons found in insects.
arXiv Detail & Related papers (2024-01-09T07:31:34Z) - Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z) - Impact of spiking neurons leakages and network recurrences on event-based spatio-temporal pattern recognition [0.0]
Spiking neural networks coupled with neuromorphic hardware and event-based sensors are getting increased interest for low-latency and low-power inference at the edge.
We explore the impact of synaptic and membrane leakages in spiking neurons.
arXiv Detail & Related papers (2022-11-14T21:34:02Z) - On the Adversarial Robustness of Quantized Neural Networks [2.0625936401496237]
It is unclear how model compression techniques may affect the robustness of AI algorithms against adversarial attacks.
This paper explores the effect of quantization, one of the most common compression techniques, on the adversarial robustness of neural networks.
arXiv Detail & Related papers (2021-05-01T11:46:35Z) - And/or trade-off in artificial neurons: impact on adversarial robustness [91.3755431537592]
The presence of a sufficient number of OR-like neurons in a network can lead to classification brittleness and increased vulnerability to adversarial attacks.
We define AND-like neurons and propose measures to increase their proportion in the network.
Experimental results on the MNIST dataset suggest that our approach holds promise as a direction for further exploration.
arXiv Detail & Related papers (2021-02-15T08:19:05Z) - The Connection Between Approximation, Depth Separation and Learnability in Neural Networks [70.55686685872008]
We study the connection between learnability and approximation capacity.
We show that learnability with deep networks of a target function depends on the ability of simpler classes to approximate the target.
arXiv Detail & Related papers (2021-01-31T11:32:30Z) - Neural Sparse Representation for Image Restoration [116.72107034624344]
Inspired by the robustness and efficiency of sparse coding based image restoration models, we investigate the sparsity of neurons in deep networks.
Our method structurally enforces sparsity constraints upon hidden neurons.
Experiments show that sparse representation is crucial in deep neural networks for multiple image restoration tasks.
arXiv Detail & Related papers (2020-06-08T05:15:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.