How Many Features Can a Language Model Store Under the Linear Representation Hypothesis?
- URL: http://arxiv.org/abs/2602.11246v1
- Date: Wed, 11 Feb 2026 17:49:32 GMT
- Title: How Many Features Can a Language Model Store Under the Linear Representation Hypothesis?
- Authors: Nikhil Garg, Jon Kleinberg, Kenny Peng
- Abstract summary: We introduce a mathematical framework for the linear representation hypothesis (LRH). The LRH asserts that intermediate layers of language models store features linearly. We prove that $d = \Omega_\epsilon(\frac{k^2}{\log k}\log (m/k))$ is required while $d = O_\epsilon(k^2\log m)$ suffices.
- Score: 8.283029791278187
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a mathematical framework for the linear representation hypothesis (LRH), which asserts that intermediate layers of language models store features linearly. We separate the hypothesis into two claims: linear representation (features are linearly embedded in neuron activations) and linear accessibility (features can be linearly decoded). We then ask: How many neurons $d$ suffice to both linearly represent and linearly access $m$ features? Classical results in compressed sensing imply that for $k$-sparse inputs, $d = O(k\log (m/k))$ suffices if we allow non-linear decoding algorithms (Candes and Tao, 2006; Candes et al., 2006; Donoho, 2006). However, the additional requirement of linear decoding takes the problem out of classical compressed sensing and into linear compressed sensing. Our main theoretical result establishes nearly-matching upper and lower bounds for linear compressed sensing. We prove that $d = \Omega_\epsilon(\frac{k^2}{\log k}\log (m/k))$ is required while $d = O_\epsilon(k^2\log m)$ suffices. The lower bound establishes a quantitative gap between the classical and linear compressed sensing settings, illustrating how linear accessibility is a meaningfully stronger hypothesis than linear representation alone. The upper bound confirms that neurons can store an exponential number of features under the LRH, giving theoretical evidence for the "superposition hypothesis" (Elhage et al., 2022). The upper bound proof uses standard random constructions of matrices with approximately orthogonal columns. The lower bound proof uses rank bounds for near-identity matrices (Alon, 2003) together with Turán's theorem (bounding the number of edges in clique-free graphs). We also show how our results do and do not constrain the geometry of feature representations, and extend our results to allow decoders with an activation function and bias.
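To make the superposition picture concrete, here is a minimal NumPy sketch in the spirit of the abstract's "standard random constructions of matrices with approximately orthogonal columns" (the specific values of $m$, $d$, $k$, the sign-matrix embedding, and the top-$k$ read-out rule are illustrative assumptions, not the paper's construction): $m \gg d$ sparse features are linearly represented in $d$ neurons, and each feature is read back with a purely linear decoder.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, k = 10_000, 400, 3   # features, neurons, sparsity (assumed values)

# Random embedding: +-1/sqrt(d) entries give unit-norm, near-orthogonal columns.
E = rng.choice([-1.0, 1.0], size=(d, m)) / np.sqrt(d)

# A k-sparse feature vector with unit intensities.
x = np.zeros(m)
active = rng.choice(m, size=k, replace=False)
x[active] = 1.0

a = E @ x                          # neuron activations (linear representation)

# Linear accessibility: decode feature i with the linear functional <E[:, i], .>.
scores = E.T @ a                   # all m linear read-outs at once
decoded = np.argsort(scores)[-k:]  # the k largest scores

# Pairwise interference between columns is ~ 1/sqrt(d), so for small k the
# k active features stand out from the m - k inactive ones.
print(sorted(active.tolist()), "->", sorted(decoded.tolist()))
```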
Related papers
- Who Said Neural Networks Aren't Linear? [10.340966855587405]
This paper introduces a method that makes such vector spaces explicit by construction.
We find that if we sandwich a linear operator $A$ between two invertible neural networks, $f(x)=g_y^{-1}(A g_x(x))$, then the corresponding vector spaces $X$ and $Y$ are induced by newly defined addition and scaling actions.
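A minimal numeric check of the identity described above, with toy elementwise invertible maps standing in for the invertible networks $g_x$ and $g_y$ (the cubic maps and the bisection inverse are assumptions for illustration): $f$ is linear with respect to the induced additions $x \oplus x' := g_x^{-1}(g_x(x) + g_x(x'))$ and $y \boxplus y' := g_y^{-1}(g_y(y) + g_y(y'))$.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))            # the sandwiched linear operator

# Toy invertible elementwise maps standing in for the invertible networks.
g_x     = lambda v: np.sign(v) * np.abs(v) ** 3
g_x_inv = lambda v: np.sign(v) * np.abs(v) ** (1 / 3)
g_y     = lambda v: v + v ** 3             # strictly increasing => invertible

def g_y_inv(v, iters=80):                  # invert monotone g_y by bisection
    lo = -2 * np.maximum(1.0, np.abs(v))
    hi = 2 * np.maximum(1.0, np.abs(v))
    for _ in range(iters):
        mid = (lo + hi) / 2
        below = g_y(mid) < v
        lo = np.where(below, mid, lo)
        hi = np.where(below, hi, mid)
    return (lo + hi) / 2

f = lambda x: g_y_inv(A @ g_x(x))
add_x = lambda u, v: g_x_inv(g_x(u) + g_x(v))   # induced addition on X
add_y = lambda u, v: g_y_inv(g_y(u) + g_y(v))   # induced addition on Y

x1, x2 = rng.standard_normal(3), rng.standard_normal(3)
# f(x1 (+) x2) == f(x1) (+) f(x2) under the induced additions:
print(np.allclose(f(add_x(x1, x2)), add_y(f(x1), f(x2)), atol=1e-6))
```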
arXiv Detail & Related papers (2025-10-09T17:59:57Z)
- Explicit Discovery of Nonlinear Symmetries from Dynamic Data [50.20526548924647]
LieNLSD is the first method capable of determining the number of infinitesimal generators with nonlinear terms and their explicit expressions.
LieNLSD shows qualitative advantages over existing methods and improves the long rollout accuracy of neural PDE solvers by over 20%.
arXiv Detail & Related papers (2025-10-02T09:54:08Z)
- Fidelity Isn't Accuracy: When Linearly Decodable Functions Fail to Match the Ground Truth [0.0]
A linearity score $\lambda(f)$ measures how well a regression network's output can be mimicked by a linear model.
This framework is evaluated on both synthetic and real-world datasets.
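One plausible instantiation of such a score (the paper's exact definition may differ, and the function name `linearity_score` is hypothetical): fit the best affine map to the network's outputs on a probe set and report the fraction of output variance it explains.

```python
import numpy as np

def linearity_score(f, X):
    """Fraction of f's output variance explained by the best affine fit on X."""
    Y = np.array([f(x) for x in X])
    X1 = np.hstack([X, np.ones((len(X), 1))])   # append a bias column
    W, *_ = np.linalg.lstsq(X1, Y, rcond=None)  # best affine fit
    resid = Y - X1 @ W
    return 1.0 - resid.var() / Y.var()

rng = np.random.default_rng(2)
X = rng.standard_normal((500, 4))
print(linearity_score(lambda x: 3 * x.sum() + 1, X))      # ~1.0: linear
print(linearity_score(lambda x: np.sin(5 * x).sum(), X))  # << 1: nonlinear
```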
arXiv Detail & Related papers (2025-06-13T18:55:37Z)
- Geometry of fibers of the multiplication map of deep linear neural networks [0.0]
We study the geometry of the set of tuples of composable matrices which multiply to a fixed matrix.
Our solution is presented in three forms: a Poincaré series in equivariant cohomology, a quadratic integer program, and an explicit formula.
arXiv Detail & Related papers (2024-11-29T18:36:03Z)
- Neural Networks and (Virtual) Extended Formulations [8.185918509343818]
We prove lower bounds on the size of neural networks that optimize over $P$.
We show that $\mathrm{xc}(P)$ is a lower bound on the size of any monotone or input-convex neural network that solves the linear optimization problem over $P$.
arXiv Detail & Related papers (2024-11-05T11:12:11Z)
- Data subsampling for Poisson regression with pth-root-link [53.63838219437508]
We develop and analyze data subsampling techniques for Poisson regression.
In particular, we consider the Poisson generalized linear model with identity and square-root link functions.
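As a point of reference, a minimal sketch of the baseline idea: fit a Poisson GLM on a uniform subsample and compare it with the full-data fit (the log link, step size, and sampling rate here are assumptions; the paper studies identity and pth-root links with more refined, non-uniform subsampling).

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 20_000, 5
X = rng.standard_normal((n, p)) * 0.3
beta_true = rng.standard_normal(p)
y = rng.poisson(np.exp(X @ beta_true))          # Poisson responses, log link

def fit_poisson(X, y, steps=2000, lr=0.1):
    """Gradient descent on the (convex) Poisson negative log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (np.exp(X @ beta) - y) / len(y)
        beta -= lr * grad
    return beta

idx = rng.choice(n, size=n // 20, replace=False)  # keep 5% of the rows
print("full  :", np.round(fit_poisson(X, y), 3))
print("sample:", np.round(fit_poisson(X[idx], y[idx]), 3))
print("true  :", np.round(beta_true, 3))
```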
arXiv Detail & Related papers (2024-10-30T10:09:05Z)
- Non-convex matrix sensing: Breaking the quadratic rank barrier in the sample complexity [11.412228884390784]
We show that factorized gradient descent converges to the ground truth with a number of samples sub-quadratic in the rank.
We extend our theory to the noisy setting, where we show that the iterates of gradient descent are only weakly dependent on the individual measurement matrices.
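A minimal sketch of factorized (Burer-Monteiro) gradient descent for matrix sensing, assuming Gaussian measurement matrices, a small random initialization, and a hand-picked step size (not the paper's exact setting or analysis):

```python
import numpy as np

rng = np.random.default_rng(4)
d, r, n_meas = 20, 2, 600
M = rng.standard_normal((d, r)) @ rng.standard_normal((r, d))  # rank-r truth
A = rng.standard_normal((n_meas, d, d)) / np.sqrt(n_meas)      # sensing matrices
y = np.einsum('nij,ij->n', A, M)                               # y_i = <A_i, M>

U = 0.1 * rng.standard_normal((d, r))                          # small random init
V = 0.1 * rng.standard_normal((d, r))
lr = 0.01
for _ in range(3000):
    resid = np.einsum('nij,ij->n', A, U @ V.T) - y             # <A_i, UV^T> - y_i
    G = np.einsum('n,nij->ij', resid, A)                       # sum_i resid_i A_i
    U, V = U - lr * (G @ V), V - lr * (G.T @ U)                # factored updates

print("relative error:", np.linalg.norm(U @ V.T - M) / np.linalg.norm(M))
```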
arXiv Detail & Related papers (2024-08-20T14:09:28Z)
- Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z)
- Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation [25.60689712525918]
We study reinforcement learning with linear function approximation where the transition probability and reward functions are linear.
We propose a novel, computationally efficient algorithm, LSVI-UCB$^+$, which achieves an $\widetilde{O}(Hd\sqrt{T})$ regret bound, where $H$ is the episode length, $d$ is the feature dimension, and $T$ is the number of steps.
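The $H=1$ special case of such optimism-based algorithms is the linear bandit. Here is a compact LinUCB-style sketch of the underlying principle, fitting a ridge estimate and acting on an upper confidence bound (this is not LSVI-UCB$^+$ itself; the bonus scale $\beta$, noise level, and action set are assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
d, T, beta = 4, 2000, 1.0
theta = rng.standard_normal(d) / np.sqrt(d)       # unknown reward parameter
actions = rng.standard_normal((50, d))            # fixed finite action set
best = (actions @ theta).max()                    # best achievable mean reward

A, b, regret = np.eye(d), np.zeros(d), 0.0        # ridge-regression statistics
for t in range(T):
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b                         # ridge estimate of theta
    # Elliptical confidence bonus: beta * sqrt(x^T A^{-1} x) for each action x.
    bonus = beta * np.sqrt(np.einsum('ad,dk,ak->a', actions, A_inv, actions))
    x = actions[np.argmax(actions @ theta_hat + bonus)]  # optimistic action
    r = x @ theta + 0.1 * rng.standard_normal()   # noisy observed reward
    A += np.outer(x, x)
    b += r * x
    regret += best - x @ theta

print("average regret per step:", regret / T)     # decays as T grows
```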
arXiv Detail & Related papers (2022-06-23T06:04:21Z)
- Minimax Optimal Quantization of Linear Models: Information-Theoretic Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper bounds can be extended to two-layer ReLU neural networks.
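A minimal sketch of the problem setup, assuming a plain uniform scalar quantizer and hand-picked bit widths (not the paper's minimax-optimal scheme): learn a linear model, quantize its weights, and measure the induced excess risk.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 2000, 10
X = rng.standard_normal((n, p))
w_star = rng.standard_normal(p)
y = X @ w_star + 0.1 * rng.standard_normal(n)     # noisy linear measurements

w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)     # learned linear model

def quantize(w, bits):
    """Uniform scalar quantization of w onto 2**bits symmetric levels."""
    scale = np.abs(w).max()
    levels = 2 ** (bits - 1) - 1
    return np.round(w / scale * levels) / levels * scale

X_test = rng.standard_normal((5000, p))
for bits in (2, 4, 8):
    w_q = quantize(w_hat, bits)
    excess = np.mean((X_test @ w_q - X_test @ w_star) ** 2)
    print(f"{bits} bits: excess risk {excess:.5f}")
```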
arXiv Detail & Related papers (2022-02-23T02:39:04Z)
- Learning Sparse Graph Laplacian with K Eigenvector Prior via Iterative GLASSO and Projection [58.5350491065936]
We consider a structural assumption on the graph Laplacian matrix $L$.
The first $K$ eigenvectors of $L$ are pre-selected, e.g., based on domain-specific criteria.
We design an efficient hybrid graphical lasso/projection algorithm to compute the most suitable graph Laplacian matrix $L^* \in \mathcal{H}_{u}^{+}$ given $\bar{C}$.
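A sketch of the projection idea alone, using a simple clip-and-repair step onto combinatorial graph Laplacians (this is not the exact Euclidean projection onto the paper's set $\mathcal{H}_{u}^{+}$, and the GLASSO half is omitted):

```python
import numpy as np

def project_to_laplacian(S):
    """Map symmetric S into {L : L = L^T, off-diag <= 0, zero row sums}."""
    L = (S + S.T) / 2
    off = np.minimum(L - np.diag(np.diag(L)), 0)  # clip off-diagonals to <= 0
    return off - np.diag(off.sum(axis=1))         # diagonal = minus row sums

rng = np.random.default_rng(7)
S = rng.standard_normal((5, 5))
L = project_to_laplacian(S)
# Zero row sums and PSD (by diagonal dominance) hold by construction:
print(np.allclose(L.sum(axis=1), 0), (np.linalg.eigvalsh(L) >= -1e-9).all())
```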
arXiv Detail & Related papers (2020-10-25T18:12:50Z)
- Linear-Sample Learning of Low-Rank Distributions [56.59844655107251]
We show that learning $k\times k$, rank-$r$ matrices to normalized $L_1$ distance requires $\Omega(\frac{kr}{\epsilon^2})$ samples.
We propose an algorithm that uses $\mathcal{O}(\frac{kr}{\epsilon^2}\log^2\frac{1}{\epsilon})$ samples, a number linear in the high dimension $k$ and nearly linear in the typically low rank $r$.
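A simple baseline in the same setting (not the paper's estimator; the sizes are assumed): sample from a low-rank probability matrix, form the empirical matrix, and denoise it by truncating its SVD to rank $r$.

```python
import numpy as np

rng = np.random.default_rng(8)
k, r, n = 50, 2, 20_000
P = rng.random((k, r)) @ rng.random((r, k))
P /= P.sum()                                      # rank-r probability matrix

flat = rng.choice(k * k, size=n, p=P.ravel())     # n iid samples of pairs (i, j)
emp = np.bincount(flat, minlength=k * k).reshape(k, k) / n

U, s, Vt = np.linalg.svd(emp)
est = (U[:, :r] * s[:r]) @ Vt[:r]                 # rank-r truncation

print("L1 error, empirical:", np.abs(emp - P).sum())
print("L1 error, truncated:", np.abs(est - P).sum())  # typically smaller
```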
arXiv Detail & Related papers (2020-09-30T19:10:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.