Attention Layers Add Into Low-Dimensional Residual Subspaces
- URL: http://arxiv.org/abs/2508.16929v2
- Date: Sun, 28 Sep 2025 11:00:01 GMT
- Title: Attention Layers Add Into Low-Dimensional Residual Subspaces
- Authors: Junxuan Wang, Xuyang Ge, Wentao Shu, Zhengfu He, Xipeng Qiu
- Abstract summary: We show that attention outputs are confined to a surprisingly low-dimensional subspace. We identify this low-rank structure as a key factor behind the prevalent dead feature problem in sparse dictionary learning.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Transformer architectures, and their attention mechanisms in particular, form the foundation of modern large language models. While transformer models are widely believed to operate in high-dimensional hidden spaces, we show that attention outputs are confined to a surprisingly low-dimensional subspace, where about 60% of the directions account for 99% of the variance. This phenomenon is consistently observed across diverse model families and datasets, and is induced by the attention output projection matrix. Critically, we identify this low-rank structure as a key factor behind the prevalent dead feature problem in sparse dictionary learning, where it creates a mismatch between randomly initialized features and the intrinsic geometry of the activation space. Building on this insight, we propose a subspace-constrained training method for sparse autoencoders (SAEs), initializing feature directions inside the active subspace of activations. Our approach reduces dead features from 87% to below 1% in Attention Output SAEs with 1M features, and extends to other sparse dictionary learning methods. Our findings provide both new insights into the geometry of attention and practical tools for improving sparse dictionary learning in large language models.
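As a rough illustration of the two steps the abstract describes, the sketch below estimates the active subspace of attention outputs with a PCA-style SVD and then initializes SAE decoder directions inside it. The synthetic activations, function names, and the 99% variance threshold are illustrative assumptions, not the authors' reference implementation.

```python
# Minimal sketch of subspace-constrained SAE initialization on synthetic data.
# The toy activations are constructed to be rank-300 out of 512 dimensions,
# so the printout mirrors the paper's "~60% of directions carry 99% of the
# variance" observation; real attention outputs would be measured instead.
import numpy as np

def active_subspace(acts: np.ndarray, var_threshold: float = 0.99):
    """Orthonormal basis of the subspace explaining `var_threshold` of the
    variance of `acts` (n_samples x d_model)."""
    centered = acts - acts.mean(axis=0, keepdims=True)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    var_ratio = np.cumsum(s**2) / np.sum(s**2)
    rank = int(np.searchsorted(var_ratio, var_threshold)) + 1
    return vt[:rank]                                   # (rank, d_model)

def init_sae_decoder(basis: np.ndarray, n_features: int, rng):
    """Initialize SAE decoder directions inside the active subspace,
    instead of uniformly at random in the full d_model space."""
    coeffs = rng.standard_normal((n_features, basis.shape[0]))
    decoder = coeffs @ basis                           # lies in span(basis)
    return decoder / np.linalg.norm(decoder, axis=1, keepdims=True)

rng = np.random.default_rng(0)
d_model, d_active = 512, 300                           # ~60% of directions
mixing = rng.standard_normal((d_active, d_model))      # stand-in for W_O's row space
acts = rng.standard_normal((4096, d_active)) @ mixing
basis = active_subspace(acts)
print(f"directions for 99% variance: {basis.shape[0]} / {d_model}")
decoder = init_sae_decoder(basis, n_features=10_000, rng=rng)
```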
Related papers
- ShortcutBreaker: Low-Rank Noisy Bottleneck with Global Perturbation Attention for Multi-Class Unsupervised Anomaly Detection [59.89803740308262]
ShortcutBreaker is a novel unified feature-reconstruction framework for multi-class unsupervised anomaly detection (MUAD) tasks. It features two key innovations to address the issue of shortcuts. The proposed method achieves a remarkable image-level AUROC of 99.8%, 98.9%, 90.6%, and 87.8% on four datasets.
arXiv Detail & Related papers (2025-10-21T06:51:30Z) - Understanding In-context Learning of Addition via Activation Subspaces [74.8874431046062]
We study a family of few-shot learning tasks for which the true prediction rule is to add an integer $k$ to the input. We find that Llama-3-8B attains high accuracy on this task for a range of $k$, and localizes its few-shot ability to just three attention heads.
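A minimal sketch of this task family, with a hypothetical prompt format standing in for whatever the authors actually fed the model:

```python
# Few-shot prompts whose true rule is "add the integer k". The arrow format
# and demo range are made up for illustration; the model call (e.g. to
# Llama-3-8B) is left out.
def make_prompt(k: int, shots: int, query: int) -> str:
    pairs = [(x, x + k) for x in range(10, 10 + shots)]
    demo = "\n".join(f"{x} -> {y}" for x, y in pairs)
    return f"{demo}\n{query} ->"

print(make_prompt(k=7, shots=3, query=42))
# 10 -> 17
# 11 -> 18
# 12 -> 19
# 42 ->
```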
arXiv Detail & Related papers (2025-05-08T11:32:46Z) - The Space Between: On Folding, Symmetries and Sampling [4.16445550760248]
We propose a space folding measure based on Hamming distance in the ReLU activation space. We show that space folding values increase with network depth when the generalization error is low, but decrease when the error increases. Inspired by these findings, we outline a novel regularization scheme that encourages the network to seek solutions characterized by higher folding values.
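One plausible formalization of such a measure (my reading, not necessarily the authors' exact definition): walk a straight line between two inputs, record the binary ReLU activation pattern at each step, and compare the accumulated Hamming path length to the direct Hamming distance between the endpoints. A ratio above 1 means the trajectory through activation space folds back on itself.

```python
# Hamming-distance folding ratio for a single ReLU layer; the layer, step
# count, and ratio definition are illustrative assumptions.
import numpy as np

def relu_pattern(x, W, b):
    return (W @ x + b > 0).astype(np.int8)       # binary activation pattern

def folding_ratio(x0, x1, W, b, steps=64):
    ts = np.linspace(0.0, 1.0, steps)
    pats = np.stack([relu_pattern((1 - t) * x0 + t * x1, W, b) for t in ts])
    path = sum(np.sum(pats[i] != pats[i + 1]) for i in range(steps - 1))
    direct = np.sum(pats[0] != pats[-1])
    return path / max(direct, 1)                 # > 1 indicates folding

rng = np.random.default_rng(0)
W, b = rng.standard_normal((256, 32)), rng.standard_normal(256)
x0, x1 = rng.standard_normal(32), rng.standard_normal(32)
print(folding_ratio(x0, x1, W, b))
```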
arXiv Detail & Related papers (2025-03-11T14:54:25Z) - Preventing Dimensional Collapse in Self-Supervised Learning via Orthogonality Regularization [9.823816643319448]
Self-supervised learning (SSL) has rapidly advanced in recent years, approaching the performance of its supervised counterparts.
However, dimensional collapse, where a few large eigenvalues dominate the eigenspace, poses a significant obstacle for SSL.
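A common form of orthogonality regularization, shown as a hedged sketch (the paper studies several variants, which may differ from this one): penalize the deviation of a weight matrix's Gram matrix from the identity, which discourages a few directions from dominating.

```python
# Soft orthogonality penalty ||W W^T - I||_F^2, added to the SSL objective
# with some weight; the exact regularizer(s) in the paper may differ.
import numpy as np

def soft_orthogonality_penalty(W: np.ndarray) -> float:
    """W: (out_dim, in_dim) weight matrix."""
    gram = W @ W.T
    return float(np.sum((gram - np.eye(W.shape[0])) ** 2))

W = np.random.default_rng(0).standard_normal((64, 128)) / np.sqrt(128)
print(soft_orthogonality_penalty(W))
```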
arXiv Detail & Related papers (2024-11-01T06:39:18Z) - Anti-Collapse Loss for Deep Metric Learning Based on Coding Rate Metric [99.19559537966538]
Deep metric learning (DML) aims to learn a discriminative high-dimensional embedding space for downstream tasks like classification, clustering, and retrieval.
To maintain the structure of embedding space and avoid feature collapse, we propose a novel loss function called Anti-Collapse Loss.
Comprehensive experiments on benchmark datasets demonstrate that our proposed method outperforms existing state-of-the-art methods.
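The coding-rate metric behind such anti-collapse terms is standard in the rate-reduction literature; the sketch below uses R(Z, eps) = 1/2 log det(I + d/(n eps^2) Z^T Z) and treats its negative as a loss. How the paper weights or combines this term with the metric-learning objective is not reproduced here.

```python
# Coding rate of a batch of embeddings: larger = more spread out. Using its
# negative as a loss term discourages feature collapse. eps is illustrative.
import numpy as np

def coding_rate(Z: np.ndarray, eps: float = 0.5) -> float:
    """Z: (n, d) batch of embeddings."""
    n, d = Z.shape
    gram = np.eye(d) + (d / (n * eps**2)) * (Z.T @ Z)
    return 0.5 * np.linalg.slogdet(gram)[1]

def anti_collapse_term(Z: np.ndarray) -> float:
    return -coding_rate(Z)            # minimizing this keeps embeddings spread

Z = np.random.default_rng(0).standard_normal((256, 64))
collapsed = np.tile(Z[:1], (256, 1))  # every embedding identical
print(coding_rate(Z), coding_rate(collapsed))  # large vs. near zero
```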
arXiv Detail & Related papers (2024-07-03T13:44:20Z) - Explicitly Disentangled Representations in Object-Centric Learning [0.0]
We propose a novel architecture that biases object-centric models toward disentangling shape and texture components.
arXiv Detail & Related papers (2024-01-18T17:22:11Z) - Subspace-Guided Feature Reconstruction for Unsupervised Anomaly Localization [5.085309164633571]
Unsupervised anomaly localization plays a critical role in industrial manufacturing.
Most recent methods perform feature matching or reconstruction for the target sample with pre-trained deep neural networks.
We propose a novel subspace-guided feature reconstruction framework to pursue adaptive feature approximation for anomaly localization.
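A hedged sketch of the general idea, with plain PCA standing in for whatever adaptive subspace selection the paper actually uses: reconstruct each test feature from a basis learned on normal data, and treat the residual as the anomaly score.

```python
# Subspace reconstruction for anomaly scoring on synthetic features. The
# basis size and least-squares projection are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
mix = rng.standard_normal((32, 128))
normal_feats = rng.standard_normal((2000, 32)) @ mix   # normal samples live in a 32-d subspace
mean = normal_feats.mean(0)
_, _, vt = np.linalg.svd(normal_feats - mean, full_matrices=False)
basis = vt[:32]                                        # subspace of normal data

def anomaly_score(f: np.ndarray) -> float:
    recon = basis.T @ (basis @ (f - mean)) + mean      # projection onto subspace
    return float(np.linalg.norm(f - recon))            # residual = anomaly score

in_dist = rng.standard_normal(32) @ mix                # lies in the normal subspace
out_dist = rng.standard_normal(128)                    # generic feature vector
print(anomaly_score(in_dist), anomaly_score(out_dist)) # small vs. large
```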
arXiv Detail & Related papers (2023-09-25T06:58:57Z) - Language Models Implement Simple Word2Vec-style Vector Arithmetic [32.2976613483151]
A primary criticism towards language models (LMs) is their inscrutability.
This paper presents evidence that, despite their size and complexity, LMs sometimes exploit a simple vector arithmetic style mechanism to solve some relational tasks.
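The mechanism in question is the classic word2vec-style analogy arithmetic. The toy below plants that structure in a made-up embedding table just to show the computation; in the paper the arithmetic happens inside the LM's hidden states.

```python
# Word2vec-style vector arithmetic: Warsaw - Poland + France ~ Paris.
# The embedding table and "capital-of" offset are fabricated for illustration.
import numpy as np

rng = np.random.default_rng(0)
words = ["Poland", "Warsaw", "France", "Paris", "Germany", "Berlin"]
E = {w: rng.standard_normal(16) for w in words}
offset = rng.standard_normal(16)                 # planted "capital-of" direction
for country, capital in [("Poland", "Warsaw"), ("France", "Paris"), ("Germany", "Berlin")]:
    E[capital] = E[country] + offset

query = E["Warsaw"] - E["Poland"] + E["France"]  # should land near E["Paris"]
candidates = [w for w in words if w not in ("Warsaw", "Poland", "France")]
best = max(candidates, key=lambda w: query @ E[w] / np.linalg.norm(E[w]))
print(best)  # Paris
```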
arXiv Detail & Related papers (2023-05-25T15:04:01Z) - Indirect-Instant Attention Optimization for Crowd Counting in Dense Scenes [3.8950254639440094]
We propose an Indirect-Instant Attention Optimization (IIAO) module based on SoftMax-Attention. The special transformation yields relatively coarse features, and the predictive fallibility of regions varies with the crowd density distribution.
We tailor the Regional Correlation Loss (RCLoss) to retrieve continuous error-prone regions and smooth spatial information.
arXiv Detail & Related papers (2022-06-12T03:29:50Z) - Exploring Dimensionality Reduction Techniques in Multilingual Transformers [64.78260098263489]
This paper gives a comprehensive account of the impact of dimensional reduction techniques on the performance of state-of-the-art multilingual Siamese Transformers.
It shows that it is possible to achieve an average reduction in the number of dimensions of $91.58\% \pm 2.59\%$ and $54.65\% \pm 32.20\%$, respectively.
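A sketch of this kind of PCA-based reduction. Random vectors stand in for real sentence embeddings here; note that real multilingual Siamese-transformer embeddings compress far better than Gaussian noise, which is what makes reductions like the paper's possible.

```python
# Keep only the principal components needed to preserve a target fraction of
# variance; the 0.95 threshold and embedding size are illustrative.
import numpy as np

def reduce_dims(emb: np.ndarray, var_keep: float = 0.95):
    centered = emb - emb.mean(0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    ratio = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(ratio, var_keep)) + 1
    return centered @ vt[:k].T, 1 - k / emb.shape[1]   # reduced emb, reduction

emb = np.random.default_rng(0).standard_normal((1000, 768))
reduced, reduction = reduce_dims(emb)
print(reduced.shape, f"{reduction:.1%} fewer dimensions")
```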
arXiv Detail & Related papers (2022-04-18T17:20:55Z) - Regressive Domain Adaptation for Unsupervised Keypoint Detection [67.2950306888855]
Domain adaptation (DA) aims at transferring knowledge from a labeled source domain to an unlabeled target domain.
We present a method of regressive domain adaptation (RegDA) for unsupervised keypoint detection.
Our method brings large improvements of 8% to 11% in PCK on different datasets.
arXiv Detail & Related papers (2021-03-10T16:45:22Z) - Introducing Orthogonal Constraint in Structural Probes [0.2538209532048867]
We decompose a linear projection of language vector space into isomorphic space rotation and linear scaling directions.
We experimentally show that our approach can be performed in a multitask setting.
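A hedged sketch of the decomposition: parametrize the probe's linear map B as a diagonal scaling applied after an (approximately) orthogonal rotation, with a soft penalty keeping the rotation orthogonal. The structural probe's distance-based training objective itself is not reproduced here.

```python
# B = diag(scale) @ Q: rotation Q (kept orthogonal by a penalty) followed by
# per-dimension scaling. Parameter names are illustrative.
import numpy as np

def probe_map(Q: np.ndarray, scale: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Apply the decomposed projection to a hidden state h."""
    return scale * (Q @ h)

def orthogonality_penalty(Q: np.ndarray) -> float:
    return float(np.sum((Q @ Q.T - np.eye(Q.shape[0])) ** 2))

rng = np.random.default_rng(0)
d = 64
Q = np.linalg.qr(rng.standard_normal((d, d)))[0]  # start exactly orthogonal
scale = np.ones(d)
h = rng.standard_normal(d)
print(probe_map(Q, scale, h).shape, orthogonality_penalty(Q))  # (64,), ~0
```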
arXiv Detail & Related papers (2020-12-30T17:14:25Z) - Discrete Variational Attention Models for Language Generation [51.88612022940496]
We propose a discrete variational attention model with a categorical distribution over the attention mechanism, motivated by the discrete nature of language.
Thanks to the property of discreteness, the training of our proposed approach does not suffer from posterior collapse.
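A generic rendering of categorical attention using a Gumbel-softmax relaxation so sampling stays differentiable; the paper's exact parameterization and variational objective may well differ.

```python
# Attention as a (relaxed) categorical distribution over source positions:
# sample near-one-hot weights instead of taking a dense softmax average.
import numpy as np

def gumbel_softmax(logits: np.ndarray, rng, tau: float = 0.5):
    g = -np.log(-np.log(rng.uniform(size=logits.shape) + 1e-10) + 1e-10)
    y = (logits + g) / tau
    y = np.exp(y - y.max())
    return y / y.sum()                               # ~one-hot for small tau

def discrete_attention(query, keys, values, rng):
    logits = keys @ query / np.sqrt(query.shape[0])  # scaled dot-product scores
    weights = gumbel_softmax(logits, rng)            # sample a position (softly)
    return weights @ values

rng = np.random.default_rng(0)
q = rng.standard_normal(32)
K, V = rng.standard_normal((10, 32)), rng.standard_normal((10, 32))
print(discrete_attention(q, K, V, rng).shape)        # (32,)
```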
arXiv Detail & Related papers (2020-04-21T05:49:04Z) - Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies [60.285091454321055]
We design a simple and efficient embedding algorithm that learns a small set of anchor embeddings and a sparse transformation matrix.
On text classification, language modeling, and movie recommendation benchmarks, we show that ANT is particularly suitable for large vocabulary sizes.
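A sketch of the factorization as I read it: the full embedding table is the product of a sparse transformation T and a small anchor table A, with an L1 penalty keeping T sparse. The shapes, sparsity pattern, and penalty form are my assumptions about the setup.

```python
# ANT-style embeddings: E = T @ A, where A is a small dense anchor table and
# T is a sparse per-word mixing matrix (here each word mixes ~4 anchors).
import numpy as np

rng = np.random.default_rng(0)
vocab, n_anchors, d = 10_000, 100, 128
A = rng.standard_normal((n_anchors, d))            # dense anchor embeddings
T = np.zeros((vocab, n_anchors))
idx = rng.integers(0, n_anchors, size=(vocab, 4))  # anchors used by each word
rows = np.repeat(np.arange(vocab), 4)
T[rows, idx.ravel()] = rng.standard_normal(vocab * 4)

def embed(word_ids):                               # rows of E, built lazily
    return T[word_ids] @ A

l1_penalty = np.abs(T).sum()                       # sparsity regularizer on T
print(embed(np.array([1, 2, 3])).shape,
      f"nnz ratio: {np.count_nonzero(T) / T.size:.2%}")
```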
arXiv Detail & Related papers (2020-03-18T13:07:51Z)