Transformers converge to invariant algorithmic cores
- URL: http://arxiv.org/abs/2602.22600v1
- Date: Thu, 26 Feb 2026 04:09:11 GMT
- Title: Transformers converge to invariant algorithmic cores
- Authors: Joshua S. Schiffman,
- Abstract summary: GPT-2 language models govern subject-verb agreement through a single axis that, when flipped, inverts grammatical number across scales. Mechanistic interpretability could benefit from targeting such invariants -- the computational essence -- rather than implementation-specific details.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models exhibit sophisticated capabilities, yet understanding how they work internally remains a central challenge. A fundamental obstacle is that training selects for behavior, not circuitry, so many weight configurations can implement the same function. Which internal structures reflect the computation, and which are accidents of a particular training run? This work extracts algorithmic cores: compact subspaces necessary and sufficient for task performance. Independently trained transformers learn different weights but converge to the same cores. Markov-chain transformers embed 3D cores in nearly orthogonal subspaces yet recover identical transition spectra. Modular-addition transformers discover compact cyclic operators at grokking that later inflate, yielding a predictive model of the memorization-to-generalization transition. GPT-2 language models govern subject-verb agreement through a single axis that, when flipped, inverts grammatical number throughout generation across scales. These results reveal low-dimensional invariants that persist across training runs and scales, suggesting that transformer computations are organized around compact, shared algorithmic structures. Mechanistic interpretability could benefit from targeting such invariants -- the computational essence -- rather than implementation-specific details.
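No code accompanies this listing, so the following is a minimal numpy sketch of the kind of single-axis intervention the abstract describes for subject-verb agreement: estimate a one-dimensional "number" direction from labeled activations and reflect activations across it. The synthetic data, the difference-of-means estimator, and all names are illustrative assumptions, not the paper's actual method.

```python
# Minimal sketch (hypothetical data): estimate a single "grammatical number"
# direction from residual-stream activations and flip it, in the spirit of
# the axis-flip intervention described in the abstract.
import numpy as np

rng = np.random.default_rng(0)
d_model = 64

# Stand-in activations: two clusters separated along a hidden "number" direction.
true_axis = rng.normal(size=d_model)
true_axis /= np.linalg.norm(true_axis)
singular_acts = rng.normal(size=(200, d_model)) + 2.0 * true_axis
plural_acts   = rng.normal(size=(200, d_model)) - 2.0 * true_axis

# Difference-of-means estimate of the number axis (one simple way to find a 1D core).
axis = singular_acts.mean(axis=0) - plural_acts.mean(axis=0)
axis /= np.linalg.norm(axis)

def flip_along_axis(h, v):
    """Reflect activation h across the hyperplane orthogonal to unit vector v."""
    return h - 2.0 * (h @ v) * v

h = singular_acts[0]
print("projection before:", float(h @ axis))                          # positive -> "singular"
print("projection after: ", float(flip_along_axis(h, axis) @ axis))   # sign inverted -> "plural"
```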
Related papers
- Low-Dimensional Execution Manifolds in Transformer Learning Dynamics: Evidence from Modular Arithmetic Tasks [0.0]
We investigate the structure of learning dynamics in transformer models through carefully controlled arithmetic tasks. Our results suggest a unifying geometric framework for understanding transformer learning.
arXiv Detail & Related papers (2026-02-11T03:57:46Z) - Beyond Components: Singular Vector-Based Interpretability of Transformer Circuits [22.333229451408414]
Transformer-based language models exhibit complex and distributed behavior, yet their internal computations remain poorly understood. Existing interpretability methods treat attention heads and multilayer perceptron layers (MLPs) as indivisible units, overlooking possible functional substructure learned within them. We introduce a more fine-grained perspective that decomposes these components into singular directions, revealing superposed and independent computations within a single head or MLP (a minimal SVD sketch appears after this list).
arXiv Detail & Related papers (2025-11-25T12:59:15Z) - Transformers for Learning on Noisy and Task-Level Manifolds: Approximation and Generalization Insights [47.62295798627317]
This work establishes a theoretical foundation by analyzing the performance of transformers for regression tasks involving noisy input data on a manifold. We prove approximation and generalization errors that crucially depend on the intrinsic dimension of the manifold. Our results demonstrate that transformers can leverage low-complexity structures in the learning task even when the input data are perturbed by high-dimensional noise.
arXiv Detail & Related papers (2025-05-06T05:41:46Z) - Algorithmic Capabilities of Random Transformers [49.73113518329544]
We investigate what functions can be learned by randomly initialized transformers in which only the embedding layers are optimized.
We find that these random transformers can perform a wide range of meaningful algorithmic tasks.
Our results indicate that some algorithmic capabilities are present in transformers even before these models are trained (a frozen-weights training sketch follows this list).
arXiv Detail & Related papers (2024-10-06T06:04:23Z) - Transolver: A Fast Transformer Solver for PDEs on General Geometries [66.82060415622871]
We present Transolver, which learns intrinsic physical states hidden behind discretized geometries.
By calculating attention to physics-aware tokens encoded from slices, Transolver can effectively capture intricate physical correlations.
Transolver achieves consistent state-of-the-art with 22% relative gain across six standard benchmarks and also excels in large-scale industrial simulations.
arXiv Detail & Related papers (2024-02-04T06:37:38Z) - Analyzing Transformer Dynamics as Movement through Embedding Space [0.0]
This paper explores how Transformer-based language models exhibit intelligent behaviors such as understanding natural language.
We propose framing Transformer dynamics as movement through embedding space.
arXiv Detail & Related papers (2023-08-21T17:21:23Z) - Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection [88.23337313766353]
This work first provides a comprehensive statistical theory for transformers to perform ICL.
We show that transformers can implement a broad class of standard machine learning algorithms in context.
A single transformer can adaptively select different base ICL algorithms.
arXiv Detail & Related papers (2023-06-07T17:59:31Z) - Transformers learn in-context by gradient descent [58.24152335931036]
Training Transformers on auto-regressive objectives is closely related to gradient-based meta-learning formulations.
We show how trained Transformers become mesa-optimizers, i.e., they learn models by gradient descent in their forward pass (a worked numerical example follows this list).
arXiv Detail & Related papers (2022-12-15T09:21:21Z) - Characterizing Intrinsic Compositionality in Transformers with Tree Projections [72.45375959893218]
Neural models like transformers can route information arbitrarily between different parts of their input.
We show that transformers for three different tasks become more tree-like over the course of training.
These trees are predictive of model behavior, with more tree-like models generalizing better on tests of compositional generalization.
arXiv Detail & Related papers (2022-11-02T17:10:07Z)
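On the singular-direction decomposition discussed in "Beyond Components" above, here is a minimal sketch of the underlying linear algebra, using a random matrix as a stand-in for a head's combined value-output (OV) weight matrix; the matrix, shapes, and the choice to ablate the top direction are illustrative assumptions, not that paper's actual procedure.

```python
# Sketch: decompose a stand-in OV weight matrix into singular directions and
# ablate one of them, to illustrate treating a head as several 1D components.
import numpy as np

rng = np.random.default_rng(1)
d_model, d_head = 64, 16

W_V = rng.normal(size=(d_model, d_head)) / np.sqrt(d_model)
W_O = rng.normal(size=(d_head, d_model)) / np.sqrt(d_head)
W_OV = W_V @ W_O                      # effective low-rank map through the head

U, S, Vt = np.linalg.svd(W_OV, full_matrices=False)
print("rank (nonzero singular values):", int((S > 1e-8).sum()))

# Remove the top singular direction: the head minus one of its "subcomponents".
S_ablated = S.copy()
S_ablated[0] = 0.0
W_OV_ablated = U @ np.diag(S_ablated) @ Vt

x = rng.normal(size=d_model)
print("full head output norm:   ", np.linalg.norm(x @ W_OV))
print("ablated head output norm:", np.linalg.norm(x @ W_OV_ablated))
```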
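For the "Algorithmic Capabilities of Random Transformers" entry above, the setup can be illustrated by freezing a randomly initialized transformer body and optimizing only the embedding and readout layers on a toy task; the tiny model, task, and hyperparameters below are illustrative assumptions, not that paper's configuration.

```python
# Sketch: train only the embeddings of a randomly initialized transformer
# (all transformer-layer weights stay frozen) on a toy identity task.
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, d_model, seq_len = 16, 64, 8

embed = nn.Embedding(vocab, d_model)
body = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, dim_feedforward=128, batch_first=True),
    num_layers=2,
)
readout = nn.Linear(d_model, vocab)

# Freeze the transformer body: only embed and readout receive gradients.
for p in body.parameters():
    p.requires_grad_(False)

opt = torch.optim.Adam(list(embed.parameters()) + list(readout.parameters()), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):
    x = torch.randint(0, vocab, (32, seq_len))   # toy task: predict the input token itself
    logits = readout(body(embed(x)))
    loss = loss_fn(logits.reshape(-1, vocab), x.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 50 == 0:
        print(step, float(loss))
```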
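For the mesa-optimizer claim above ("Transformers learn in-context by gradient descent"), the core identity can be checked numerically: on in-context linear regression, one gradient-descent step from zero weights gives the same query prediction as a single linear (softmax-free) self-attention operation. The construction below follows the spirit of that paper's argument, but the dimensions and learning rate are arbitrary choices for illustration.

```python
# Sketch: one GD step on in-context linear regression == one linear-attention pass.
import numpy as np

rng = np.random.default_rng(2)
n_ctx, d = 12, 5
eta = 0.1

X = rng.normal(size=(n_ctx, d))          # in-context inputs x_i
w_true = rng.normal(size=d)
y = X @ w_true                           # in-context targets y_i
x_q = rng.normal(size=d)                 # query input

# (1) One gradient step on L(w) = 0.5 * sum_i (w . x_i - y_i)^2, starting at w = 0.
grad_at_zero = -(y @ X)                  # dL/dw at w = 0 is -sum_i y_i x_i
w_after_one_step = -eta * grad_at_zero
pred_gd = w_after_one_step @ x_q

# (2) Linear attention: keys = x_i, values = y_i, query = x_q, no softmax.
attn_scores = X @ x_q                    # <x_i, x_q> for each context token
pred_attn = eta * (y @ attn_scores)      # eta * sum_i y_i <x_i, x_q>

print(pred_gd, pred_attn)                # identical up to floating point
```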