Large Language Models are Locally Linear Mappings
- URL: http://arxiv.org/abs/2505.24293v2
- Date: Wed, 04 Jun 2025 03:50:57 GMT
- Title: Large Language Models are Locally Linear Mappings
- Authors: James R. Golden
- Abstract summary: We map the inference operations of several open-weight large language models to an exactly equivalent linear system for an input sequence. Despite their power and global nonlinearity, modern LLMs can be interpreted through nearly-exact locally linear decompositions.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We demonstrate that the inference operations of several open-weight large language models (LLMs) can be mapped to an exactly equivalent linear system for an input sequence without modifying the model weights or altering output predictions. Extending techniques from image diffusion models that exhibit local or piecewise linearity, we strategically alter the gradient computation with respect to a given input sequence for a next-token prediction such that the Jacobian of the model nearly exactly reproduces the forward prediction with a linear system. We demonstrate this approach across models (Llama 3, Gemma 3, Qwen 3, Phi 4, Mistral Ministral and OLMo 2, up to Llama 3.3 70B Q4) and show through the singular value decomposition of the detached Jacobian that these LLMs operate in extremely low-dimensional subspaces where many of the largest singular vectors decode to concepts related to the most-likely output token. This approach also allows us to examine the operation of each successive layer (and its attention and MLP components) as nearly-exact linear systems and observe the emergence of semantic concepts. Despite their expressive power and global nonlinearity, modern LLMs can be interpreted through nearly-exact locally linear decompositions that provide insights into their internal representations and reveal interpretable semantic structures in the next-token prediction process.
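The core idea can be illustrated with a small toy example. The sketch below is illustrative only, not the authors' implementation: a SwiGLU-style gated block and the weight names are assumptions. Detaching the nonlinear gate from the computation graph leaves a map that is linear in the input, so its Jacobian exactly reproduces the forward output at that input, and the singular value decomposition of that Jacobian exposes the local linear structure.

```python
# Illustrative toy sketch (not the authors' code): detach the nonlinear gate of a
# SwiGLU-style block so the map becomes linear in the input; the Jacobian of this
# "detached" function then reproduces the forward output exactly at that input.
import torch

torch.manual_seed(0)
d = 8
W_gate, W_up, W_down = (torch.randn(d, d) for _ in range(3))

def forward_detached(x):
    gate = torch.nn.functional.silu(x @ W_gate.T).detach()  # gate treated as a constant
    return (gate * (x @ W_up.T)) @ W_down.T

x = torch.randn(d)
y = forward_detached(x)

# Jacobian of the detached map with respect to the input embedding.
J = torch.autograd.functional.jacobian(forward_detached, x)
print(torch.allclose(J @ x, y, atol=1e-5))  # True: the output is reproduced by a linear map

# Singular value decomposition of the local linear map; in the full models the
# largest singular vectors decode to concepts related to the likely output token.
U, S, Vh = torch.linalg.svd(J)
print(S)
```

In the full models, the gradient computation is altered analogously for every nonlinear operation, so the detached Jacobian with respect to the input embeddings nearly exactly reproduces the next-token prediction.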
Related papers
- Sequential-Parallel Duality in Prefix Scannable Models [68.39855814099997]
Recent developments have given rise to various models, such as Gated Linear Attention (GLA) and Mamba. This raises a natural question: can we characterize the full class of neural sequence models that support near-constant-time parallel evaluation and linear-time, constant-space sequential inference?
arXiv Detail & Related papers (2025-06-12T17:32:02Z)
- How can representation dimension dominate structurally pruned LLMs? [17.953689537875377]
Pruning assumes a subnetwork exists in the original deep neural network. It is unclear how model performance varies across different subnetwork extractions.
arXiv Detail & Related papers (2025-03-06T12:28:59Z)
- Large Language-Geometry Model: When LLM meets Equivariance [53.8505081745406]
We propose EquiLLM, a novel framework for representing 3D physical systems. We show that EquiLLM delivers significant improvements over previous methods across molecular dynamics simulation, human motion simulation, and antibody design.
arXiv Detail & Related papers (2025-02-16T14:50:49Z)
- Demystifying Singular Defects in Large Language Models [61.98878352956125]
In large language models (LLMs), the underlying causes of high-norm tokens remain largely unexplored. We provide both theoretical insights and empirical validation across a range of recent models. We showcase two practical applications of these findings: the improvement of quantization schemes and the design of LLM signatures.
arXiv Detail & Related papers (2025-02-10T20:09:16Z)
- Predicting the Performance of Black-box LLMs through Self-Queries [60.87193950962585]
As large language models (LLMs) are increasingly relied on in AI systems, predicting when they make mistakes is crucial. In this paper, we extract features of LLMs in a black-box manner by using follow-up prompts and taking the probabilities of different responses as representations. We demonstrate that training a linear model on these low-dimensional representations produces reliable predictors of model performance at the instance level.
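As a rough illustration of this setup (a sketch under assumptions, not the paper's pipeline: the feature matrix below is a synthetic stand-in for probabilities elicited from follow-up prompts), a linear probe can be fit on such low-dimensional representations to predict instance-level correctness:

```python
# Sketch under stated assumptions: each instance is represented by the probabilities an
# LLM assigns to responses of a few follow-up prompts; a linear probe then predicts
# whether the original answer was correct. Features here are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_instances, n_followups = 500, 8

# Rows: response probabilities for n_followups follow-up prompts (synthetic).
X = rng.uniform(0.0, 1.0, size=(n_instances, n_followups))
# Synthetic correctness labels, loosely tied to elicited confidence.
y = (X.mean(axis=1) + 0.1 * rng.standard_normal(n_instances) > 0.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
probe = LogisticRegression().fit(X_train, y_train)
print("held-out accuracy:", probe.score(X_test, y_test))
```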
arXiv Detail & Related papers (2025-01-02T22:26:54Z)
- Pushing the Limits of Large Language Model Quantization via the Linearity Theorem [71.3332971315821]
We present a "linearity theorem" establishing a direct relationship between the layer-wise $\ell_2$ reconstruction error and the model perplexity increase due to quantization.
This insight enables two novel applications: (1) a simple data-free LLM quantization method using Hadamard rotations and MSE-optimal grids, dubbed HIGGS, and (2) an optimal solution to the problem of finding non-uniform per-layer quantization levels.
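A minimal sketch of the first application's ingredients follows (illustrative assumptions throughout: a plain per-row uniform grid stands in for the MSE-optimal grids, and no claim is made about matching HIGGS itself). Weights are rotated with a normalized Hadamard matrix and quantized without any calibration data:

```python
# Illustrative sketch, not the HIGGS implementation: rotate weights with a normalized
# Hadamard matrix (data-free), quantize on a simple uniform grid, then rotate back.
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(0)
d = 64                                 # power of two, required by scipy's Hadamard construction
W = rng.standard_normal((d, d))

H = hadamard(d) / np.sqrt(d)           # orthogonal rotation, no calibration data needed
W_rot = W @ H

bits = 4
levels = 2 ** bits
# Per-row uniform grid (a stand-in for the paper's MSE-optimal grids).
scale = np.abs(W_rot).max(axis=1, keepdims=True) / (levels / 2 - 0.5)
W_q = np.round(W_rot / scale).clip(-(levels // 2), levels // 2 - 1) * scale

W_hat = W_q @ H.T                      # undo the rotation
print("relative reconstruction error:", np.linalg.norm(W_hat - W) / np.linalg.norm(W))
```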
arXiv Detail & Related papers (2024-11-26T15:35:44Z)
- Transformer Block Coupling and its Correlation with Generalization in LLMs [3.007031501305338]
We analyze the trajectories of token embeddings as they pass through transformer blocks, linearizing the system along these trajectories through their Jacobian matrices. We uncover the phenomenon of transformer block coupling in a multitude of Large Language Models, characterized by the coupling of their top singular vectors across tokens and depth. We further investigate how these properties emerge during training, observing a progressive development of coupling, increased linearity, and layer-wise exponential growth in token trajectories.
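A toy version of this measurement (illustrative only: untrained PyTorch encoder layers stand in for LLM blocks, and the short sequence is flattened rather than tracking per-token trajectories) linearizes two stacked blocks via their Jacobians and checks how well their top singular vectors align:

```python
# Illustrative toy, not the paper's code: linearize two stacked (untrained) transformer
# blocks via their Jacobians and measure "coupling" as alignment of top singular vectors.
import torch

torch.manual_seed(0)
d_model = 32
block1 = torch.nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True).eval()
block2 = torch.nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True).eval()

x = torch.randn(1, 5, d_model)   # a short sequence of token embeddings
h1 = block1(x)

# Jacobian of each block's output with respect to its input, flattened over the sequence.
n = x.numel()
J1 = torch.autograd.functional.jacobian(lambda t: block1(t), x).reshape(n, n)
J2 = torch.autograd.functional.jacobian(lambda t: block2(t), h1).reshape(n, n)

U1, _, _ = torch.linalg.svd(J1)
U2, _, _ = torch.linalg.svd(J2)
k = 5
# Entries near 1 would indicate aligned (coupled) top singular directions across depth.
alignment = torch.abs(U1[:, :k].T @ U2[:, :k])
print(alignment)
```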
arXiv Detail & Related papers (2024-07-10T16:30:27Z)
- Weight-based Decomposition: A Case for Bilinear MLPs [0.0]
Gated Linear Units (GLUs) have become a common building block in modern foundation models.
Bilinear layers drop the non-linearity in the "gate" but still have comparable performance to other GLUs.
We develop a method to decompose the bilinear tensor into a set of interacting eigenvectors.
arXiv Detail & Related papers (2024-06-06T10:46:51Z)
- Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective [26.479602180023125]
The Linear Complexity Sequence Model (LCSM) unites various sequence modeling techniques with linear complexity.
We segment the modeling processes of these models into three distinct stages: Expand, Oscillation, and Shrink.
We perform experiments to analyze the impact of different stage settings on language modeling and retrieval tasks.
arXiv Detail & Related papers (2024-05-27T17:38:55Z)
- On the Origins of Linear Representations in Large Language Models [51.88404605700344]
We introduce a simple latent variable model to formalize the concept dynamics of the next token prediction.
Experiments show that linear representations emerge when learning from data matching the latent variable model.
We additionally confirm some predictions of the theory using the LLaMA-2 large language model.
arXiv Detail & Related papers (2024-03-06T17:17:36Z)
- Semi-Supervised Learning with Normalizing Flows [54.376602201489995]
FlowGMM is an end-to-end approach to generative semi-supervised learning with normalizing flows.
We show promising results on a wide range of applications, including AG-News and Yahoo Answers text data.
arXiv Detail & Related papers (2019-12-30T17:36:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.