The Origins of Representation Manifolds in Large Language Models
- URL: http://arxiv.org/abs/2505.18235v1
- Date: Fri, 23 May 2025 13:31:22 GMT
- Title: The Origins of Representation Manifolds in Large Language Models
- Authors: Alexander Modell, Patrick Rubin-Delanchy, Nick Whiteley
- Abstract summary: We show that cosine similarity in representation space may encode the intrinsic geometry of a feature through shortest, on-manifold paths. The critical assumptions and predictions of the theory are validated on text embeddings and token activations of large language models.
- Score: 52.68554895844062
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is a large ongoing scientific effort in mechanistic interpretability to map embeddings and internal representations of AI systems into human-understandable concepts. A key element of this effort is the linear representation hypothesis, which posits that neural representations are sparse linear combinations of `almost-orthogonal' direction vectors, reflecting the presence or absence of different features. This model underpins the use of sparse autoencoders to recover features from representations. Moving towards a fuller model of features, in which neural representations could encode not just the presence but also a potentially continuous and multidimensional value for a feature, has been a subject of intense recent discourse. We describe why and how a feature might be represented as a manifold, demonstrating in particular that cosine similarity in representation space may encode the intrinsic geometry of a feature through shortest, on-manifold paths, potentially answering the question of how distance in representation space and relatedness in concept space could be connected. The critical assumptions and predictions of the theory are validated on text embeddings and token activations of large language models.
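To make the central claim concrete, the following is a minimal numpy sketch, not the authors' construction or validation procedure: the circular "feature", the ambient dimension and all sizes are invented for illustration. It embeds a one-dimensional circular feature in a high-dimensional representation space and checks that cosine similarity between representations is a monotone function of the shortest on-manifold (arc-length) distance, which is the relationship the abstract describes.

```python
# A minimal sketch (not the paper's construction): a one-dimensional
# circular feature embedded in a high-dimensional representation space.
# For unit-norm points on a great circle, cosine similarity equals the
# cosine of the shortest on-manifold (arc-length) distance, so it is a
# monotone decreasing function of geodesic distance on [0, pi].
import numpy as np

rng = np.random.default_rng(0)
d, n = 256, 500                                  # ambient dimension and sample size (arbitrary)

# Orthonormal basis of a random 2-D subspace of R^d.
U, _ = np.linalg.qr(rng.standard_normal((d, 2)))
theta = rng.uniform(0.0, 2.0 * np.pi, size=n)    # intrinsic coordinate of the feature
X = np.cos(theta)[:, None] * U[:, 0] + np.sin(theta)[:, None] * U[:, 1]

# Pairwise cosine similarity: the points are unit norm, so this is X X^T.
cos_sim = X @ X.T

# Pairwise geodesic (shortest on-manifold) distance along the circle.
diff = np.abs(theta[:, None] - theta[None, :])
geo = np.minimum(diff, 2.0 * np.pi - diff)

# Prediction of the manifold picture: cosine similarity = cos(geodesic distance).
print(np.allclose(cos_sim, np.cos(geo)))         # True (up to floating point)
```

Under this picture, nearby points on the feature manifold have high cosine similarity and far-apart points low similarity, which is one way distance in representation space could track relatedness in concept space.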
Related papers
- Emergence of Quantised Representations Isolated to Anisotropic Functions [0.0]
This paper builds upon the existing Spotlight Resonance method to determine representational alignment. A new tool is used to gain insight into how discrete representations can emerge and organise in autoencoder models.
arXiv Detail & Related papers (2025-07-16T09:27:54Z)
- From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit [16.996218963146788]
We show that MP-SAE unrolls its encoder into a sequence of residual-guided steps, allowing it to capture hierarchical and nonlinearly accessible features. We also show that the sequential encoder principle of MP-SAE affords an additional benefit of adaptive sparsity at inference time (a generic matching-pursuit sketch appears after this list).
arXiv Detail & Related papers (2025-06-03T17:24:55Z)
- FeatInv: Spatially resolved mapping from feature space to input space using conditional diffusion models [0.9503773054285559]
Internal representations are crucial for understanding deep neural networks. While mapping from feature space to input space aids in interpreting the former, existing approaches often rely on crude approximations. We propose using a conditional diffusion model to learn such a mapping in a probabilistic manner.
arXiv Detail & Related papers (2025-05-27T11:07:34Z)
- I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data? [76.15163242945813]
The achievements of large language models (LLMs) have led many to conclude that they exhibit a form of intelligence. We introduce a novel generative model that generates tokens on the basis of human-interpretable concepts represented as latent discrete variables.
arXiv Detail & Related papers (2025-03-12T01:21:17Z)
- Learning Visual-Semantic Subspace Representations [49.17165360280794]
We introduce a nuclear norm-based loss function, grounded in the same information theoretic principles that have proved effective in self-supervised learning. We present a theoretical characterization of this loss, demonstrating that, in addition to promoting class separability, it encodes the spectral geometry of the data within a subspace lattice.
arXiv Detail & Related papers (2024-05-25T12:51:38Z)
- On the Origins of Linear Representations in Large Language Models [51.88404605700344]
We introduce a simple latent variable model to formalize the concept dynamics of the next token prediction.
Experiments show that linear representations emerge when learning from data matching the latent variable model.
We additionally confirm some predictions of the theory using the LLaMA-2 large language model.
arXiv Detail & Related papers (2024-03-06T17:17:36Z)
- Bridging Neural and Symbolic Representations with Transitional Dictionary Learning [4.326886488307076]
This paper introduces a novel Transitional Dictionary Learning (TDL) framework that can implicitly learn symbolic knowledge. We propose a game-theoretic diffusion model to decompose the input into visual parts using the dictionaries learned by the Expectation Maximization (EM) algorithm. Experiments are conducted on three abstract compositional visual object datasets.
arXiv Detail & Related papers (2023-08-03T19:29:35Z)
- A Recursive Bateson-Inspired Model for the Generation of Semantic Formal Concepts from Spatial Sensory Data [77.34726150561087]
This paper presents a new symbolic-only method for the generation of hierarchical concept structures from complex sensory data.
The approach is based on Bateson's notion of difference as the key to the genesis of an idea or a concept.
The model is able to produce fairly rich yet human-readable conceptual representations without training.
arXiv Detail & Related papers (2023-07-16T15:59:13Z)
- Emergence of Machine Language: Towards Symbolic Intelligence with Neural Networks [73.94290462239061]
We propose to combine symbolism and connectionism principles by using neural networks to derive a discrete representation.
By designing an interactive environment and task, we demonstrated that machines could generate a spontaneous, flexible, and semantic language.
arXiv Detail & Related papers (2022-01-14T14:54:58Z)
- Word2Box: Learning Word Representation Using Box Embeddings [28.080105878687185]
Learning vector representations for words is one of the most fundamental topics in NLP.
Our model, Word2Box, takes a region-based approach to the problem of word representation, representing words as $n$-dimensional rectangles.
We demonstrate improved performance on various word similarity tasks, particularly on less common words.
arXiv Detail & Related papers (2021-06-28T01:17:11Z)
- The Low-Dimensional Linear Geometry of Contextualized Word Representations [27.50785941238007]
We study the linear geometry of contextualized word representations in ELMo and BERT.
We show that a variety of linguistic features are encoded in low-dimensional subspaces.
arXiv Detail & Related papers (2021-05-15T00:58:08Z)
- High-dimensional distributed semantic spaces for utterances [0.2907403645801429]
This paper describes a model for high-dimensional representation for utterance and text level data.
It is based on a mathematically principled and behaviourally plausible approach to representing linguistic information.
The paper shows how the implemented model is able to represent a broad range of linguistic features in a common integral framework of fixed dimensionality.
arXiv Detail & Related papers (2021-04-01T12:09:47Z)
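As a companion to the MP-SAE entry above, the following is a minimal, generic matching-pursuit encoder in numpy: a sketch of the residual-guided greedy idea only, not the MP-SAE architecture from that paper (which learns its dictionary as a sparse autoencoder). The dictionary, sizes, tolerance and stopping rule here are invented for illustration.

```python
# Generic matching pursuit: greedy, residual-guided sparse coding.
# This is NOT the MP-SAE model; the dictionary and all sizes below are
# made up for illustration.
import numpy as np

def matching_pursuit(x, D, tol=1e-3, max_steps=32):
    """Encode x as a sparse combination of the (unit-norm) columns of D."""
    residual = x.astype(float).copy()
    coeffs = np.zeros(D.shape[1])
    steps = 0
    while steps < max_steps and np.linalg.norm(residual) > tol:
        scores = D.T @ residual              # correlation of every atom with the residual
        k = int(np.argmax(np.abs(scores)))   # pick the best-matching atom
        coeffs[k] += scores[k]               # accumulate its coefficient
        residual -= scores[k] * D[:, k]      # residual-guided update
        steps += 1
    return coeffs, steps

# Toy usage: a random unit-norm dictionary and an input built from 3 atoms.
rng = np.random.default_rng(0)
d, m = 64, 256
D = rng.standard_normal((d, m))
D /= np.linalg.norm(D, axis=0)
x = D[:, rng.choice(m, size=3, replace=False)] @ rng.standard_normal(3)

coeffs, steps = matching_pursuit(x, D)
# The number of greedy steps (and so at most that many nonzeros) adapts to the input.
print(steps, np.count_nonzero(coeffs))
```

Because the loop stops once the residual is small, the number of selected atoms varies per input, which is the "adaptive sparsity at inference time" property the summary mentions.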
This list is automatically generated from the titles and abstracts of the papers on this site.