The structure of the token space for large language models
- URL: http://arxiv.org/abs/2410.08993v1
- Date: Fri, 11 Oct 2024 17:07:15 GMT
- Title: The structure of the token space for large language models
- Authors: Michael Robinson, Sourya Dey, Shauna Sweet,
- Abstract summary: Large language models encode the correlational structure present in natural language by fitting segments of utterances (tokens) into a high dimensional ambient latent space upon which the models then operate.
We present estimators for the dimension and Ricci scalar curvature of the token subspace, and apply it to three open source large language models of moderate size.
We find that the dimension and curvature correlate with generative fluency of the models, which suggest that these findings have implications for model behavior.
- Score: 1.5621144215664768
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models encode the correlational structure present in natural language by fitting segments of utterances (tokens) into a high dimensional ambient latent space upon which the models then operate. We assert that in order to develop a foundational, first-principles understanding of the behavior and limitations of large language models, it is crucial to understand the topological and geometric structure of this token subspace. In this article, we present estimators for the dimension and Ricci scalar curvature of the token subspace, and apply it to three open source large language models of moderate size: GPT2, LLEMMA7B, and MISTRAL7B. In all three models, using these measurements, we find that the token subspace is not a manifold, but is instead a stratified manifold, where on each of the individual strata, the Ricci curvature is significantly negative. We additionally find that the dimension and curvature correlate with generative fluency of the models, which suggest that these findings have implications for model behavior.
Related papers
- Shared Global and Local Geometry of Language Model Embeddings [46.33317507982751]
We find that token embeddings of language models exhibit common geometric structure.
We show that tokens with lower intrinsic dimensions often have semantically coherent clusters, while those with higher intrinsic dimensions do not.
Perhaps most surprisingly, we find that alignment in token embeddings persists through the hidden states of language models.
arXiv Detail & Related papers (2025-03-27T01:17:06Z) - Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding [58.38294408121273]
We propose Cross-modal and Uncertainty-aware Agglomeration for Open-vocabulary 3D Scene Understanding dubbed CUA-O3D.
Our method addresses two key challenges: (1) incorporating semantic priors from VLMs alongside the geometric knowledge of spatially-aware vision foundation models, and (2) using a novel deterministic uncertainty estimation to capture model-specific uncertainties.
arXiv Detail & Related papers (2025-03-20T20:58:48Z) - Scaling Laws for Linear Complexity Language Models [18.787664489713332]
We present the scaling laws for linear complexity language models to establish a foundation for their scalability.
The study reveals that existing linear complexity language models exhibit similar scaling capabilities as conventional transformer-based models.
arXiv Detail & Related papers (2024-06-24T14:51:31Z) - Hidden Holes: topological aspects of language models [1.1172147007388977]
We study the evolution of topological structure in GPT based large language models across depth and time during training.
We show that the latter exhibit more topological complexity, with a distinct pattern of changes common to all natural languages but absent from synthetically generated data.
arXiv Detail & Related papers (2024-06-09T14:25:09Z) - On the Origins of Linear Representations in Large Language Models [51.88404605700344]
We introduce a simple latent variable model to formalize the concept dynamics of the next token prediction.
Experiments show that linear representations emerge when learning from data matching the latent variable model.
We additionally confirm some predictions of the theory using the LLaMA-2 large language model.
arXiv Detail & Related papers (2024-03-06T17:17:36Z) - Geometric Neural Diffusion Processes [55.891428654434634]
We extend the framework of diffusion models to incorporate a series of geometric priors in infinite-dimension modelling.
We show that with these conditions, the generative functional model admits the same symmetry.
arXiv Detail & Related papers (2023-07-11T16:51:38Z) - Topological Parallax: A Geometric Specification for Deep Perception
Models [0.778001492222129]
We introduce topological parallax as a theoretical and computational tool that compares a trained model to a reference dataset.
Our examples show that this geometric similarity between dataset and model is essential to trustworthy and perturbation.
This new concept will add value to the current debate regarding the unclear relationship between overfitting and generalization in applications of deep-learning.
arXiv Detail & Related papers (2023-06-20T18:45:24Z) - Towards a mathematical understanding of learning from few examples with
nonlinear feature maps [68.8204255655161]
We consider the problem of data classification where the training set consists of just a few data points.
We reveal key relationships between the geometry of an AI model's feature space, the structure of the underlying data distributions, and the model's generalisation capabilities.
arXiv Detail & Related papers (2022-11-07T14:52:58Z) - Model Criticism for Long-Form Text Generation [113.13900836015122]
We apply a statistical tool, model criticism in latent space, to evaluate the high-level structure of generated text.
We perform experiments on three representative aspects of high-level discourse -- coherence, coreference, and topicality.
We find that transformer-based language models are able to capture topical structures but have a harder time maintaining structural coherence or modeling coreference.
arXiv Detail & Related papers (2022-10-16T04:35:58Z) - Unveiling the Latent Space Geometry of Push-Forward Generative Models [24.025975236316846]
Many deep generative models are defined as a push-forward of a Gaussian measure by a continuous generator, such as Generative Adversarial Networks (GANs) or Variational Auto-Encoders (VAEs)
This work explores the latent space of such deep generative models.
A key issue with these models is their tendency to output samples outside of the support of the target distribution when learning disconnected distributions.
arXiv Detail & Related papers (2022-07-21T15:29:35Z) - Geometry Interaction Knowledge Graph Embeddings [153.69745042757066]
We propose Geometry Interaction knowledge graph Embeddings (GIE), which learns spatial structures interactively between the Euclidean, hyperbolic and hyperspherical spaces.
Our proposed GIE can capture a richer set of relational information, model key inference patterns, and enable expressive semantic matching across entities.
arXiv Detail & Related papers (2022-06-24T08:33:43Z) - Contrastive Neighborhood Alignment [81.65103777329874]
We present Contrastive Neighborhood Alignment (CNA), a manifold learning approach to maintain the topology of learned features.
The target model aims to mimic the local structure of the source representation space using a contrastive loss.
CNA is illustrated in three scenarios: manifold learning, where the model maintains the local topology of the original data in a dimension-reduced space; model distillation, where a small student model is trained to mimic a larger teacher; and legacy model update, where an older model is replaced by a more powerful one.
arXiv Detail & Related papers (2022-01-06T04:58:31Z) - Atlas Generative Models and Geodesic Interpolation [0.20305676256390928]
We define the general class of Atlas Generative Models (AGMs), models with hybrid discrete-continuous latent space.
We exemplify this by generalizing an algorithm for graph based geodesic to the setting of AGMs, and verify its performance experimentally.
arXiv Detail & Related papers (2021-01-30T16:35:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.