Related papers: TokenBlowUp: Resolving Representational Singularities in LLM Token Spaces via Monoidal Transformations

URL: http://arxiv.org/abs/2507.19747v2
Date: Wed, 30 Jul 2025 23:48:07 GMT
Title: TokenBlowUp: Resolving Representational Singularities in LLM Token Spaces via Monoidal Transformations
Authors: Dongfang Zhao
Abstract summary: Recent work has provided compelling evidence challenging the foundational manifold hypothesis for the token embedding spaces of Large Language Models. We formalize this problem in the language of scheme theory and propose a rigorous resolution by applying the scheme-theoretic blow-up at each singular point. We prove a formal theorem guaranteeing the geometric regularization of this new space, showing that the original pathologies are resolved.
Score: 1.3824176915623292
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Recent work has provided compelling evidence challenging the foundational manifold hypothesis for the token embedding spaces of Large Language Models (LLMs). These findings reveal the presence of geometric singularities around polysemous tokens, which can lead to representational instability. Existing methodologies, which presuppose a smooth data manifold, are ill-equipped to address such intrinsic structural flaws. In this paper, we formalize this problem in the language of scheme theory and propose a rigorous resolution by applying the scheme-theoretic blow-up at each singular point. This procedure replaces a singular point in the ambient affine scheme with its exceptional divisor, which we identify as a canonical geometric space -- a projective space of directions -- that houses the disambiguated semantic meanings of the token. This process of "representational desingularization" constructs a new geometric landscape for embeddings. We prove a formal theorem guaranteeing the geometric regularization of this new space, showing that the original pathologies are resolved. Finally, we outline the architectural implications of our framework, arguing for a paradigm shift from static look-ups to dynamic, geometrically-grounded computation.

Related papers

Position: Beyond Euclidean -- Foundation Models Should Embrace Non-Euclidean Geometries [42.83280708842304]
Euclidean space has been the de facto geometric setting for machine learning architectures. At a large scale, real-world data often exhibit inherently non-Euclidean structures, such as multi-way relationships, hierarchies, symmetries, and non-isotropic scaling. This paper argues that moving beyond Euclidean geometry is not merely an optional enhancement but a necessity to maintain the scaling law for the next generation of foundation models.
arXiv Detail & Related papers (2025-04-11T18:07:33Z)
Relative Representations: Topological and Geometric Perspectives [53.88896255693922]
Relative representations are an established approach to zero-shot model stitching. We introduce a normalization procedure in the relative transformation, resulting in invariance to non-isotropic rescalings and permutations. Second, we propose to deploy topological densification when fine-tuning relative representations, a topological regularization loss encouraging clustering within classes.
arXiv Detail & Related papers (2024-09-17T08:09:22Z)
Decoder ensembling for learned latent geometries [15.484595752241122]
We show how to easily compute geodesics on the associated expected manifold. We find this simple and reliable, thereby coming one step closer to easy-to-use latent geometries.
arXiv Detail & Related papers (2024-08-14T12:35:41Z)
Learning Visual-Semantic Subspace Representations [49.17165360280794]
We introduce a nuclear norm-based loss function, grounded in the same information theoretic principles that have proved effective in self-supervised learning. We present a theoretical characterization of this loss, demonstrating that, in addition to promoting class separability, it encodes the spectral geometry of the data within a subspace lattice.
arXiv Detail & Related papers (2024-05-25T12:51:38Z)
Topological Obstructions and How to Avoid Them [22.45861345237023]
We show that local optima can arise due to singularities or an incorrect degree or winding number. We propose a new flow-based model that maps data points to multimodal distributions over geometric spaces.
arXiv Detail & Related papers (2023-12-12T18:56:14Z)
Basis restricted elastic shape analysis on the space of unregistered surfaces [10.543359560247847]
This paper introduces a new mathematical and numerical framework for surface analysis. The specificity of the approach we develop is to restrict the space of allowable transformations to predefined finite dimensional bases of deformation fields. We specifically validate our approach on human body shape and pose data as well as human face scans, and show how it generally outperforms state-of-the-art methods on problems such as shape registration, motion transfer or random pose generation.
arXiv Detail & Related papers (2023-11-07T23:06:22Z)
Geometric Neural Diffusion Processes [55.891428654434634]
We extend the framework of diffusion models to incorporate a series of geometric priors in infinite-dimension modelling. We show that with these conditions, the generative functional model admits the same symmetry.
arXiv Detail & Related papers (2023-07-11T16:51:38Z)
Geometric Scattering on Measure Spaces [15.819230791757906]
We introduce a general, unified model for geometric scattering on measure spaces. We consider finite measure spaces that are obtained from randomly sampling an unknown manifold. We propose two methods for constructing a data-driven graph on which the associated graph scattering transform approximates the scattering transform on the underlying manifold.
arXiv Detail & Related papers (2022-08-17T22:40:09Z)
Geometric Methods for Sampling, Optimisation, Inference and Adaptive Agents [102.42623636238399]
We identify fundamental geometric structures that underlie the problems of sampling, optimisation, inference and adaptive decision-making. We derive algorithms that exploit these geometric structures to solve these problems efficiently.
arXiv Detail & Related papers (2022-03-20T16:23:17Z)
A singular Riemannian geometry approach to Deep Neural Networks I. Theoretical foundations [77.86290991564829]
Deep Neural Networks are widely used for solving complex problems in several scientific areas, such as speech recognition, machine translation, and image analysis. We study a particular sequence of maps between manifolds, with the last manifold of the sequence equipped with a Riemannian metric. We investigate the theoretical properties of the maps of such a sequence; eventually we focus on the case of maps between manifolds implementing neural networks of practical interest.
arXiv Detail & Related papers (2021-12-17T11:43:30Z)
A Unifying and Canonical Description of Measure-Preserving Diffusions [60.59592461429012]
A complete recipe of measure-preserving diffusions in Euclidean space was recently derived unifying several MCMC algorithms into a single framework. We develop a geometric theory that improves and generalises this construction to any manifold.
arXiv Detail & Related papers (2021-05-06T17:36:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.