Bridging the Dimensional Chasm: Uncover Layer-wise Dimensional Reduction in Transformers through Token Correlation
- URL: http://arxiv.org/abs/2503.22547v1
- Date: Fri, 28 Mar 2025 15:47:30 GMT
- Title: Bridging the Dimensional Chasm: Uncover Layer-wise Dimensional Reduction in Transformers through Token Correlation
- Authors: Zhuo-Yang Song, Zeyu Li, Qing-Hong Cao, Ming-xing Luo, Hua Xing Zhu
- Abstract summary: We develop a framework that tracks token dynamics across Transformer layers. This work advances interpretability by reframing Transformer layers as projectors between high-dimensional computation and low-dimensional semantics.
- Score: 2.5976894391099625
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The geometric evolution of token representations in large language models (LLMs) presents a fundamental paradox: while human language inherently organizes semantic information in low-dimensional spaces ($\sim 10^1$ dimensions), modern LLMs employ high-dimensional embeddings ($\sim 10^3$ dimensions) processed through Transformer architectures. To resolve this paradox, we develop a geometric framework that tracks token dynamics across Transformer layers. Through layer-wise analysis of intrinsic dimensions across multiple architectures, we reveal an expansion-contraction pattern in which tokens first diffuse into a "working space" and then progressively project onto lower-dimensional submanifolds. Our findings imply a negative correlation between the working-space dimension and the parameter-sensitive performance of LLMs, and indicate that effective models tend to compress tokens into approximately 10-dimensional submanifolds, closely resembling human semantic spaces. This work not only advances LLM interpretability by reframing Transformer layers as projectors that mediate between high-dimensional computation and low-dimensional semantics, but also provides practical tools for model diagnostics that do not rely on task-specific evaluations.
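The layer-wise intrinsic-dimension analysis described in the abstract can be reproduced in miniature with off-the-shelf tools. The sketch below is an illustration rather than the authors' released code: it applies the TwoNN estimator (Facco et al., 2017) to the hidden states of a small pretrained Hugging Face model. The model name ("gpt2"), the input text, and the choice of estimator are assumptions, and in practice token vectors from many sentences should be pooled for a stable estimate.

```python
# Minimal sketch (not the authors' released code): estimate the intrinsic
# dimension (ID) of token representations at every layer of a pretrained
# Transformer with the TwoNN maximum-likelihood estimator (Facco et al., 2017).
# The model name ("gpt2"), the toy input text, and the estimator choice are
# assumptions for illustration; the paper's exact setup may differ.
import numpy as np
import torch
from sklearn.neighbors import NearestNeighbors
from transformers import AutoModel, AutoTokenizer


def two_nn_id(points: np.ndarray) -> float:
    """TwoNN estimate: d ~= N / sum_i log(r2_i / r1_i)."""
    nn = NearestNeighbors(n_neighbors=3).fit(points)
    dists, _ = nn.kneighbors(points)               # column 0 is the point itself
    mu = dists[:, 2] / np.maximum(dists[:, 1], 1e-12)
    mu = mu[mu > 1.0]                              # discard duplicate/tied points
    return len(mu) / np.sum(np.log(mu))


tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

# Toy input; for a stable estimate, pool token vectors from many sentences.
text = "Transformers map token embeddings through a stack of attention layers."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    hidden_states = model(**inputs).hidden_states  # embeddings + one tensor per layer

for layer, h in enumerate(hidden_states):
    pts = h.squeeze(0).numpy()                     # shape: (num_tokens, d_model)
    print(f"layer {layer:2d}: intrinsic dimension ~ {two_nn_id(pts):.1f}")
```

If the paper's observation holds, plotting these estimates against layer index should trace the expansion-contraction profile: the dimension rises into the working space in early layers and contracts toward roughly ten dimensions in the later layers of well-performing models.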
Related papers
- DiffuMeta: Algebraic Language Models for Inverse Design of Metamaterials via Diffusion Transformers [0.6531893282486697]
We present DiffuMeta, a generative framework integrating diffusion transformers with a novel algebraic language representation, encoding 3D geometries as mathematical sentences. This compact, unified parameterization spans diverse topologies while enabling direct application of transformers to structural design. Our approach enables simultaneous control over multiple mechanical objectives, including linear and nonlinear responses beyond training domains.
arXiv Detail & Related papers (2025-07-21T16:09:26Z) - Large Language Models Encode Semantics in Low-Dimensional Linear Subspaces [31.401762286885656]
Understanding the latent space geometry of large language models (LLMs) is key to interpreting their behavior and improving alignment. We investigate to what extent LLMs internally organize representations related to semantic understanding.
arXiv Detail & Related papers (2025-07-13T17:03:25Z) - Semantic Wave Functions: Exploring Meaning in Large Language Models Through Quantum Formalism [0.0]
Large Language Models (LLMs) encode semantic relationships in high-dimensional vector embeddings. This paper explores the analogy between LLM embedding spaces and quantum mechanics. We introduce a "semantic wave function" to formalize this quantum-derived representation.
arXiv Detail & Related papers (2025-03-09T08:23:31Z) - Riemann$^2$: Learning Riemannian Submanifolds from Riemannian Data [12.424539896723603]
Latent variable models are powerful tools for learning low-dimensional manifolds from high-dimensional data.
This paper generalizes previous work, enabling complex tasks in various domains, including robot motion synthesis and the analysis of brain connectomes.
arXiv Detail & Related papers (2025-03-07T16:08:53Z) - Demystifying Singular Defects in Large Language Models [61.98878352956125]
In large language models (LLMs), the underlying causes of high-norm tokens remain largely unexplored. We provide both theoretical insights and empirical validation across a range of recent models. We showcase two practical applications of these findings: the improvement of quantization schemes and the design of LLM signatures.
arXiv Detail & Related papers (2025-02-10T20:09:16Z) - Neural Isometries: Taming Transformations for Equivariant ML [8.203292895010748]
We introduce Neural Isometries, an autoencoder framework which learns to map the observation space to a general-purpose latent space.
We show that a simple off-the-shelf equivariant network operating in the pre-trained latent space can achieve results on par with meticulously-engineered, handcrafted networks.
arXiv Detail & Related papers (2024-05-29T17:24:25Z) - Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension [63.330262740414646]
We study how to characterize and predict the truthfulness of texts generated from large language models (LLMs).
We suggest investigating internal activations and quantifying LLM's truthfulness using the local intrinsic dimension (LID) of model activations.
arXiv Detail & Related papers (2024-02-28T04:56:21Z) - GTA: A Geometry-Aware Attention Mechanism for Multi-View Transformers [63.41460219156508]
We argue that existing positional encoding schemes are suboptimal for 3D vision tasks.
We propose a geometry-aware attention mechanism that encodes the geometric structure of tokens as relative transformation.
We show that our attention, called Geometric Transform Attention (GTA), improves learning efficiency and performance of state-of-the-art transformer-based NVS models.
arXiv Detail & Related papers (2023-10-16T13:16:09Z) - VTAE: Variational Transformer Autoencoder with Manifolds Learning [144.0546653941249]
Deep generative models have demonstrated successful applications in learning non-linear data distributions through a number of latent variables.
The nonlinearity of the generator implies that the latent space shows an unsatisfactory projection of the data space, which results in poor representation learning.
We show that geodesics and accurate computation can substantially improve the performance of deep generative models.
arXiv Detail & Related papers (2023-04-03T13:13:19Z) - Solving High-Dimensional PDEs with Latent Spectral Models [74.1011309005488]
We present Latent Spectral Models (LSM) toward an efficient and precise solver for high-dimensional PDEs.
Inspired by classical spectral methods in numerical analysis, we design a neural spectral block to solve PDEs in the latent space.
LSM consistently achieves state-of-the-art results, yielding an average relative gain of 11.5% across seven benchmarks.
arXiv Detail & Related papers (2023-01-30T04:58:40Z) - Analyzing the Latent Space of GAN through Local Dimension Estimation [4.688163910878411]
The success of style-based GANs (StyleGANs) in high-fidelity image synthesis has motivated research to understand the semantic properties of their latent spaces.
We propose a local dimension estimation algorithm for arbitrary intermediate layers in a pre-trained GAN model.
Our proposed metric, called Distortion, measures an inconsistency of intrinsic space on the learned latent space.
arXiv Detail & Related papers (2022-05-26T06:36:06Z) - Exploring Dimensionality Reduction Techniques in Multilingual Transformers [64.78260098263489]
This paper gives a comprehensive account of the impact of dimensionality reduction techniques on the performance of state-of-the-art multilingual Siamese Transformers.
It shows that it is possible to achieve an average reduction in the number of dimensions of $91.58\% \pm 2.59\%$ and $54.65\% \pm 32.20\%$, respectively.
arXiv Detail & Related papers (2022-04-18T17:20:55Z) - Path Development Network with Finite-dimensional Lie Group Representation [3.9983665898166425]
We propose a novel, trainable path development layer, which exploits representations of sequential data through finite-dimensional Lie groups.
Our proposed layer, analogous to recurrent neural networks (RNN), possesses an explicit, simple recurrent unit that alleviates the gradient issues.
Empirical results on a range of datasets show that the development layer consistently and significantly outperforms signature features on accuracy and dimensionality.
arXiv Detail & Related papers (2022-04-02T02:01:00Z) - The Geometry of Deep Generative Image Models and its Applications [0.0]
Generative adversarial networks (GANs) have emerged as a powerful unsupervised method to model the statistical patterns of real-world data sets.
These networks are trained to map random inputs in their latent space to new samples representative of the learned data.
The structure of the latent space is hard to intuit due to its high dimensionality and the non-linearity of the generator.
arXiv Detail & Related papers (2021-01-15T07:57:33Z) - Closed-Form Factorization of Latent Semantics in GANs [65.42778970898534]
A rich set of interpretable dimensions has been shown to emerge in the latent space of Generative Adversarial Networks (GANs) trained for synthesizing images.
In this work, we examine the internal representation learned by GANs to reveal the underlying variation factors in an unsupervised manner.
We propose a closed-form factorization algorithm for latent semantic discovery by directly decomposing the pre-trained weights.
arXiv Detail & Related papers (2020-07-13T18:05:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.