The Curved Spacetime of Transformer Architectures
- URL: http://arxiv.org/abs/2511.03060v1
- Date: Tue, 04 Nov 2025 22:58:40 GMT
- Title: The Curved Spacetime of Transformer Architectures
- Authors: Riccardo Di Sipio, Jairo Diaz-Rodriguez, Luis Serrano
- Abstract summary: We present a geometric framework for understanding Transformer-based language models, drawing an explicit analogy to General Relativity. We show that token embeddings should not traverse straight paths in feature space; instead, their layer-wise steps should bend and reorient as interactions mediated by embedding space curvature.
- Score: 0.3670422696827525
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a geometric framework for understanding Transformer-based language models, drawing an explicit analogy to General Relativity. Queries and keys induce an effective metric on representation space, and attention acts as a discrete connection that implements parallel transport of value vectors across tokens. Stacked layers provide discrete time-slices through which token representations evolve on this curved manifold, while backpropagation plays the role of a least-action principle that shapes loss-minimizing trajectories in parameter space. If this analogy is correct, token embeddings should not traverse straight paths in feature space; instead, their layer-wise steps should bend and reorient as interactions mediated by embedding space curvature. To test this prediction, we design experiments that expose both the presence and the consequences of curvature: (i) we visualize a curvature landscape for a full paragraph, revealing how local turning angles vary across tokens and layers; (ii) we show through simulations that excess counts of sharp/flat angles and longer length-to-chord ratios are not explainable by dimensionality or chance; and (iii) inspired by Einstein's eclipse experiment, we probe deflection under controlled context edits, demonstrating measurable, meaning-consistent bends in embedding trajectories that confirm attention-induced curvature.
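The bending claim is straightforward to operationalize. Below is a minimal sketch (not the authors' released code) of the two trajectory statistics the abstract relies on: the turning angle between consecutive layer-wise steps and the length-to-chord ratio of a token's path. Here `traj` is assumed to hold one token's hidden states stacked across layers, extracted from whatever model is being probed.

```python
import numpy as np

def turning_angles(traj):
    """Angles (radians) between consecutive layer-to-layer steps.

    traj: (L, d) array, one row per layer for a single token.
    A perfectly straight trajectory yields all-zero angles.
    """
    steps = np.diff(traj, axis=0)                              # (L-1, d)
    norms = np.linalg.norm(steps, axis=1)
    cos = np.einsum("ij,ij->i", steps[:-1], steps[1:]) / (norms[:-1] * norms[1:])
    return np.arccos(np.clip(cos, -1.0, 1.0))

def length_to_chord(traj):
    """Total path length over straight-line (chord) distance; 1 means straight."""
    path = np.linalg.norm(np.diff(traj, axis=0), axis=1).sum()
    return path / np.linalg.norm(traj[-1] - traj[0])

# Toy check: a random walk bends far more than a straight path.
rng = np.random.default_rng(0)
straight = np.outer(np.linspace(0, 1, 12), rng.normal(size=768))   # ratio == 1
wandering = np.cumsum(rng.normal(size=(12, 768)), axis=0)          # ratio >> 1
print(length_to_chord(straight), length_to_chord(wandering))
```

Any excess in either statistic over a straight-line baseline is the curvature signal the experiments test for.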
Related papers
- Brep2Shape: Boundary and Shape Representation Alignment via Self-Supervised Transformers [46.87466345672103]
Boundary representation (B-rep) is the industry standard for computer-aided design (CAD). While deep learning shows promise in processing B-rep models, existing methods suffer from a representation gap. We introduce Brep2Shape, a novel self-supervised pre-training method designed to align abstract boundary representations with intuitive shape representations.
arXiv Detail & Related papers (2026-02-07T08:00:47Z)
- Fubini Study geometry of representation drift in high dimensional data [0.0]
High dimensional representation drift is commonly quantified using Euclidean or cosine distances. We introduce a projective geometric view of representation drift grounded in the Fubini Study metric. We show that the Fubini Study metric isolates intrinsic evolution by remaining invariant under gauge-induced fluctuations.
arXiv Detail & Related papers (2026-02-01T16:00:59Z)
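For intuition, a short numpy sketch of the Fubini Study distance (the standard projective formula, not the paper's implementation): it depends only on the ray a vector spans, so a sign flip or rescaling, a simple stand-in for a gauge change, leaves it fixed while the Euclidean distance moves.

```python
import numpy as np

def fubini_study(u, v):
    """Fubini-Study distance between the rays (projective points) of u and v."""
    c = abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(c, 0.0, 1.0))

rng = np.random.default_rng(1)
u, v = rng.normal(size=64), rng.normal(size=64)
u_gauge = -2.0 * u   # rescale and flip sign: same ray, different vector

print(fubini_study(u, v), fubini_study(u_gauge, v))        # identical
print(np.linalg.norm(u - v), np.linalg.norm(u_gauge - v))  # Euclidean drifts
```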
- Gauge-invariant representation holonomy [1.078600700827543]
Deep networks learn internal representations whose geometry (how features bend, rotate, and evolve) affects both generalization and robustness. Existing similarity measures such as CKA or SVCCA capture pointwise overlap between activation sets, but miss how representations change along input paths. We introduce representation holonomy, a gauge-invariant statistic that measures this path dependence.
arXiv Detail & Related papers (2026-01-29T12:51:17Z)
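The abstract does not spell out the construction, so as a stand-in here is the textbook holonomy computation on the unit sphere: parallel transport of a tangent vector around a closed geodesic triangle rotates it by the enclosed solid angle, exactly the kind of path dependence such a statistic is meant to capture.

```python
import numpy as np

def transport(v, p, q):
    """Parallel transport of tangent vector v at p to q along the great-circle arc."""
    return v - (v @ q) / (1.0 + p @ q) * (p + q)

# Geodesic triangle covering one octant of the sphere (solid angle pi/2).
loop = [np.eye(3)[i] for i in (0, 1, 2, 0)]   # x -> y -> z -> x
v = np.array([0.0, 0.0, 1.0])                 # tangent vector at x
w = v.copy()
for p, q in zip(loop[:-1], loop[1:]):
    w = transport(w, p, q)

# The loop returns w to x, rotated by the enclosed solid angle.
angle = np.arccos(np.clip(v @ w, -1.0, 1.0))
print(angle, np.pi / 2)   # both pi/2
```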
- Understanding Transformers for Time Series: Rank Structure, Flow-of-ranks, and Compressibility [90.894232610821]
We analyze Transformers through the lens of rank structure. We show that time-series embeddings exhibit sharply decaying singular value spectra. We prove that the associated $Q/K/V$ projections admit accurate low-rank approximations.
arXiv Detail & Related papers (2025-10-02T23:56:17Z)
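A quick numpy illustration of why decaying spectra imply compressibility (synthetic matrix, not the paper's data): by Eckart-Young, the relative Frobenius error of the best rank-r approximation is the tail energy of the singular values, so fast decay makes small ranks accurate.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 512 x 64 projection matrix with a sharply decaying spectrum.
U, _ = np.linalg.qr(rng.normal(size=(512, 64)))
V, _ = np.linalg.qr(rng.normal(size=(64, 64)))
s = np.exp(-0.4 * np.arange(64))
X = U @ np.diag(s) @ V.T

sv = np.linalg.svd(X, compute_uv=False)
for r in (4, 8, 16):
    err = np.sqrt((sv[r:] ** 2).sum() / (sv ** 2).sum())   # tail energy
    print(f"rank {r}: relative Frobenius error {err:.1e}")
```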
- Curved Inference: Concern-Sensitive Geometry in Large Language Model Residual Streams [0.0]
We propose a geometric interpretability framework that tracks how the residual stream trajectory of a large language model bends in response to shifts in semantic concern. We analyse Gemma3-1b and LLaMA3.2-3b using five native-space metrics, with a primary focus on curvature (kappa_i) and salience (S(t)). We find that concern-shifted prompts reliably alter internal activation trajectories in both models.
arXiv Detail & Related papers (2025-07-08T23:05:00Z)
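The abstract does not define kappa_i and S(t) precisely; one plausible discrete reading, sketched below on an (L, d) trajectory of residual-stream states, takes salience as the norm of each update and curvature as turning angle per unit arc length.

```python
import numpy as np

def salience(traj):
    """S(t): norm of the residual-stream update at each layer (assumed definition)."""
    return np.linalg.norm(np.diff(traj, axis=0), axis=1)

def curvature(traj):
    """kappa_i: turning angle per unit arc length at interior points (assumed)."""
    steps = np.diff(traj, axis=0)
    n = np.linalg.norm(steps, axis=1)
    cos = np.einsum("ij,ij->i", steps[:-1], steps[1:]) / (n[:-1] * n[1:])
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    return theta / (0.5 * (n[:-1] + n[1:]))   # normalize by mean step length
```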
- Neural Isometries: Taming Transformations for Equivariant ML [8.203292895010748]
We introduce Neural Isometries, an autoencoder framework that learns to map the observation space to a general-purpose latent space.
We show that a simple off-the-shelf equivariant network operating in the pre-trained latent space can achieve results on par with meticulously engineered, handcrafted networks.
arXiv Detail & Related papers (2024-05-29T17:24:25Z)
- Understanding and Mitigating Hyperbolic Dimensional Collapse in Graph Contrastive Learning [70.0681902472251]
We propose a novel contrastive learning framework to learn high-quality graph embeddings in hyperbolic space. Specifically, we design an alignment metric that effectively captures the hierarchical data-invariant information. We show that in hyperbolic space one has to address leaf- and height-level uniformity, which are related to properties of trees.
arXiv Detail & Related papers (2023-10-27T15:31:42Z)
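The basic ingredient is hyperbolic distance itself; the standard Poincare-ball formula below (generic, not the paper's loss) shows why near-boundary points are exponentially far apart, which is what gives tree leaves room to spread out.

```python
import numpy as np

def poincare_distance(u, v):
    """Geodesic distance in the Poincare ball (curvature -1); requires |u|, |v| < 1."""
    du, dv = 1.0 - u @ u, 1.0 - v @ v
    return np.arccosh(1.0 + 2.0 * ((u - v) @ (u - v)) / (du * dv))

# Two leaves near the boundary: small Euclidean gap, large hyperbolic gap.
a, b = np.array([0.95, 0.0]), np.array([0.0, 0.95])
print(np.linalg.norm(a - b), poincare_distance(a, b))
```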
- Curve Your Attention: Mixed-Curvature Transformers for Graph Representation Learning [77.1421343649344]
We propose a generalization of Transformers that operates entirely on products of constant-curvature spaces.
We also provide a kernelized approach to non-Euclidean attention, which enables our model to run with time and memory cost linear in the number of nodes and edges.
arXiv Detail & Related papers (2023-09-08T02:44:37Z)
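The abstract leaves the kernel unspecified, so here is a generic linear-attention sketch of the trick: with a positive feature map phi, softmax(Q K^T) V is replaced by phi(Q) (phi(K)^T V), and the n x n score matrix is never materialized.

```python
import numpy as np

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Kernelized attention in O(n * d * d_v) time and memory.

    phi is a placeholder positive feature map, standing in for whatever
    kernel the paper actually uses.
    """
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                    # (d, d_v): aggregate keys/values once
    Z = Qp @ Kp.sum(axis=0)          # (n,): per-query normalizer
    return (Qp @ KV) / Z[:, None]

rng = np.random.default_rng(3)
n, d = 1024, 32
out = linear_attention(rng.normal(size=(n, d)), rng.normal(size=(n, d)),
                       rng.normal(size=(n, 16)))
print(out.shape)   # (1024, 16), without ever forming a 1024 x 1024 matrix
```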
- Shape And Structure Preserving Differential Privacy [70.08490462870144]
We show how the gradient of the squared distance function offers better control over sensitivity than the Laplace mechanism.
arXiv Detail & Related papers (2022-09-21T18:14:38Z)
- Visualizing high-dimensional loss landscapes with Hessian directions [0.0]
We study how curvature properties in lower-dimensional loss representations depend on those in the original loss space.
Saddle points in the original space are rarely correctly identified as such in expected lower-dimensional representations if random projections are used.
arXiv Detail & Related papers (2022-08-28T13:18:47Z)
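The effect is easy to reproduce on a toy quadratic (a sketch under stated assumptions, not the paper's experiment): give a saddle only a few negative Hessian directions among many positive ones, and a random 2D slice usually misses the negative curvature entirely.

```python
import numpy as np

rng = np.random.default_rng(4)

# Saddle in 100 dimensions: 3 negative Hessian eigenvalues, 97 positive.
eig = np.concatenate([-np.ones(3), np.ones(97)])
Qm, _ = np.linalg.qr(rng.normal(size=(100, 100)))
H = Qm @ np.diag(eig) @ Qm.T

misread = 0
for _ in range(1000):
    P, _ = np.linalg.qr(rng.normal(size=(100, 2)))   # random 2D slice
    if np.all(np.linalg.eigvalsh(P.T @ H @ P) > 0):  # looks like a minimum
        misread += 1
print(f"saddle misread as minimum in {misread / 10:.1f}% of random slices")
```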
- Rethinking the Zigzag Flattening for Image Reading [48.976491898131265]
We investigate the Hilbert fractal flattening (HF) as another method for sequence ordering in computer vision.
HF has proven superior to other curves in maintaining spatial locality.
It can be easily plugged into most deep neural networks (DNNs).
arXiv Detail & Related papers (2022-02-21T13:53:04Z)
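For reference, the classic iterative distance-to-coordinate map below (the standard algorithm, not necessarily the paper's code) yields a Hilbert ordering of a 2^k x 2^k grid in which consecutive indices are always spatial neighbours, the locality property the summary refers to.

```python
def hilbert_d2xy(order, d):
    """Map index d along a Hilbert curve of a 2**order x 2**order grid to (x, y)."""
    x = y = 0
    s = 1
    while s < (1 << order):
        rx = 1 & (d // 2)
        ry = 1 & (d ^ rx)
        if ry == 0:                          # rotate the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x, y = x + s * rx, y + s * ry
        d //= 4
        s *= 2
    return x, y

# Flatten an 8x8 image in Hilbert order; consecutive pixels stay adjacent.
path = [hilbert_d2xy(3, d) for d in range(64)]
assert all(abs(x1 - x2) + abs(y1 - y2) == 1
           for (x1, y1), (x2, y2) in zip(path, path[1:]))
```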
- Orthogonal Jacobian Regularization for Unsupervised Disentanglement in Image Generation [64.92152574895111]
We propose a simple Orthogonal Jacobian Regularization (OroJaR) to encourage deep generative models to learn disentangled representations.
Our method is effective in disentangled and controllable image generation, and performs favorably against state-of-the-art methods.
arXiv Detail & Related papers (2021-08-17T15:01:46Z)
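A small-scale sketch of the idea (exact Jacobian on a toy generator; the paper presumably uses a cheaper estimator in practice): penalize the off-diagonal entries of J^T J so that each latent coordinate perturbs the output along an independent direction.

```python
import torch

def orojar_penalty(G, z):
    """Orthogonality penalty on the generator Jacobian J = dG/dz.

    Off-diagonal entries of J^T J are zero exactly when the Jacobian
    columns are orthogonal, i.e. latent dimensions act independently.
    """
    J = torch.autograd.functional.jacobian(G, z)   # (out_dim, latent_dim)
    JtJ = J.T @ J
    off = JtJ - torch.diag(torch.diag(JtJ))
    return (off ** 2).sum() / JtJ.numel()

# Toy generator: 8-dim latent -> 64-dim "image".
G = torch.nn.Sequential(torch.nn.Linear(8, 32), torch.nn.Tanh(),
                        torch.nn.Linear(32, 64))
z = torch.randn(8)
print(orojar_penalty(G, z))
```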
- From Planes to Corners: Multi-Purpose Primitive Detection in Unorganized 3D Point Clouds [59.98665358527686]
We propose a new method for segmentation-free joint estimation of orthogonal planes.
Such unified scene exploration allows for a multitude of applications, such as semantic plane detection or local and global scan alignment.
Our experiments demonstrate the validity of our approach in numerous scenarios from wall detection to 6D tracking.
arXiv Detail & Related papers (2020-01-21T06:51:47Z)