A Unified Geometric Field Theory Framework for Transformers: From Manifold Embeddings to Kernel Modulation
- URL: http://arxiv.org/abs/2511.08243v2
- Date: Thu, 13 Nov 2025 01:38:33 GMT
- Title: A Unified Geometric Field Theory Framework for Transformers: From Manifold Embeddings to Kernel Modulation
- Authors: Xianshuai Shi, Jianfeng Zhu, Leibo Liu
- Abstract summary: The Transformer architecture has achieved tremendous success in natural language processing, computer vision, and scientific computing through its self-attention mechanism. This paper proposes a structural theoretical framework that integrates positional encoding, kernel integral operators, and attention mechanisms for in-depth theoretical investigation.
- Score: 5.985222592888107
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Transformer architecture has achieved tremendous success in natural language processing, computer vision, and scientific computing through its self-attention mechanism. However, its core components, positional encoding and the attention mechanism, have lacked a unified physical or mathematical interpretation. This paper proposes a structural theoretical framework that integrates positional encoding, kernel integral operators, and attention mechanisms, and subjects it to in-depth theoretical investigation. We map discrete positions (such as text token indices and image pixel coordinates) to spatial functions on continuous manifolds, enabling a field-theoretic interpretation of Transformer layers as kernel-modulated operators acting over embedded manifolds.
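To make the central construction concrete, below is a minimal sketch of attention read as a kernel-modulated operator over a manifold embedding. The circle as the manifold, the Gaussian geometric kernel, and all function names are illustrative assumptions, not the authors' code:

```python
# Minimal sketch: token indices are embedded as points on a continuous
# manifold (here the unit circle), and attention acts as a kernel
# integral operator whose content kernel exp(q.k/sqrt(d)) is modulated
# by a geometric kernel built from distances on the embedding.
# The circle and the Gaussian modulation are illustrative assumptions.
import numpy as np

def embed_on_circle(n_tokens: int) -> np.ndarray:
    """Map indices 0..n-1 to points on the unit circle (a 1-manifold)."""
    theta = 2 * np.pi * np.arange(n_tokens) / n_tokens
    return np.stack([np.cos(theta), np.sin(theta)], axis=-1)  # (n, 2)

def kernel_modulated_attention(x, positions, length_scale=0.5):
    """Attention as a discretized, geometry-modulated kernel operator."""
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)                 # content similarity
    diff = positions[:, None, :] - positions[None, :, :]
    dist2 = (diff ** 2).sum(-1)                   # squared chordal distance
    w = np.exp(scores - dist2 / (2 * length_scale ** 2))
    w /= w.sum(axis=-1, keepdims=True)            # row-normalize (softmax)
    return w @ x                                  # apply the operator

tokens = np.random.default_rng(0).normal(size=(8, 16))
out = kernel_modulated_attention(tokens, embed_on_circle(8))  # (8, 16)
```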
Related papers
- Unlocking Out-of-Distribution Generalization in Transformers via Recursive Latent Space Reasoning [50.99796659680724]
This work investigates out-of-distribution (OOD) generalization in Transformer networks, using GSM8K-style modular arithmetic on computational graphs as a testbed. We introduce and explore a set of four architectural mechanisms aimed at enhancing OOD generalization. We complement these empirical results with a detailed mechanistic interpretability analysis that reveals how these mechanisms give rise to robust OOD generalization abilities.
arXiv Detail & Related papers (2025-10-15T21:03:59Z)
- A Mathematical Explanation of Transformers for Large Language Models and GPTs [6.245431127481903]
We propose a novel continuous framework that interprets the Transformer as a discretization of a structured integro-differential equation. Within this formulation, the self-attention mechanism emerges naturally as a non-local integral operator. Our approach extends beyond previous theoretical analyses by embedding the entire Transformer operation in continuous domains.
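This reading is easy to sketch: on a discrete grid, the non-local operator (Kf)(x) = ∫ k(x, y) f(y) dy with a row-normalized exponential kernel is exactly a softmax attention map acting on the feature field. The Gaussian logits and the 1D grid below are my illustrative choices, not the paper's construction:

```python
# Sketch: self-attention as a discretized non-local integral operator.
# (Kf)(x_i) ≈ Σ_j softmax_j[-scale*(x_i - x_j)^2] f(x_j); the kernel
# choice is an assumption made for illustration.
import numpy as np

def attention_as_integral_operator(f, grid, scale=4.0):
    logits = -scale * (grid[:, None] - grid[None, :]) ** 2
    w = np.exp(logits)
    w /= w.sum(axis=-1, keepdims=True)   # row normalization = softmax
    return w @ f                         # quadrature sum over the grid

grid = np.linspace(0.0, 1.0, 64)         # continuous positions
f = np.sin(2 * np.pi * grid)             # a scalar feature field
smoothed = attention_as_integral_operator(f, grid)
```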
arXiv Detail & Related papers (2025-10-05T01:16:08Z)
- A Free Probabilistic Framework for Analyzing the Transformer-based Language Models [19.78896931593813]
We present a formal operator-theoretic framework for analyzing Transformer-based language models using free probability theory. This work offers a principled, though theoretical, perspective on structural dynamics in large language models.
arXiv Detail & Related papers (2025-06-19T19:13:02Z)
- Theoretical Analysis of Positional Encodings in Transformer Models: Impact on Expressiveness and Generalization [10.034655199520168]
Positional encodings are a core part of transformer-based models. This paper analyzes how various positional encoding methods impact a transformer's expressiveness, generalization ability, and extrapolation to longer sequences.
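For concreteness, one family such analyses compare is the standard sinusoidal encoding from the original Transformer, whose closed form extrapolates to arbitrary lengths; this is the well-known formula, not code from this paper:

```python
# The standard sinusoidal positional encoding (Vaswani et al., 2017):
# PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(...).
import numpy as np

def sinusoidal_positional_encoding(n_positions: int, d_model: int) -> np.ndarray:
    pos = np.arange(n_positions)[:, None]            # (n, 1)
    i = np.arange(0, d_model, 2)[None, :]            # (1, d/2)
    angles = pos / (10000.0 ** (i / d_model))
    pe = np.zeros((n_positions, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_positional_encoding(128, 64)   # works for any length
```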
arXiv Detail & Related papers (2025-06-05T23:02:18Z)
- Understanding Token-level Topological Structures in Transformer-based Time Series Forecasting [52.364260925700485]
Transformer-based methods have achieved state-of-the-art performance in time series forecasting (TSF). It remains unclear whether existing Transformers fully leverage the intrinsic topological structure among tokens throughout intermediate layers. We propose the Topology Enhancement Method (TEM), a novel Transformer-based TSF method that explicitly and adaptively preserves token-level topology.
arXiv Detail & Related papers (2024-04-16T07:21:39Z)
- How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations [98.7450564309923]
This paper takes initial steps toward understanding in-context learning (ICL) in more complex scenarios by studying learning with representations.
We construct synthetic in-context learning problems with a compositional structure, where the label depends on the input through a possibly complex but fixed representation function.
We show theoretically the existence of transformers that approximately implement such algorithms with mild depth and size.
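A hypothetical version of that construction (the choice of φ as a fixed random one-layer MLP, and all names, are assumptions of mine): every task shares the fixed representation φ and differs only in a linear head, so a prompt of (x, y) pairs implicitly specifies the head.

```python
# Synthetic ICL data with a compositional structure: label = w . φ(x),
# with φ fixed across tasks and w drawn per task. The concrete φ is an
# illustrative assumption, not the paper's exact setup.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_rep = 8, 16
W1 = rng.normal(size=(d_in, d_rep))       # fixed representation weights

def phi(x):
    return np.tanh(x @ W1)                # shared across all tasks

def sample_icl_prompt(n_examples=16):
    w = rng.normal(size=d_rep)            # task-specific linear head
    X = rng.normal(size=(n_examples, d_in))
    y = phi(X) @ w                        # label depends on x only via φ
    return X, y                           # one in-context prompt

X, y = sample_icl_prompt()
```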
arXiv Detail & Related papers (2023-10-16T17:40:49Z)
- How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding [56.222097640468306]
We provide a mechanistic understanding of how transformers learn "semantic structure".
We show, through a combination of mathematical analysis and experiments on Wikipedia data, that the embedding layer and the self-attention layer encode the topical structure.
arXiv Detail & Related papers (2023-03-07T21:42:17Z)
- Causal Abstraction: A Theoretical Foundation for Mechanistic Interpretability [30.76910454663951]
Causal abstraction provides a theoretical foundation for mechanistic interpretability. Our contributions include generalizing the theory of causal abstraction from mechanism replacement to arbitrary mechanism transformation.
arXiv Detail & Related papers (2023-01-11T20:42:41Z)
- Representational Systems Theory: A Unified Approach to Encoding, Analysing and Transforming Representations [3.1252164619375473]
Representational Systems Theory is designed to encode a wide variety of representations from three core perspectives.
Within this theory, it becomes possible to structurally transform representations in one system into representations in another.
arXiv Detail & Related papers (2022-06-07T10:43:27Z)
- Geometric Transformer for End-to-End Molecule Properties Prediction [92.28929858529679]
We introduce a Transformer-based architecture for molecule property prediction that is able to capture the geometry of the molecule. We modify the classical positional encoder with an initial encoding of the molecule geometry and add a learned gated self-attention mechanism, as sketched below.
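One plausible shape for such a layer, sketched under my own assumptions about the gating form (the paper's exact design may differ): interatomic distances bias the attention logits, and a learned sigmoid gate mixes the attention output back into the atom features.

```python
# Sketch of geometry-aware gated self-attention (illustrative only):
# distances between atoms bias attention, and a sigmoid gate controls
# how much of the attended signal replaces the input features.
import numpy as np

def gated_geometric_attention(h, coords, Wg, scale=1.0):
    n, d = h.shape
    logits = h @ h.T / np.sqrt(d)
    dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    logits = logits - scale * dist          # geometry biases attention
    a = np.exp(logits - logits.max(-1, keepdims=True))
    a /= a.sum(-1, keepdims=True)           # softmax over atoms
    g = 1 / (1 + np.exp(-(h @ Wg)))         # learned sigmoid gate
    return g * (a @ h) + (1 - g) * h        # gated residual mix

rng = np.random.default_rng(1)
h = rng.normal(size=(5, 8))                 # 5 atoms, 8 features
coords = rng.normal(size=(5, 3))            # 3D atom positions
out = gated_geometric_attention(h, coords, rng.normal(size=(8, 8)))
```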
arXiv Detail & Related papers (2021-10-26T14:14:40Z)
- Transformers with Competitive Ensembles of Independent Mechanisms [97.93090139318294]
We propose a new Transformer layer that divides the hidden representation and parameters into multiple mechanisms, which exchange information only through attention.
We study TIM on a large-scale BERT model, on the Image Transformer, and on speech enhancement and find evidence for semantically meaningful specialization as well as improved performance.
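A simplified sketch of the mechanism-splitting idea (dimensions and the competition rule are my assumptions, and the attention-based exchange between mechanisms is omitted for brevity):

```python
# Sketch: the hidden state is split into independent mechanisms with
# separate parameters; a softmax over per-mechanism relevance scores
# makes mechanisms compete for the update. Illustrative, not TIM's
# exact equations; inter-mechanism attention is omitted.
import numpy as np

n_mech, d = 4, 16
rng = np.random.default_rng(2)
W = [rng.normal(size=(d, d)) for _ in range(n_mech)]  # per-mechanism weights
w_score = rng.normal(size=(n_mech, d))                # relevance scorers

def tim_like_layer(h_split):
    """h_split: (n_mech, n_tokens, d), one slice per mechanism."""
    out = np.stack([np.tanh(h @ Wm) for h, Wm in zip(h_split, W)])
    scores = np.einsum('mtd,md->mt', out, w_score)    # (n_mech, n_tokens)
    c = np.exp(scores) / np.exp(scores).sum(0, keepdims=True)
    return h_split + c[..., None] * out               # winners update more

h = rng.normal(size=(n_mech, 10, d))
h = tim_like_layer(h)
```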
arXiv Detail & Related papers (2021-02-27T21:48:46Z)