Rethinking RoPE: A Mathematical Blueprint for N-dimensional Positional Encoding
- URL: http://arxiv.org/abs/2504.06308v1
- Date: Mon, 07 Apr 2025 21:58:22 GMT
- Title: Rethinking RoPE: A Mathematical Blueprint for N-dimensional Positional Encoding
- Authors: Haiping Liu, Hongpeng Zhou
- Abstract summary: We propose a systematic mathematical framework for RoPE grounded in Lie group and Lie algebra theory. We identify two core properties of RoPE, named relativity and reversibility, and derive general constraints and constructions for valid RoPE in 1D, 2D, and N dimensions. Our framework unifies and explains existing RoPE designs, while enabling principled extensions to new modalities and tasks.
- Score: 0.4604003661048266
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Rotary Position Embedding (RoPE) is widely adopted in Transformers due to its ability to encode relative positions with high efficiency and extrapolation capability. However, existing RoPE variants lack a unified theoretical foundation, especially in higher dimensions. In this paper, we propose a systematic mathematical framework for RoPE grounded in Lie group and Lie algebra theory. We identify two core properties of RoPE, named relativity and reversibility, and derive general constraints and constructions for valid RoPE in 1D, 2D, and N-dimensional (ND). We prove that RoPE must lie in the basis of a maximal abelian subalgebra (MASA) of the special orthogonal Lie algebra, and show that standard RoPE corresponds to the maximal toral subalgebra. Furthermore, we propose to model inter-dimensional interactions by learning an orthogonal basis transformation. Our framework unifies and explains existing RoPE designs, while enabling principled extensions to new modalities and tasks.
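As a concrete anchor for the two properties named in the abstract, here is a minimal numeric sketch (our illustration, not the authors' code) of standard 1D RoPE built from 2x2 rotation blocks; `rope_matrix` and its arguments are our own naming:

```python
# A minimal numeric sketch (not the authors' code): standard 1D RoPE built from
# 2x2 rotation blocks, illustrating the two properties named in the abstract.
# "Relativity": R(m)^T R(n) = R(n - m), so attention scores depend only on
# relative position. "Reversibility": R(m) is orthogonal, hence invertible.
import numpy as np

def rope_matrix(pos, d, base=10000.0):
    """Block-diagonal rotation matrix R(pos) for an even head dimension d."""
    R = np.zeros((d, d))
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)   # frequency of the i-th 2D block
        c, s = np.cos(theta), np.sin(theta)
        R[2*i:2*i+2, 2*i:2*i+2] = [[c, -s], [s, c]]
    return R

d, m, n = 8, 5, 12
Rm, Rn = rope_matrix(m, d), rope_matrix(n, d)
# Relativity: the product depends only on the offset n - m.
assert np.allclose(Rm.T @ Rn, rope_matrix(n - m, d))
# Reversibility: R(m) is orthogonal, so no positional information is lost.
assert np.allclose(Rm.T @ Rm, np.eye(d))
```

The block-diagonal form is the maximal-toral-subalgebra case the abstract refers to: each 2x2 block rotates independently at its own frequency.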
Related papers
- ComRoPE: Scalable and Robust Rotary Position Embedding Parameterized by Trainable Commuting Angle Matrices [25.99231204405503]
We propose ComRoPE, which generalizes Rotary Position Embedding (RoPE) by defining it in terms of trainable commuting angle matrices. We present two types of trainable commuting angle matrices as sufficient solutions to the RoPE equation. Our framework shows versatility in generalizing to existing RoPE formulations and offering new insights for future positional encoding research. (A toy sketch of the commuting condition follows this entry.)
arXiv Detail & Related papers (2025-06-04T09:10:02Z)
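The requirement that the angle matrices commute is what makes the relative-position (RoPE) equation solvable. Below is a hedged toy construction (ours, not ComRoPE's trainable parameterization, whose solutions are more general): generators acting on disjoint 2x2 blocks commute trivially, and that alone already satisfies the equation for 2D positions:

```python
# A hedged sketch of the commuting-angle-matrix idea (our illustration, not the
# ComRoPE release): build R(x) = expm(x1*A1 + x2*A2) from two trainable
# skew-symmetric generators that commute because they act on disjoint 2D blocks.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)

def block_skew(freqs, offset, d):
    """Skew-symmetric generator with learnable frequencies on chosen 2x2 blocks."""
    A = np.zeros((d, d))
    for i, f in enumerate(freqs):
        j = 2 * (offset + i)
        A[j, j+1], A[j+1, j] = -f, f
    return A

d = 8
A1 = block_skew(rng.standard_normal(2), offset=0, d=d)  # acts on dims 0-3
A2 = block_skew(rng.standard_normal(2), offset=2, d=d)  # acts on dims 4-7
assert np.allclose(A1 @ A2, A2 @ A1)                    # generators commute

def R(x):  # rotation for a 2D position x = (x1, x2)
    return expm(x[0] * A1 + x[1] * A2)

x, y = np.array([3., 1.]), np.array([5., 4.])
# Commutation makes the RoPE (relativity) equation hold exactly:
assert np.allclose(R(x) @ R(y).T, R(x - y))
```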
- PAID: Pairwise Angular-Invariant Decomposition for Continual Test-Time Adaptation [70.98107766265636]
This paper takes the geometric attributes of pre-trained weights as a starting point, systematically analyzing three key components: magnitude, absolute angle, and pairwise angular structure. We find that the pairwise angular structure remains stable across diverse corrupted domains and encodes domain-invariant semantic information, suggesting it should be preserved during adaptation. (A decomposition sketch follows this entry.)
arXiv Detail & Related papers (2025-06-03T05:18:15Z)
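A small sketch of the decomposition as we read it (our code, not the PAID release): split a weight matrix into per-row magnitudes, directions (absolute angles), and pairwise cosines, and check that the pairwise angular structure survives per-row rescaling:

```python
# Our illustration of the three geometric components named in the summary:
# magnitude, absolute angle (direction), and pairwise angular structure --
# the last being the quantity the paper reports as stable across corrupted
# domains and therefore worth preserving during adaptation.
import numpy as np

def angular_decomposition(W):
    """Split rows of W into magnitudes, unit directions, and pairwise cosines."""
    mags = np.linalg.norm(W, axis=1, keepdims=True)   # per-row magnitude
    dirs = W / mags                                   # per-row direction
    pairwise = dirs @ dirs.T                          # cos of angle between rows
    return mags, dirs, pairwise

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 16))
mags, dirs, pairwise = angular_decomposition(W)

# Rescaling rows (magnitude drift) leaves the pairwise angular structure intact:
W_shifted = W * rng.uniform(0.5, 2.0, size=(4, 1))
assert np.allclose(angular_decomposition(W_shifted)[2], pairwise)
```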
- Revisiting LRP: Positional Attribution as the Missing Ingredient for Transformer Explainability [53.21677928601684]
Layer-wise relevance propagation (LRP) is one of the most promising approaches to explainability in deep learning. We propose specialized, theoretically grounded LRP rules designed to propagate attributions across various positional encoding methods. Our method significantly outperforms the state of the art in both vision and NLP explainability tasks.
arXiv Detail & Related papers (2025-06-02T18:07:55Z)
- A Dual Basis Approach for Structured Robust Euclidean Distance Geometry [6.422262171968397]
This paper considers the setting where only a set of anchor nodes is used to collect the distances between themselves and the rest. In the presence of potential outliers, this results in a structured partial observation of the Euclidean distance matrix (EDM) with partial corruptions. We propose a novel algorithmic framework, dubbed Robust Euclidean Distance Geometry via Dual Basis (RoDEoDB), for recovering the Euclidean distance geometry.
arXiv Detail & Related papers (2025-05-23T22:40:21Z)
- PaTH Attention: Position Encoding via Accumulating Householder Transformations [56.32365080761523]
PaTH is a flexible, data-dependent position encoding scheme based on accumulated products of Householder transformations. We derive an efficient parallel algorithm for training by exploiting a compact representation of products of Householder matrices. (A toy sketch of the accumulation follows this entry.)
arXiv Detail & Related papers (2025-05-22T08:36:09Z)
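A hedged sketch of the mechanism as we read the abstract (not the PaTH implementation, which derives the reflections from token data and trains through a compact parallel form): each token contributes a Householder reflection, and the transform relating two positions is the accumulated product between them:

```python
# Our toy reading of PaTH's core idea: token t contributes a data-dependent
# Householder reflection H_t = I - 2 v_t v_t^T / ||v_t||^2, and the "position"
# relating tokens i < j is the accumulated product H_{i+1} ... H_j.
import numpy as np

def householder(v):
    v = v / np.linalg.norm(v)
    return np.eye(len(v)) - 2.0 * np.outer(v, v)

rng = np.random.default_rng(0)
d, T = 8, 6
V = rng.standard_normal((T, d))          # stand-in data-dependent vectors
H = [householder(V[t]) for t in range(T)]

def rel_transform(i, j):
    """Accumulated product of reflections between positions i and j."""
    P = np.eye(d)
    for t in range(i + 1, j + 1):
        P = P @ H[t]
    return P

# Unlike RoPE's fixed rotations, this transform depends on the tokens between
# i and j, so the encoding is data-dependent; it is still orthogonal:
P = rel_transform(1, 4)
assert np.allclose(P.T @ P, np.eye(d))
```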
- Directional Non-Commutative Monoidal Structures for Compositional Embeddings in Machine Learning [0.0]
We introduce a new structure for compositional embeddings built on directional non-commutative monoidal operators. Our construction defines a distinct composition operator ∘_i for each axis i, ensuring associative combination along each axis without imposing global commutativity. All axis-specific operators commute with one another, enforcing a global interchange law that enables consistent cross-axis compositions. (A toy matrix instantiation follows this entry.)
arXiv Detail & Related papers (2025-05-21T13:27:14Z)
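A toy matrix instantiation of the stated axioms (ours, not the paper's construction): realize elements of each axis as matrices acting on an axis-specific block, so composition is associative and non-commutative within an axis while operators on different axes commute:

```python
# Our toy model: composition is matrix multiplication, and each axis gets its
# own coordinate block. Within an axis the operator is associative but
# non-commutative; across axes the operators commute, which is the
# interchange-law behavior the summary describes.
import numpy as np

rng = np.random.default_rng(0)
d = 4

def lift(M, axis):
    """Embed a dxd matrix into the block for the given axis of a 2d x 2d space."""
    G = np.eye(2 * d)
    s = axis * d
    G[s:s+d, s:s+d] = M
    return G

A, B, C = (lift(rng.standard_normal((d, d)), axis) for axis in (0, 0, 1))

def circ(X, Y):  # one shared product; axis-specificity comes from the blocks
    return X @ Y

# Within axis 0 the operator is genuinely non-commutative:
assert not np.allclose(circ(A, B), circ(B, A))
# Across axes (A on axis 0, C on axis 1) the operators commute exactly:
assert np.allclose(circ(A, C), circ(C, A))
```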
- Generalized Tensor-based Parameter-Efficient Fine-Tuning via Lie Group Transformations [50.010924231754856]
Adapting pre-trained foundation models for diverse downstream tasks is a core practice in artificial intelligence, but full fine-tuning is costly; to overcome this, parameter-efficient fine-tuning (PEFT) methods like LoRA have emerged and are becoming a growing research focus. We propose a generalization that extends matrix-based PEFT methods to higher-dimensional parameter spaces without compromising their structural properties.
arXiv Detail & Related papers (2025-04-01T14:36:45Z)
- DRoPE: Directional Rotary Position Embedding for Efficient Agent Interaction Modeling [9.86959003425198]
Directional Rotary Position Embedding (DRoPE) is a novel adaptation of Rotary Position Embedding (RoPE), originally developed in natural language processing. DRoPE overcomes RoPE's limitations by introducing a uniform identity scalar into its 2D rotary transformation. Empirical evaluations confirm DRoPE's strong performance and significantly reduced space complexity.
arXiv Detail & Related papers (2025-03-19T09:23:09Z)
- Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization [23.936687072300053]
We show that Rotary Position Embedding (RoPE) enables periodic attention by implicitly performing a Non-Uniform Discrete Fourier Transform (NUDFT). This periodicity is undermined by spectral damage caused by: 1) linear layers and activation functions outside of attention; 2) insufficiently trained frequency components brought by time-domain truncation. We propose Fourier Position Embedding (FoPE), which enhances attention's frequency-domain properties to improve both its periodic extension and length generalization. (A numeric sketch of the Fourier view follows this entry.)
arXiv Detail & Related papers (2024-12-23T17:44:01Z)
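A numeric sketch of the Fourier view (our illustration, not the FoPE code): with RoPE, the query-key score is exactly a sum of cosines and sines over the per-block frequencies, i.e. a non-uniform Fourier series in the relative offset m - n, which is the periodicity the paper builds on:

```python
# Our check that a RoPE attention score equals an explicit Fourier series in
# the relative position, with one (non-uniform) frequency per 2D block.
import numpy as np

d, base = 8, 10000.0
freqs = base ** (-2 * np.arange(d // 2) / d)       # non-uniform frequencies

def rotate(x, pos):
    """Apply RoPE to vector x at position pos (pairs of dims as 2D blocks)."""
    x2 = x.reshape(-1, 2)
    c, s = np.cos(pos * freqs), np.sin(pos * freqs)
    return np.stack([c * x2[:, 0] - s * x2[:, 1],
                     s * x2[:, 0] + c * x2[:, 1]], axis=1).ravel()

rng = np.random.default_rng(0)
q, k = rng.standard_normal(d), rng.standard_normal(d)
m, n = 17, 5

score = rotate(q, m) @ rotate(k, n)

# Same score as an explicit Fourier series in the offset (m - n):
q2, k2 = q.reshape(-1, 2), k.reshape(-1, 2)
a = np.sum(q2 * k2, axis=1)                        # cosine coefficients
b = q2[:, 0] * k2[:, 1] - q2[:, 1] * k2[:, 0]      # sine coefficients
series = np.sum(a * np.cos((m - n) * freqs) + b * np.sin((m - n) * freqs))
assert np.allclose(score, series)
```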
- RMLR: Extending Multinomial Logistic Regression into General Geometries [64.16104856124029]
Our framework only requires minimal geometric properties, thus exhibiting broad applicability.
We develop five families of SPD MLRs under five types of power-deformed metrics.
On rotation matrices we propose Lie MLR based on the popular bi-invariant metric.
arXiv Detail & Related papers (2024-09-28T18:38:21Z)
- Base of RoPE Bounds Context Length [37.11078116104313]
Rotary position embedding (RoPE) is a technique that encodes the position information with a rotation matrix.
In this paper, we find that LLMs may obtain a superficial long-context ability based on out-of-distribution (OOD) theory.
Our work reveals the relationship between context length and RoPE base both theoretically and empirically, which may shed light on future long-context training. (A numeric sketch of the base-wavelength relationship follows this entry.)
arXiv Detail & Related papers (2024-05-23T14:03:31Z)
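A small numeric sketch of the underlying relationship (ours; `longest_wavelength` is our own helper): the i-th RoPE block rotates at theta_i = base^(-2i/d), so the slowest block has wavelength 2*pi/theta_min, and a larger base stretches that wavelength — the OOD intuition behind the bound:

```python
# The slowest-rotating RoPE block sets the longest distance the model can
# distinguish before low-frequency rotations leave the range seen in training.
import numpy as np

def longest_wavelength(base, d=128):
    theta_min = base ** (-2 * (d // 2 - 1) / d)    # slowest rotating block
    return 2 * np.pi / theta_min

for base in (1e4, 5e5, 1e7):
    print(f"base={base:>10.0f}  longest wavelength ~ "
          f"{longest_wavelength(base):,.0f} positions")

# Larger bases slow the low-frequency blocks, stretching the wavelength and
# hence the context length over which positions remain in-distribution.
```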
- Resonance RoPE: Improving Context Length Generalization of Large Language Models [37.749813693281254]
This paper addresses the challenge of train-short-test-long (TSTL) scenarios in Large Language Models (LLMs) equipped with Rotary Position Embedding (RoPE).
We introduce Resonance RoPE, a novel approach designed to narrow the generalization gap in TSTL scenarios.
We present PosGen, a new synthetic benchmark specifically designed for fine-grained behavior analysis in TSTL scenarios.
arXiv Detail & Related papers (2024-02-29T19:02:03Z)
- Scaling Laws of RoPE-based Extrapolation [103.33995311915864]
We propose Scaling Laws of RoPE-based Extrapolation to describe the relationship between extrapolation performance and the base value.
We achieve extrapolation up to a context length of 1 million tokens with only a 16K training length on LLaMA2 7B and 13B.
arXiv Detail & Related papers (2023-10-08T15:50:36Z)
- Learning a Fourier Transform for Linear Relative Positional Encodings in Transformers [71.32827362323205]
We propose a new class of linear Transformers called Learner-Transformers (Learners).
They incorporate a wide range of relative positional encoding mechanisms (RPEs).
These include regular RPE techniques applied for sequential data, as well as novel RPEs operating on geometric data embedded in higher-dimensional Euclidean spaces.
arXiv Detail & Related papers (2023-02-03T18:57:17Z)
- Your Transformer May Not be as Powerful as You Expect [88.11364619182773]
We mathematically analyze the power of RPE-based Transformers regarding whether the model is capable of approximating any continuous sequence-to-sequence functions.
We present a negative result by showing there exist continuous sequence-to-sequence functions that RPE-based Transformers cannot approximate no matter how deep and wide the neural network is.
We develop a novel attention module, called Universal RPE-based (URPE) Attention, which satisfies these conditions. (A hedged sketch of the construction follows this entry.)
arXiv Detail & Related papers (2022-05-26T14:51:30Z)
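A hedged sketch of the URPE construction as we read it (our code, not the paper's release): multiply the softmax attention matrix elementwise by a learned Toeplitz matrix whose entries depend only on the offset i - j:

```python
# Our reading of URPE-style attention: the relative-position multipliers form
# a Toeplitz matrix C with C[i, j] = c[i - j], applied after the softmax.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
T, d = 6, 8
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))

A = softmax(Q @ K.T / np.sqrt(d))            # standard attention weights

# Learned relative-position multipliers c[offset], offset in [-(T-1), T-1]:
c = rng.standard_normal(2 * T - 1)
i, j = np.indices((T, T))
C = c[i - j + T - 1]                         # Toeplitz: C[i, j] = c[i - j]

out = (A * C) @ V                            # URPE-style attention output
print(out.shape)                             # (6, 8)
```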
- A Unifying and Canonical Description of Measure-Preserving Diffusions [60.59592461429012]
A complete recipe of measure-preserving diffusions in Euclidean space was recently derived, unifying several MCMC algorithms into a single framework.
We develop a geometric theory that improves and generalises this construction to any manifold.
arXiv Detail & Related papers (2021-05-06T17:36:55Z)
- RoFormer: Enhanced Transformer with Rotary Position Embedding [9.01819510933327]
We propose a novel method named Rotary Position Embedding (RoPE) to effectively leverage positional information.
RoPE encodes the absolute position with a rotation matrix and meanwhile incorporates the explicit relative position dependency in self-attention formulation.
We evaluate the enhanced transformer with rotary position embedding, also called RoFormer, on various long-text classification benchmark datasets. (A sketch of the rotation as commonly implemented follows this entry.)
arXiv Detail & Related papers (2021-04-20T09:54:06Z)
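A minimal sketch of RoPE as it is commonly implemented (the interleaved cos/sin form; our code, not RoFormer's release): rotate queries and keys per position, then check that the attention score depends only on the relative offset:

```python
# Absolute positions go in, but only relative positions affect the score --
# the property the RoFormer summary describes.
import numpy as np

def apply_rope(x, pos, base=10000.0):
    """x: (d,) vector; rotates consecutive pairs of dims by pos * theta_i."""
    d = x.shape[0]
    theta = base ** (-2 * np.arange(d // 2) / d)
    c, s = np.cos(pos * theta), np.sin(pos * theta)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * c - x2 * s
    out[1::2] = x1 * s + x2 * c
    return out

rng = np.random.default_rng(0)
q, k = rng.standard_normal(16), rng.standard_normal(16)

# Position pairs (3, 10) and (103, 110) share the offset 7, so RoPE gives
# them identical attention scores -- absolute encoding, relative dependency:
s1 = apply_rope(q, 3) @ apply_rope(k, 10)
s2 = apply_rope(q, 103) @ apply_rope(k, 110)
assert np.allclose(s1, s2)
```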
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.