Context-aware Rotary Position Embedding
- URL: http://arxiv.org/abs/2507.23083v1
- Date: Wed, 30 Jul 2025 20:32:19 GMT
- Title: Context-aware Rotary Position Embedding
- Authors: Ali Veisi, Delaram Fartoot, Hamidreza Amirzadeh
- Abstract summary: Rotary Positional Embeddings (RoPE) have become a widely adopted solution due to their compatibility with relative position encoding and computational efficiency. We propose CARoPE (Context-Aware Rotary Positional Embedding), a novel generalization of RoPE that dynamically generates head-specific frequency patterns conditioned on token embeddings. CARoPE consistently outperforms RoPE and other common positional encoding baselines, achieving significantly lower perplexity, even at longer context lengths.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Positional encoding is a vital component of Transformer architectures, enabling models to incorporate sequence order into self-attention mechanisms. Rotary Positional Embeddings (RoPE) have become a widely adopted solution due to their compatibility with relative position encoding and computational efficiency. However, RoPE relies on static, input-independent sinusoidal frequency patterns, limiting its ability to model context-sensitive relationships. In this work, we propose CARoPE (Context-Aware Rotary Positional Embedding), a novel generalization of RoPE that dynamically generates head-specific frequency patterns conditioned on token embeddings. This design introduces token- and context-sensitive positional representations while preserving RoPE efficiency and architectural simplicity. CARoPE computes input-dependent phase shifts using a bounded transformation of token embeddings and integrates them into the rotary mechanism across attention heads. We evaluate CARoPE on the FineWeb-Edu-10B dataset using GPT-2 variants trained on next-token prediction tasks. Experimental results show that CARoPE consistently outperforms RoPE and other common positional encoding baselines, achieving significantly lower perplexity, even at longer context lengths. Additionally, CARoPE enables faster training throughput without sacrificing model stability. These findings demonstrate that CARoPE offers a scalable, expressive, and efficient upgrade to existing positional encoding strategies in Transformer models.
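The contrast the abstract draws between static and input-dependent rotation angles can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: `rope_angles` uses the standard RoPE frequency schedule with base 10000, while `context_phases`, its sigmoid bound, and its running-sum accumulation are hypothetical stand-ins for the "bounded transformation of token embeddings" that the abstract mentions without specifying.

```python
import math

def rope_angles(position, dim, base=10000.0):
    """Static RoPE angles: position * base**(-2i/dim) for each 2-D pair i."""
    return [position * base ** (-2.0 * i / dim) for i in range(dim // 2)]

def rotate(vec, angles):
    """Rotate consecutive pairs (vec[2i], vec[2i+1]) by angles[i]."""
    out = []
    for i, a in enumerate(angles):
        x, y = vec[2 * i], vec[2 * i + 1]
        out += [x * math.cos(a) - y * math.sin(a),
                x * math.sin(a) + y * math.cos(a)]
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def context_phases(token_embeddings, weights):
    """Hypothetical context-aware phases (an assumption, not the paper's formula).

    Each token yields one bounded frequency per 2-D pair via a sigmoid of a
    linear projection; the phase at position m is the running sum of the
    frequencies of tokens 0..m.
    """
    phases, running = [], [0.0] * len(weights)
    for emb in token_embeddings:
        freqs = [1.0 / (1.0 + math.exp(-dot(emb, w))) for w in weights]
        running = [r + f for r, f in zip(running, freqs)]
        phases.append(list(running))
    return phases

q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# Static RoPE: the attention score depends only on the position offset (3 here).
score_a = dot(rotate(q, rope_angles(5, 4)), rotate(k, rope_angles(2, 4)))
score_b = dot(rotate(q, rope_angles(13, 4)), rotate(k, rope_angles(10, 4)))

# Context-aware variant: the phases now depend on the tokens themselves.
embeddings = [[0.1, 0.2], [-0.4, 0.3], [0.5, -0.1]]  # toy token embeddings
weights = [[1.0, -1.0], [0.5, 0.5]]  # one hypothetical projection row per pair
phases = context_phases(embeddings, weights)
rotated_q = rotate(q, phases[2])
```

In the static case the two scores agree because only the offset between query and key positions enters the rotation. In the context-aware sketch the angle at each position is accumulated from the preceding tokens, so two sequences of equal length can receive different phases while each rotation still preserves vector norms.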
Related papers
- Untwisting RoPE: Frequency Control for Shared Attention in DiTs [84.14005261938284]
Positional encodings are essential to transformer-based generative models. We show that Rotary Positional Embeddings (RoPE) naturally decompose into frequency components with distinct positional sensitivities. We introduce a method for selectively modulating RoPE frequency bands so that attention reflects semantic similarity rather than strict positional alignment.
arXiv Detail & Related papers (2026-02-04T20:01:59Z)
- DoPE: Denoising Rotary Position Embedding [60.779039511252584]
Rotary Position Embedding (RoPE) in Transformer models has inherent limits that weaken length extrapolation. We reinterpret the attention map with positional encoding as a noisy feature map, and propose Denoising Positional Encoding (DoPE). DoPE is a training-free method based on truncated matrix entropy to detect outlier frequency bands in the feature map.
arXiv Detail & Related papers (2025-11-12T09:32:35Z)
- Do traveling waves make good positional encodings? [44.55744608160896]
We propose RollPE, a novel positional encoding mechanism based on traveling waves. We show it significantly outperforms traditional absolute positional embeddings. We derive a mathematical equivalence of RollPE to a particular configuration of RoPE.
arXiv Detail & Related papers (2025-11-11T14:32:45Z)
- EntroPE: Entropy-Guided Dynamic Patch Encoder for Time Series Forecasting [50.794700596484894]
We propose EntroPE (Entropy-Guided Dynamic Patch Encoder), a novel, temporally informed framework that dynamically detects transition points via conditional entropy. This preserves temporal structure while retaining the computational benefits of patching. Experiments across long-term forecasting benchmarks demonstrate that EntroPE improves both accuracy and efficiency.
arXiv Detail & Related papers (2025-09-30T12:09:56Z)
- HoPE: Hyperbolic Rotary Positional Encoding for Stable Long-Range Dependency Modeling in Large Language Models [19.3827288035483]
We propose Hyperbolic Rotary Positional Encoding (HoPE), which leverages hyperbolic functions to implement Lorentz rotations on token representations. Tests show HoPE consistently exceeds existing positional encoding methods.
arXiv Detail & Related papers (2025-09-05T16:20:48Z)
- Decoupling the "What" and "Where" With Polar Coordinate Positional Embeddings [29.421443764865003]
We present an analysis indicating that what and where are entangled in the popular RoPE rotary position embedding. We propose an improvement to RoPE, which we call Polar Coordinate Position Embeddings or PoPE, that eliminates the what-where confound.
arXiv Detail & Related papers (2025-09-05T14:22:27Z)
- SeqPE: Transformer with Sequential Position Encoding [76.22159277300891]
SeqPE represents each $n$-dimensional position index as a symbolic sequence and employs a lightweight sequential position encoder to learn their embeddings. Experiments across language modeling, long-context question answering, and 2D image classification demonstrate that SeqPE not only surpasses strong baselines in perplexity, exact match (EM) and accuracy--but also enables seamless generalization to multi-dimensional inputs without requiring manual architectural redesign.
arXiv Detail & Related papers (2025-06-16T09:16:40Z)
- ComRoPE: Scalable and Robust Rotary Position Embedding Parameterized by Trainable Commuting Angle Matrices [25.99231204405503]
We propose ComRoPE, which generalizes Rotary Position Embedding (RoPE) by defining it in terms of trainable commuting angle matrices. We present two types of trainable commuting angle matrices as sufficient solutions to the RoPE equation. Our framework shows versatility in generalizing to existing RoPE formulations and offering new insights for future positional encoding research.
arXiv Detail & Related papers (2025-06-04T09:10:02Z)
- Revisiting LRP: Positional Attribution as the Missing Ingredient for Transformer Explainability [53.21677928601684]
Layer-wise relevance propagation is one of the most promising approaches to explainability in deep learning. We propose specialized theoretically-grounded LRP rules designed to propagate attributions across various positional encoding methods. Our method significantly outperforms the state-of-the-art in both vision and NLP explainability tasks.
arXiv Detail & Related papers (2025-06-02T18:07:55Z)
- PaTH Attention: Position Encoding via Accumulating Householder Transformations [56.32365080761523]
PaTH is a flexible data-dependent position encoding scheme based on accumulated products of Householder transformations. We derive an efficient parallel algorithm for training through exploiting a compact representation of products of Householder matrices.
arXiv Detail & Related papers (2025-05-22T08:36:09Z)
- Of All StrIPEs: Investigating Structure-informed Positional Encoding for Efficient Music Generation [1.3108652488669736]
We present a unified framework based on kernel methods to analyze both families of efficient PEs. We develop a novel PE method called RoPE, capable of extracting causal relationships from temporal sequences. For empirical validation, we use a symbolic music generation task, namely, melody harmonization.
arXiv Detail & Related papers (2025-04-07T11:51:29Z)
- Toward Relative Positional Encoding in Spiking Transformers [52.62008099390541]
Spiking neural networks (SNNs) are bio-inspired networks that mimic how neurons in the brain communicate through discrete spikes. We introduce several strategies to approximate relative positional encoding (RPE) in spiking Transformers.
arXiv Detail & Related papers (2025-01-28T06:42:37Z)
- Real-Time Motion Prediction via Heterogeneous Polyline Transformer with Relative Pose Encoding [121.08841110022607]
Existing agent-centric methods have demonstrated outstanding performance on public benchmarks.
We introduce the K-nearest neighbor attention with relative pose encoding (KNARPE), a novel attention mechanism allowing the pairwise-relative representation to be used by Transformers.
By sharing contexts among agents and reusing the unchanged contexts, our approach is as efficient as scene-centric methods, while performing on par with state-of-the-art agent-centric methods.
arXiv Detail & Related papers (2023-10-19T17:59:01Z)
- Your Transformer May Not be as Powerful as You Expect [88.11364619182773]
We mathematically analyze the power of RPE-based Transformers regarding whether the model is capable of approximating any continuous sequence-to-sequence functions.
We present a negative result by showing there exist continuous sequence-to-sequence functions that RPE-based Transformers cannot approximate no matter how deep and wide the neural network is.
We develop a novel attention module, called Universal RPE-based (URPE) Attention, which satisfies the conditions.
arXiv Detail & Related papers (2022-05-26T14:51:30Z)
- Conformer-based End-to-end Speech Recognition With Rotary Position Embedding [11.428057887454008]
We introduce rotary position embedding (RoPE) in the convolution-augmented transformer (conformer).
RoPE encodes absolute positional information into the input sequence by a rotation matrix, and then naturally incorporates explicit relative position information into a self-attention module.
Our model achieves a relative word error rate reduction of 8.70% and 7.27% over the conformer on test-clean and test-other sets of the LibriSpeech corpus respectively.
arXiv Detail & Related papers (2021-07-13T08:07:22Z)
- RoFormer: Enhanced Transformer with Rotary Position Embedding [9.01819510933327]
We propose a novel method named Rotary Position Embedding (RoPE) to effectively leverage the positional information.
RoPE encodes the absolute position with a rotation matrix and meanwhile incorporates the explicit relative position dependency in self-attention formulation.
We evaluate the enhanced transformer with rotary position embedding, also called RoFormer, on various long text classification benchmark datasets.
arXiv Detail & Related papers (2021-04-20T09:54:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.