Rotary Positional Embeddings as Phase Modulation: Theoretical Bounds on the RoPE Base for Long-Context Transformers
- URL: http://arxiv.org/abs/2602.10959v1
- Date: Wed, 11 Feb 2026 15:50:07 GMT
- Title: Rotary Positional Embeddings as Phase Modulation: Theoretical Bounds on the RoPE Base for Long-Context Transformers
- Authors: Feilong Liu
- Abstract summary: Rotary positional embeddings (RoPE) are widely used in large language models to encode token positions. We derive principled lower bounds on the RoPE base parameter that are necessary to preserve positional coherence over a target context length. We extend this analysis to deep transformers, showing that repeated rotary modulation across layers compounds angular misalignment. Together, the lower and upper bounds define a precision- and depth-dependent feasibility region, a "Goldilocks zone", for long-context transformers.
- Score: 0.5414847001704249
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Rotary positional embeddings (RoPE) are widely used in large language models to encode token positions through multiplicative rotations, yet their behavior at long context lengths remains poorly characterized. In this work, we reinterpret RoPE as phase modulation applied to a bank of complex oscillators, enabling analysis through classical signal processing theory. Under this formulation, we derive principled lower bounds on the RoPE base parameter that are necessary to preserve positional coherence over a target context length. These include a fundamental aliasing bound, analogous to a Nyquist limit, and a DC-component stability bound that constrains phase drift in low-frequency positional modes. We further extend this analysis to deep transformers, showing that repeated rotary modulation across layers compounds angular misalignment, tightening the base requirement as depth increases. Complementing these results, we derive a precision-dependent upper bound on the RoPE base arising from finite floating-point resolution. Beyond this limit, incremental phase updates become numerically indistinguishable, leading to positional erasure even in the absence of aliasing. Together, the lower and upper bounds define a precision- and depth-dependent feasibility region, a "Goldilocks zone", for long-context transformers. We validate the framework through a comprehensive case study of state-of-the-art models, including LLaMA, Mistral, and DeepSeek variants, showing that observed successes, failures, and community retrofits align closely with the predicted bounds. Notably, models that violate the stability bound exhibit attention collapse and long-range degradation, while attempts to scale beyond one million tokens encounter a hard precision wall independent of architecture or training.
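The two-sided feasibility region described above can be illustrated numerically. The sketch below is not the authors' exact derivation: the function names, the criterion that the slowest oscillator's period 2π/θ_min should cover the target context length, and the comparison of the smallest per-step phase increment against machine epsilon are all simplifying assumptions made here for illustration.

```python
import numpy as np

def rope_frequencies(base: float, dim: int) -> np.ndarray:
    """Per-pair rotation frequencies theta_i = base^(-2i/dim), as in standard RoPE."""
    return base ** (-2.0 * np.arange(dim // 2) / dim)

def aliasing_lower_bound_ok(base: float, dim: int, context_len: int) -> bool:
    """Illustrative Nyquist-style check: require the slowest oscillator's
    period 2*pi/theta_min to cover the context, so the low-frequency
    positional phase never wraps within the target window."""
    theta_min = rope_frequencies(base, dim).min()
    return bool(2 * np.pi / theta_min >= context_len)

def precision_upper_bound_ok(base: float, dim: int, dtype=np.float32) -> bool:
    """Illustrative precision-wall check: the smallest per-step phase
    increment must exceed machine epsilon, or incremental updates
    become numerically indistinguishable (positional erasure)."""
    theta_min = rope_frequencies(base, dim).min()
    return bool(theta_min > np.finfo(dtype).eps)

# LLaMA-style head dimension of 128 and the common base of 10,000:
print(aliasing_lower_bound_ok(10_000, 128, 4_096))      # True: within the zone
print(aliasing_lower_bound_ok(10_000, 128, 1_000_000))  # False: aliasing bound violated
print(precision_upper_bound_ok(500_000, 128))           # True: still above float32 eps
```

Under these assumptions, a base of 10,000 with a 128-dimensional head supports a 4K context but not a million-token one, which is consistent with the abstract's observation that community retrofits raise the base for longer contexts.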
Related papers
- From Sparse Sensors to Continuous Fields: STRIDE for Spatiotemporal Reconstruction [3.2580743227673694]
We present STRIDE, a framework that maps high-dimensional spatial fields to a latent state with a temporal decoder. We show that STRIDE supports super-resolution and remains robust to noise.
arXiv Detail & Related papers (2026-02-04T04:39:23Z) - Spiral RoPE: Rotate Your Rotary Positional Embeddings in the 2D Plane [49.14270539697387]
Spiral RoPE is a simple yet effective extension that enables multi-directional positional encoding. Across a wide range of vision tasks including classification, segmentation, and generation, Spiral RoPE consistently improves performance.
arXiv Detail & Related papers (2026-02-03T07:56:58Z) - Spectral Embedding via Chebyshev Bases for Robust DeepONet Approximation [0.6752538702870791]
Spectral-Embedded DeepONet (SEDNet) is a new variant in which the trunk is driven by a fixed Chebyshev spectral dictionary rather than coordinate inputs. SEDNet consistently achieves the lowest relative L2 errors among DeepONet, FEDONet, and SEDONet, with average improvements of about 30-40% over the baseline DeepONet.
arXiv Detail & Related papers (2025-12-09T22:26:29Z) - One Attention, One Scale: Phase-Aligned Rotary Positional Embeddings for Mixed-Resolution Diffusion Transformer [48.30024190686566]
Cross-Resolution Phase-Aligned Attention (CRPA) is a training-free drop-in fix that eliminates this failure at its source. CRPA is fully compatible with pretrained DiTs and stabilizes all heads and layers uniformly. We demonstrate that CRPA enables high-fidelity and efficient mixed-resolution generation, outperforming previous state-of-the-art methods on image and video generation.
arXiv Detail & Related papers (2025-11-24T23:10:15Z) - Morphing Through Time: Diffusion-Based Bridging of Temporal Gaps for Robust Alignment in Change Detection [51.56484100374058]
We introduce a modular pipeline that improves spatial and temporal robustness without altering existing change detection networks. A diffusion module synthesizes intermediate morphing frames that bridge large appearance gaps, enabling RoMa to estimate stepwise correspondences. Experiments on LEVIR-CD, WHU-CD, and DSIFN-CD show consistent gains in both registration accuracy and downstream change detection.
arXiv Detail & Related papers (2025-11-11T08:40:28Z) - Drift No More? Context Equilibria in Multi-Turn LLM Interactions [58.69551510148673]
Context drift is the gradual divergence of a model's outputs from goal-consistent behavior across turns. Unlike single-turn errors, drift unfolds temporally and is poorly captured by static evaluation metrics. We show that multi-turn drift can be understood as a controllable equilibrium phenomenon rather than as inevitable decay.
arXiv Detail & Related papers (2025-10-09T04:48:49Z) - Rethinking RoPE Scaling in Quantized LLM: Theory, Outlier, and Channel-Band Analysis with Weight Rescaling [3.7391437252721698]
We show that combining RoPE Position Interpolation (PI) with PTQ degrades accuracy due to effects including long-context aliasing, dynamic-range dilation, and anisotropy from axis-aligned quantizers versus rotated RoPE pairs. We propose Q-ROAR (Quantization, RoPE-interpolation, and Outlier Aware Rescaling), a weight-only, interpolation-aware stabilization of PI for quantized LLMs.
arXiv Detail & Related papers (2025-09-26T01:23:32Z) - Positional Encoding via Token-Aware Phase Attention [45.855203550592734]
We show that Rotary Positional Embedding (RoPE) introduces an intrinsic distance-dependent bias in attention scores that limits RoPE's ability to model long contexts. This paper introduces Token-Aware Phase Attention (TAPA), a new positional encoding method that incorporates a learnable phase function into the attention mechanism.
arXiv Detail & Related papers (2025-09-16T03:53:32Z) - HoPE: Hyperbolic Rotary Positional Encoding for Stable Long-Range Dependency Modeling in Large Language Models [19.3827288035483]
We propose Hyperbolic Rotary Positional Encoding (HoPE), which leverages hyperbolic functions to implement Lorentz rotations on token representations. Tests show HoPE consistently exceeds existing positional encoding methods.
arXiv Detail & Related papers (2025-09-05T16:20:48Z) - Deep Reinforcement Learning for IRS Phase Shift Design in Spatiotemporally Correlated Environments [93.30657979626858]
We propose a deep actor-critic algorithm that accounts for channel correlations and destination motion.
We show that, when channels are temporally correlated, the inclusion of the SNR in the state representation interacts with function approximation in ways that inhibit convergence.
arXiv Detail & Related papers (2022-11-02T22:07:36Z) - Beyond the Edge of Stability via Two-step Gradient Updates [49.03389279816152]
Gradient Descent (GD) is a powerful workhorse of modern machine learning.
GD's ability to find local minimisers is only guaranteed for losses with Lipschitz gradients.
This work focuses on simple, yet representative, learning problems via analysis of two-step gradient updates.
arXiv Detail & Related papers (2022-06-08T21:32:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.