MrRoPE: Mixed-radix Rotary Position Embedding
- URL: http://arxiv.org/abs/2601.22181v1
- Date: Wed, 28 Jan 2026 05:09:54 GMT
- Title: MrRoPE: Mixed-radix Rotary Position Embedding
- Authors: Qingyuan Tian, Wenhong Zhu, Xiaoran Liu, Xiaofeng Wang, Rui Wang,
- Abstract summary: MrRoPE (Mixed-radix RoPE) is a general encoding formulation based on a radix system conversion perspective. We introduce two training-free extensions, MrRoPE-Uni and MrRoPE-Pro, which leverage uniform and progressive radix conversion strategies. MrRoPE-Pro sustains over 85% recall in the 128K-context Needle-in-a-Haystack test and achieves more than double YaRN's accuracy.
- Score: 15.874568186540076
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Rotary Position Embedding (RoPE)-extension refers to modifying or generalizing the Rotary Position Embedding scheme to handle longer sequences than those encountered during pre-training. However, current extension strategies are highly diverse and lack a unified theoretical foundation. In this paper, we propose MrRoPE (Mixed-radix RoPE), a generalized encoding formulation based on a radix system conversion perspective, which elegantly unifies various RoPE-extension approaches as distinct radix conversion strategies. Based on this theory, we introduce two training-free extensions, MrRoPE-Uni and MrRoPE-Pro, which leverage uniform and progressive radix conversion strategies, respectively, to achieve 'train short, test long' generalization. Without fine-tuning, MrRoPE-Pro sustains over 85% recall in the 128K-context Needle-in-a-Haystack test and achieves more than double YaRN's accuracy on Infinite-Bench retrieval and dialogue subsets. Theoretical analysis confirms that MrRoPE-Pro effectively raises the upper bound of RoPE's attainable encoding length, which further validates the reliability and utility of our theory and methodology.
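The radix-conversion reading of RoPE can be made concrete: each rotary dimension pair rotates at its own frequency, so its period acts like the radix of one "digit" of the position index, and training-free extensions amount to rescaling those radices. The sketch below is an illustration of this perspective only, not the paper's implementation; the `scale` hook is a hypothetical parameter standing in for a radix-conversion strategy.

```python
import numpy as np

def rope_angles(pos, dim, base=10000.0, scale=None):
    """Per-pair rotation angles for standard RoPE.

    Dimension pair i rotates at frequency base**(-2i/dim); its period
    2*pi*base**(2i/dim) can be read as the "radix" of that digit in a
    mixed-radix view of the position index. `scale` (a per-pair array,
    hypothetical here) widens the periods, i.e. enlarges the radices,
    as training-free extensions effectively do.
    """
    freqs = base ** (-np.arange(0, dim, 2) / dim)  # shape (dim/2,)
    if scale is not None:
        freqs = freqs / scale                      # slower rotation -> larger radix
    return pos * freqs

def apply_rope(x, pos, base=10000.0, scale=None):
    """Rotate consecutive feature pairs of x (shape (dim,)) by position-dependent angles."""
    ang = rope_angles(pos, x.shape[-1], base, scale)
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin
    out[1::2] = x1 * sin + x2 * cos
    return out
```

In this picture, a uniform scale (all radices multiplied by the same factor, as in position interpolation) corresponds to the paper's uniform conversion, while rescaling low-frequency pairs more aggressively corresponds to a progressive conversion.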
Related papers
- CoPE: Clipped RoPE as A Scalable Free Lunch for Long Context LLMs [18.897130541385646]
Rotary Positional Embedding (RoPE) is a key component of context scaling in Large Language Models (LLMs). In this work, we unify these seemingly distinct objectives through a minimalist intervention, namely soft clipping the low-frequency components of RoPE. CoPE not only eliminates OOD outliers and refines semantic signals, but also prevents the spectral leakage caused by hard clipping.
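The contrast between hard and soft clipping can be sketched numerically: hard clipping `max(freqs, floor)` kinks the frequency spectrum at the floor, while a smooth blend approaches the same limits without the discontinuous derivative the abstract associates with spectral leakage. The softplus-based form below is an illustrative stand-in, not CoPE's actual formula.

```python
import numpy as np

def soft_clip_freqs(freqs, floor, sharpness=8.0):
    """Smoothly clip RoPE frequencies from below (illustrative, not CoPE's exact form).

    Behaves like max(freqs, floor) in the limits: ~freqs well above the
    floor, ~floor well below it, with a smooth transition in between.
    `sharpness` (hypothetical knob) controls how abrupt the transition is.
    """
    s = sharpness / floor                       # scale sharpness to the floor's magnitude
    # softplus((f - floor) * s) / s + floor, computed stably via logaddexp
    return floor + np.logaddexp(0.0, (freqs - floor) * s) / s
```

High frequencies pass through nearly unchanged, the lowest frequencies are pinned near the floor, and the mapping stays strictly monotonic, so the relative ordering of the rotary spectrum is preserved.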
arXiv Detail & Related papers (2026-02-05T03:31:14Z)
- Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs [72.8830548005884]
Rotary Position Embeddings (RoPE) have become a standard for encoding sequence order in Large Language Models. Standard implementations utilize only the real component of the complex-valued dot product for attention score calculation. We propose an extension that re-incorporates this imaginary component.
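The complex-valued view is easy to state in code: treating each feature pair as a complex number, RoPE's attention contribution is the real part of a rotated complex inner product, and the imaginary part is what standard implementations discard. The sketch below just exposes both parts; how the paper combines them is not reproduced here.

```python
import numpy as np

def complex_rope_score(q, k, m, n, base=10000.0):
    """RoPE attention score in complex form (sketch).

    Pairs (x[2i], x[2i+1]) are viewed as complex numbers rotated by
    e^(i * pos * theta_i). Standard RoPE keeps only the real part of the
    rotated inner product; this returns the imaginary part as well, the
    component the abstract proposes to re-incorporate.
    """
    d = q.shape[-1]
    qc = q[0::2] + 1j * q[1::2]
    kc = k[0::2] + 1j * k[1::2]
    theta = base ** (-np.arange(d // 2) * 2.0 / d)
    z = np.sum(qc * np.conj(kc) * np.exp(1j * (m - n) * theta))
    return z.real, z.imag
```

Because the phase depends only on `m - n`, both components inherit RoPE's relative-position property.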
arXiv Detail & Related papers (2025-12-08T12:59:54Z) - RoSA: Enhancing Parameter-Efficient Fine-Tuning via RoPE-aware Selective Adaptation in Large Language Models [23.726452130486496]
Fine-tuning large language models is essential for task-specific adaptation, yet it remains computationally prohibitive. We propose RoPE-aware Selective Adaptation (RoSA), a novel PEFT framework that allocates trainable parameters in a more targeted and effective manner. RoSA comprises a RoPE-aware Attention Enhancement (RoAE) module and a Dynamic Layer Selection (DLS) strategy that adaptively identifies and updates the most critical layers based on LayerNorm norms.
arXiv Detail & Related papers (2025-11-21T09:55:01Z) - A Circular Argument : Does RoPE need to be Equivariant for Vision? [45.33536249657655]
We mathematically show RoPE to be one of the most general solutions for equivariant positional embedding in one-dimensional data. We propose Spherical RoPE, a method analogous to Mixed RoPE but assuming non-commutative generators.
arXiv Detail & Related papers (2025-11-11T15:47:54Z) - Positional Encoding via Token-Aware Phase Attention [45.855203550592734]
We show that Rotary Positional Embedding (RoPE) introduces an intrinsic distance-dependent bias in attention scores that limits RoPE's ability to model long contexts. This paper introduces Token-Aware Phase Attention (TAPA), a new positional encoding method that incorporates a learnable phase function into the attention mechanism.
arXiv Detail & Related papers (2025-09-16T03:53:32Z) - Context-aware Rotary Position Embedding [0.0]
Rotary Positional Embeddings (RoPE) have become a widely adopted solution due to their compatibility with relative position encoding and computational efficiency. We propose CARoPE (Context-Aware Rotary Positional Embedding), a novel generalization of RoPE that dynamically generates head-specific frequency patterns conditioned on token embeddings. CARoPE consistently outperforms RoPE and other common positional encoding baselines, achieving significantly lower perplexity, even at longer context lengths.
arXiv Detail & Related papers (2025-07-30T20:32:19Z) - ComRoPE: Scalable and Robust Rotary Position Embedding Parameterized by Trainable Commuting Angle Matrices [25.99231204405503]
We propose ComRoPE, which generalizes Rotary Position Embedding (RoPE) by defining it in terms of trainable commuting angle matrices. We present two types of trainable commuting angle matrices as sufficient solutions to the RoPE equation. Our framework shows versatility in generalizing to existing RoPE formulations and offering new insights for future positional encoding research.
arXiv Detail & Related papers (2025-06-04T09:10:02Z) - Rethinking RoPE: A Mathematical Blueprint for N-dimensional Positional Embedding [1.8142288667655782]
We propose a systematic mathematical framework for Rotary Position Embedding (RoPE). We derive the necessary and sufficient conditions for any valid $N$-dimensional RoPE based on two core properties of RoPE: relativity and reversibility. Our framework unifies and explains existing RoPE designs while enabling principled extensions to higher-dimensional modalities and tasks.
arXiv Detail & Related papers (2025-04-07T21:58:22Z) - LongRoPE2: Near-Lossless LLM Context Window Scaling [46.936900701411965]
LongRoPE2 is a novel approach that extends the effective context window of pre-trained large language models (LLMs) to the target length. This is achieved by three contributions: (1) a hypothesis that insufficient training in higher RoPE dimensions contributes to the persistent out-of-distribution issues observed in existing methods; (2) an effective RoPE rescaling algorithm that adopts evolutionary search guided by "needle-driven" perplexity to address the insufficient training problem; and (3) a mixed context window training approach that fine-tunes model weights to adopt rescaled RoPE for long-context sequences.
arXiv Detail & Related papers (2025-02-27T13:41:07Z) - VideoRoPE: What Makes for Good Video Rotary Position Embedding? [109.88966080843608]
VideoRoPE consistently surpasses previous RoPE variants across diverse downstream tasks such as long video retrieval, video understanding, and video hallucination. VideoRoPE features low-frequency temporal allocation to mitigate periodic oscillations, a diagonal layout to maintain spatial symmetry, and adjustable temporal spacing to decouple temporal and spatial indexing.
arXiv Detail & Related papers (2025-02-07T18:56:04Z)
- Scaling Laws of RoPE-based Extrapolation [103.33995311915864]
We propose Scaling Laws of RoPE-based Extrapolation to describe the relationship between extrapolation performance and the base value.
We achieve extrapolation up to a 1 million context length with only 16K training length on LLaMA2 7B and 13B.
arXiv Detail & Related papers (2023-10-08T15:50:36Z)
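The role of the base value in extrapolation has a simple geometric core: the slowest-rotating RoPE pair has period `2*pi*base**((dim-2)/dim)`, so raising the base stretches that longest wavelength and lets more positions fit inside a single period. The helper below illustrates that relationship only; it is not the fitted scaling law from the paper.

```python
import numpy as np

def max_wavelength(dim, base):
    """Longest RoPE period across dimension pairs.

    The highest pair index i = dim/2 - 1 has frequency base**(-(dim-2)/dim),
    hence period 2*pi*base**((dim-2)/dim). A larger base stretches this
    wavelength, which the scaling-laws view links to a longer attainable
    extrapolation range (illustrative relationship only).
    """
    return 2 * np.pi * base ** ((dim - 2) / dim)
```

For example, with `dim=128`, moving the base from 10,000 to 1,000,000 stretches the longest wavelength by nearly two orders of magnitude.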
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.