MrRoPE: Mixed-radix Rotary Position Embedding
- URL: http://arxiv.org/abs/2601.22181v1
- Date: Wed, 28 Jan 2026 05:09:54 GMT
- Title: MrRoPE: Mixed-radix Rotary Position Embedding
- Authors: Qingyuan Tian, Wenhong Zhu, Xiaoran Liu, Xiaofeng Wang, Rui Wang,
- Abstract summary: MrRoPE (Mixed-radix RoPE) is a general encoding formulation based on a radix system conversion perspective. We introduce two training-free extensions, MrRoPE-Uni and MrRoPE-Pro, which leverage uniform and progressive radix conversion strategies. MrRoPE-Pro sustains over 85% recall in the 128K-context Needle-in-a-Haystack test and achieves more than double YaRN's accuracy.
- Score: 15.874568186540076
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Rotary Position Embedding (RoPE)-extension refers to modifying or generalizing the Rotary Position Embedding scheme to handle longer sequences than those encountered during pre-training. However, current extension strategies are highly diverse and lack a unified theoretical foundation. In this paper, we propose MrRoPE (Mixed-radix RoPE), a generalized encoding formulation based on a radix system conversion perspective, which elegantly unifies various RoPE-extension approaches as distinct radix conversion strategies. Based on this theory, we introduce two training-free extensions, MrRoPE-Uni and MrRoPE-Pro, which leverage uniform and progressive radix conversion strategies, respectively, to achieve 'train short, test long' generalization. Without fine-tuning, MrRoPE-Pro sustains over 85% recall in the 128K-context Needle-in-a-Haystack test and achieves more than double YaRN's accuracy on Infinite-Bench retrieval and dialogue subsets. Theoretical analysis confirms that MrRoPE-Pro effectively raises the upper bound of RoPE's attainable encoding length, which further validates the reliability and utility of our theory and methodology.
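The radix-conversion reading of RoPE can be made concrete: each rotary dimension pair rotates at its own frequency, so its period acts like the radix of one "digit" of the position index, and training-free extensions amount to rescaling those radices. The sketch below is an illustration of this perspective only, not the paper's implementation; the `scale` hook is a hypothetical parameter standing in for a radix-conversion strategy.

```python
import numpy as np

def rope_angles(pos, dim, base=10000.0, scale=None):
    """Per-pair rotation angles for standard RoPE.

    Dimension pair i rotates at frequency base**(-2i/dim); its period
    2*pi*base**(2i/dim) can be read as the "radix" of that digit in a
    mixed-radix view of the position index. `scale` (a per-pair array,
    hypothetical here) widens the periods, i.e. enlarges the radices,
    as training-free extensions effectively do.
    """
    freqs = base ** (-np.arange(0, dim, 2) / dim)  # shape (dim/2,)
    if scale is not None:
        freqs = freqs / scale                      # slower rotation -> larger radix
    return pos * freqs

def apply_rope(x, pos, base=10000.0, scale=None):
    """Rotate consecutive feature pairs of x (shape (dim,)) by position-dependent angles."""
    ang = rope_angles(pos, x.shape[-1], base, scale)
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin
    out[1::2] = x1 * sin + x2 * cos
    return out
```

In this picture, a uniform scale (all radices multiplied by the same factor, as in position interpolation) corresponds to the paper's uniform conversion, while rescaling low-frequency pairs more aggressively corresponds to a progressive conversion.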
Related papers
- CoPE: Clipped RoPE as A Scalable Free Lunch for Long Context LLMs [18.897130541385646]
Rotary Positional Embedding (RoPE) is a key component of context scaling in Large Language Models (LLMs). In this work, we unify these seemingly distinct objectives through a minimalist intervention, namely soft clipping the low-frequency components of RoPE. CoPE not only eliminates OOD outliers and refines semantic signals, but also prevents the spectral leakage caused by hard clipping.
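The contrast between hard and soft clipping can be sketched numerically: hard clipping `max(freqs, floor)` kinks the frequency spectrum at the floor, while a smooth blend approaches the same limits without the discontinuous derivative the abstract associates with spectral leakage. The softplus-based form below is an illustrative stand-in, not CoPE's actual formula.

```python
import numpy as np

def soft_clip_freqs(freqs, floor, sharpness=8.0):
    """Smoothly clip RoPE frequencies from below (illustrative, not CoPE's exact form).

    Behaves like max(freqs, floor) in the limits: ~freqs well above the
    floor, ~floor well below it, with a smooth transition in between.
    `sharpness` (hypothetical knob) controls how abrupt the transition is.
    """
    s = sharpness / floor                       # scale sharpness to the floor's magnitude
    # softplus((f - floor) * s) / s + floor, computed stably via logaddexp
    return floor + np.logaddexp(0.0, (freqs - floor) * s) / s
```

High frequencies pass through nearly unchanged, the lowest frequencies are pinned near the floor, and the mapping stays strictly monotonic, so the relative ordering of the rotary spectrum is preserved.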
arXiv Detail & Related papers (2026-02-05T03:31:14Z)
- Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs [72.8830548005884]
Rotary Position Embeddings (RoPE) have become a standard for encoding sequence order in Large Language Models. Standard implementations utilize only the real component of the complex-valued dot product for attention score calculation. We propose an extension that re-incorporates this imaginary component.
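The complex-valued view is easy to state in code: treating each feature pair as a complex number, RoPE's attention contribution is the real part of a rotated complex inner product, and the imaginary part is what standard implementations discard. The sketch below just exposes both parts; how the paper combines them is not reproduced here.

```python
import numpy as np

def complex_rope_score(q, k, m, n, base=10000.0):
    """RoPE attention score in complex form (sketch).

    Pairs (x[2i], x[2i+1]) are viewed as complex numbers rotated by
    e^(i * pos * theta_i). Standard RoPE keeps only the real part of the
    rotated inner product; this returns the imaginary part as well, the
    component the abstract proposes to re-incorporate.
    """
    d = q.shape[-1]
    qc = q[0::2] + 1j * q[1::2]
    kc = k[0::2] + 1j * k[1::2]
    theta = base ** (-np.arange(d // 2) * 2.0 / d)
    z = np.sum(qc * np.conj(kc) * np.exp(1j * (m - n) * theta))
    return z.real, z.imag
```

Because the phase depends only on `m - n`, both components inherit RoPE's relative-position property.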
arXiv Detail & Related papers (2025-12-08T12:59:54Z) - RoSA: Enhancing Parameter-Efficient Fine-Tuning via RoPE-aware Selective Adaptation in Large Language Models [23.726452130486496]
Fine-tuning large language models is essential for task-specific adaptation, yet it remains computationally prohibitive. We propose RoPE-aware Selective Adaptation (RoSA), a novel PEFT framework that allocates trainable parameters in a more targeted and effective manner. RoSA comprises a RoPE-aware Attention Enhancement (RoAE) module and a Dynamic Layer Selection (DLS) strategy that adaptively identifies and updates the most critical layers based on LayerNorm norms.
arXiv Detail & Related papers (2025-11-21T09:55:01Z) - A Circular Argument : Does RoPE need to be Equivariant for Vision? [45.33536249657655]
We mathematically show RoPE to be one of the most general solutions for equivariant positional embedding in one-dimensional data. We propose Spherical RoPE, a method analogous to Mixed RoPE but assuming non-commutative generators.
arXiv Detail & Related papers (2025-11-11T15:47:54Z) - Positional Encoding via Token-Aware Phase Attention [45.855203550592734]
We show that Rotary Positional Embedding (RoPE) introduces an intrinsic distance-dependent bias in attention scores that limits RoPE's ability to model long contexts. This paper introduces Token-Aware Phase Attention (TAPA), a new positional encoding method that incorporates a learnable phase function into the attention mechanism.
arXiv Detail & Related papers (2025-09-16T03:53:32Z) - Context-aware Rotary Position Embedding [0.0]
Rotary Positional Embeddings (RoPE) have become a widely adopted solution due to their compatibility with relative position encoding and computational efficiency. We propose CARoPE (Context-Aware Rotary Positional Embedding), a novel generalization of RoPE that dynamically generates head-specific frequency patterns conditioned on token embeddings. CARoPE consistently outperforms RoPE and other common positional encoding baselines, achieving significantly lower perplexity, even at longer context lengths.
arXiv Detail & Related papers (2025-07-30T20:32:19Z) - ComRoPE: Scalable and Robust Rotary Position Embedding Parameterized by Trainable Commuting Angle Matrices [25.99231204405503]
We propose ComRoPE, which generalizes Rotary Position Embedding (RoPE) by defining it in terms of trainable commuting angle matrices. We present two types of trainable commuting angle matrices as sufficient solutions to the RoPE equation. Our framework shows versatility in generalizing to existing RoPE formulations and offering new insights for future positional encoding research.
arXiv Detail & Related papers (2025-06-04T09:10:02Z) - Rethinking RoPE: A Mathematical Blueprint for N-dimensional Positional Embedding [1.8142288667655782]
We propose a systematic mathematical framework for Rotary Position Embedding (RoPE). We derive the necessary and sufficient conditions for any valid $N$-dimensional RoPE based on two core properties of RoPE: relativity and reversibility. Our framework unifies and explains existing RoPE designs while enabling principled extensions to higher-dimensional modalities and tasks.
arXiv Detail & Related papers (2025-04-07T21:58:22Z) - LongRoPE2: Near-Lossless LLM Context Window Scaling [46.936900701411965]
LongRoPE2 is a novel approach that extends the effective context window of pre-trained large language models (LLMs) to the target length. This is achieved by three contributions: (1) a hypothesis that insufficient training in higher RoPE dimensions contributes to the persistent out-of-distribution issues observed in existing methods; (2) an effective RoPE rescaling algorithm that adopts evolutionary search guided by "needle-driven" perplexity to address the insufficient training problem; and (3) a mixed context window training approach that fine-tunes model weights to adopt rescaled RoPE for long-context sequences.
arXiv Detail & Related papers (2025-02-27T13:41:07Z) - VideoRoPE: What Makes for Good Video Rotary Position Embedding? [109.88966080843608]
VideoRoPE consistently surpasses previous RoPE variants across diverse downstream tasks such as long video retrieval, video understanding, and video hallucination. VideoRoPE features low-frequency temporal allocation to mitigate periodic oscillations, a diagonal layout to maintain spatial symmetry, and adjustable temporal spacing to decouple temporal and spatial indexing.
arXiv Detail & Related papers (2025-02-07T18:56:04Z)
- Scaling Laws of RoPE-based Extrapolation [103.33995311915864]
We propose Scaling Laws of RoPE-based Extrapolation to describe the relationship between extrapolation performance and the base value.
We achieve extrapolation up to a 1 million context length with only 16K training length on LLaMA2 7B and 13B.
arXiv Detail & Related papers (2023-10-08T15:50:36Z)
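The role of the base value in extrapolation has a simple geometric core: the slowest-rotating RoPE pair has period `2*pi*base**((dim-2)/dim)`, so raising the base stretches that longest wavelength and lets more positions fit inside a single period. The helper below illustrates that relationship only; it is not the fitted scaling law from the paper.

```python
import numpy as np

def max_wavelength(dim, base):
    """Longest RoPE period across dimension pairs.

    The highest pair index i = dim/2 - 1 has frequency base**(-(dim-2)/dim),
    hence period 2*pi*base**((dim-2)/dim). A larger base stretches this
    wavelength, which the scaling-laws view links to a longer attainable
    extrapolation range (illustrative relationship only).
    """
    return 2 * np.pi * base ** ((dim - 2) / dim)
```

For example, with `dim=128`, moving the base from 10,000 to 1,000,000 stretches the longest wavelength by nearly two orders of magnitude.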
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.