Q-ROAR: Outlier-Aware Rescaling for RoPE Position Interpolation in Quantized Long-Context LLMs
- URL: http://arxiv.org/abs/2509.14391v1
- Date: Wed, 17 Sep 2025 19:50:16 GMT
- Title: Q-ROAR: Outlier-Aware Rescaling for RoPE Position Interpolation in Quantized Long-Context LLMs
- Authors: Ye Qiao, Sitao Huang
- Abstract summary: We show that combining PI with PTQ degrades accuracy due to coupled effects: long-context aliasing, dynamic range dilation, axis grid anisotropy, and outlier shifting, which together induce position-dependent logit noise. We propose Q-ROAR, a RoPE-aware, weight-only stabilization that groups RoPE dimensions into a few frequency bands and performs a small search over per-band scales for W_Q and W_K, with an optional symmetric variant to preserve logit scale.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Extending LLM context windows is crucial for long-range tasks. RoPE-based position interpolation (PI) methods like linear and frequency-aware scaling extend input lengths without retraining, while post-training quantization (PTQ) enables practical deployment. We show that combining PI with PTQ degrades accuracy due to coupled effects: long-context aliasing, dynamic range dilation, axis grid anisotropy, and outlier shifting, which induce position-dependent logit noise. We provide the first systematic analysis of PI plus PTQ and introduce two diagnostics: Interpolation Pressure (per-band phase scaling sensitivity) and Tail Inflation Ratios (outlier shift from short to long contexts). To address this, we propose Q-ROAR, a RoPE-aware, weight-only stabilization that groups RoPE dimensions into a few frequency bands and performs a small search over per-band scales for W_Q and W_K, with an optional symmetric variant to preserve logit scale. The diagnostics-guided search uses a tiny long-context dev set and requires no fine-tuning, kernel changes, or architecture changes. Empirically, Q-ROAR recovers up to 0.7% accuracy on standard tasks and reduces GovReport perplexity by more than 10%, while preserving short-context performance and compatibility with existing inference stacks.
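The band-wise rescaling idea can be illustrated with a minimal sketch. Everything below (the band assignment, the candidate scale grid, and names such as `rope_freqs` and `rescale_qk`) is our own illustration, not the authors' released code; the paper's actual search is guided by the Interpolation Pressure and Tail Inflation Ratio diagnostics on a long-context dev set.

```python
import numpy as np

def rope_freqs(head_dim, base=10000.0):
    """Per-pair RoPE rotation frequencies: theta_i = base^(-2i/head_dim)."""
    i = np.arange(head_dim // 2)
    return base ** (-2.0 * i / head_dim)

def band_ids(freqs, num_bands=4):
    """Assign each RoPE dimension pair to one of a few log-spaced frequency bands."""
    edges = np.geomspace(freqs.min(), freqs.max(), num_bands + 1)
    return np.clip(np.digitize(freqs, edges) - 1, 0, num_bands - 1)

def rescale_qk(W_q, W_k, band_scales, bands, symmetric=True):
    """Apply one scale per band to the RoPE rows of W_Q.
    The symmetric variant divides W_K by the same scale, so q.k logits
    keep their overall magnitude while per-band dynamic ranges shrink."""
    # Assumes interleaved (even, odd) RoPE pairing; split-half layouts
    # would need a different expansion of the per-pair scales.
    s = np.repeat(band_scales[bands], 2)        # pair scale -> full head_dim
    W_q_new = W_q * s[:, None]                  # rows indexed by head_dim
    W_k_new = W_k / s[:, None] if symmetric else W_k.copy()
    return W_q_new, W_k_new

# Toy grid search over per-band scales against a stubbed dev-set objective.
rng = np.random.default_rng(0)
head_dim, hidden = 64, 256
W_q, W_k = rng.standard_normal((2, head_dim, hidden))
bands = band_ids(rope_freqs(head_dim))

def dev_loss(W_q, W_k):                         # stand-in for dev-set perplexity
    return np.abs(W_q).max() + np.abs(W_k).max()

candidates = [0.8, 1.0, 1.25]
best = min(
    ((a, b) for a in candidates for b in candidates),
    key=lambda ab: dev_loss(*rescale_qk(
        W_q, W_k, np.array([ab[0], ab[0], ab[1], ab[1]]), bands)),
)
print("chosen low/high-band scales:", best)
```

In the symmetric variant, scaling W_Q by s and W_K by 1/s leaves every logit q.k unchanged in exact arithmetic; any benefit comes entirely from how the rescaled weights sit on the quantizer's per-tensor or per-channel grids.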
Related papers
- PAS: A Training-Free Stabilizer for Temporal Encoding in Video LLMs [57.790910044227935]
Video LLMs suffer from temporal inconsistency: small shifts in frame timing can flip attention and suppress relevant frames. We present Phase Aggregated Smoothing (PAS), a training-free mechanism that applies small opposed phase offsets across heads and then aggregates their outputs. Our analysis shows that the RoPE-rotated logit can be approximated as a content dot product scaled by a time kernel; smoothing this kernel yields Lipschitz stability of attention to small temporal shifts, and multi-phase averaging attenuates high-frequency ripples while preserving per-head spectra under Nyquist-valid sampling.
arXiv Detail & Related papers (2025-11-14T05:56:47Z)
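A minimal numeric illustration of the time-kernel view in the PAS entry above; the offset value, the two-copy averaging, and the function name are our assumptions, not the paper's implementation:

```python
import numpy as np

def rope_time_kernel(dt, freqs, t_offset=0.0):
    """Time kernel of a RoPE-rotated logit for unit content vectors:
    k(dt) = mean_i cos(theta_i * (dt + t_offset))."""
    return np.cos(np.outer(dt + t_offset, freqs)).mean(axis=1)

freqs = 10000.0 ** (-2.0 * np.arange(32) / 64)   # standard RoPE frequencies
dt = np.linspace(0.0, 256.0, 513)                # relative time offsets

raw = rope_time_kernel(dt, freqs)
# Opposed offsets +delta / -delta across two head copies, then averaged:
# cos(theta*(dt+d)) + cos(theta*(dt-d)) = 2*cos(theta*dt)*cos(theta*d),
# so each frequency is damped by cos(theta*delta): high-frequency ripples
# (large theta) are attenuated hardest, smoothing the kernel.
delta = 2.0
smoothed = 0.5 * (rope_time_kernel(dt, freqs, +delta) +
                  rope_time_kernel(dt, freqs, -delta))

print("max step-to-step change, raw:     ", np.abs(np.diff(raw)).max())
print("max step-to-step change, smoothed:", np.abs(np.diff(smoothed)).max())
```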
- DoPE: Denoising Rotary Position Embedding [60.779039511252584]
Rotary Position Embedding (RoPE) in Transformer models has inherent limits that weaken length extrapolation. We reinterpret the attention map with positional encoding as a noisy feature map, and propose Denoising Positional Embedding (DoPE), a training-free method based on truncated matrix entropy that detects outlier frequency bands in the feature map.
arXiv Detail & Related papers (2025-11-12T09:32:35Z)
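A sketch of the truncated matrix entropy signal the DoPE entry refers to, under our own reading: the entropy of the top-k singular-value spectrum of an attention map, where a collapsed, near-rank-1 map scores low. The truncation level and the toy maps are illustrative assumptions.

```python
import numpy as np

def truncated_matrix_entropy(A, k=16, eps=1e-12):
    """Shannon entropy of the normalized top-k singular values of A."""
    s = np.linalg.svd(A, compute_uv=False)[:k]
    p = s / (s.sum() + eps)
    return float(-(p * np.log(p + eps)).sum())

rng = np.random.default_rng(0)
healthy = rng.random((64, 64))                        # diffuse attention map
collapsed = np.outer(rng.random(64), rng.random(64))  # near rank-1 (sink-like)

# A band whose attention map collapses has low spectral entropy; a
# DoPE-style detector would flag such bands as noisy outliers.
print("healthy:  ", truncated_matrix_entropy(healthy))
print("collapsed:", truncated_matrix_entropy(collapsed))
```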
- Rethinking RoPE Scaling in Quantized LLM: Theory, Outlier, and Channel-Band Analysis with Weight Rescaling [3.7391437252721698]
We show that combining RoPE position interpolation (PI) with PTQ degrades accuracy due to effects including long-context aliasing, dynamic-range dilation, and anisotropy from axis-aligned quantizers vs. rotated RoPE pairs. We propose Q-ROAR (Quantization, RoPE-interpolation, and Outlier Aware Rescaling), a weight-only, interpolation-aware stabilization of PI for quantized LLMs.
arXiv Detail & Related papers (2025-09-26T01:23:32Z)
- Positional Encoding via Token-Aware Phase Attention [62.1265709014944]
We show that Rotary Positional Embedding (RoPE) introduces an intrinsic distance-dependent bias in attention scores that limits RoPE's ability to model long contexts. This paper introduces Token-Aware Phase Attention (TAPA), a new positional encoding method that incorporates a learnable phase function into the attention mechanism.
arXiv Detail & Related papers (2025-09-16T03:53:32Z)
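The phase hook in the TAPA entry can be shown in a few lines. This sketch only demonstrates where a learnable phase enters a RoPE-style rotation; TAPA's actual phase function is token-aware, i.e. conditioned on content, which we do not model. `LearnablePhaseRoPE` and its parameterization are our invention for illustration.

```python
import torch
import torch.nn as nn

class LearnablePhaseRoPE(nn.Module):
    """Illustrative RoPE variant with a learnable per-frequency phase:
    each (even, odd) pair is rotated by angle = position * freq + phase."""
    def __init__(self, head_dim, base=10000.0):
        super().__init__()
        freqs = base ** (-2.0 * torch.arange(head_dim // 2) / head_dim)
        self.register_buffer("freqs", freqs)
        self.phase = nn.Parameter(torch.zeros(head_dim // 2))  # free parameter

    def forward(self, x, positions):
        # x: (..., seq, head_dim); positions: (seq,) float tensor
        ang = positions[:, None] * self.freqs + self.phase     # (seq, d/2)
        cos, sin = ang.cos(), ang.sin()
        x1, x2 = x[..., 0::2], x[..., 1::2]
        out = torch.empty_like(x)
        out[..., 0::2] = x1 * cos - x2 * sin
        out[..., 1::2] = x1 * sin + x2 * cos
        return out

rope = LearnablePhaseRoPE(head_dim=64)
x = torch.randn(2, 8, 128, 64)          # (batch, heads, seq, head_dim)
q = rope(x, torch.arange(128, dtype=torch.float32))
print(q.shape)                          # torch.Size([2, 8, 128, 64])
```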
- HoPE: Hyperbolic Rotary Positional Encoding for Stable Long-Range Dependency Modeling in Large Language Models [19.3827288035483]
We propose Hyperbolic Rotary Positional Encoding (HoPE), which leverages hyperbolic functions to implement Lorentz rotations on token representations. Tests show HoPE consistently exceeds existing positional encoding methods.
arXiv Detail & Related papers (2025-09-05T16:20:48Z)
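For intuition on the entry above: a Lorentz "rotation" (boost) acts on each coordinate pair with cosh/sinh instead of cos/sin, preserving x1^2 - x2^2 rather than x1^2 + x2^2. The sketch below is our own toy analogue of rotary encoding with boosts; the frequency choice and any normalization the paper uses are not reproduced here.

```python
import numpy as np

def lorentz_rope(x, positions, base=10000.0):
    """Toy hyperbolic analogue of RoPE: each (even, odd) coordinate pair is
    transformed by a Lorentz boost with rapidity = position * freq."""
    d = x.shape[-1]
    freqs = base ** (-2.0 * np.arange(d // 2) / d)
    phi = positions[:, None] * freqs            # (seq, d/2) rapidities
    ch, sh = np.cosh(phi), np.sinh(phi)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = ch * x1 + sh * x2
    out[..., 1::2] = sh * x1 + ch * x2
    return out

x = np.random.default_rng(0).standard_normal((16, 64))  # (seq, head_dim)
# Small rapidities: cosh/sinh grow exponentially with position * freq,
# so the scale of positions matters in practice.
y = lorentz_rope(x, 0.1 * np.arange(16, dtype=np.float64))
# The Minkowski form of each pair is invariant under a boost:
print(np.allclose(x[..., 0::2]**2 - x[..., 1::2]**2,
                  y[..., 0::2]**2 - y[..., 1::2]**2))
```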
- Decoupling the "What" and "Where" With Polar Coordinate Positional Embeddings [29.421443764865003]
We present an analysis indicating that "what" and "where" are entangled in the popular RoPE rotary position embedding. We propose an improvement to RoPE, which we call Polar Coordinate Position Embeddings or PoPE, that eliminates the what-where confound.
arXiv Detail & Related papers (2025-09-05T14:22:27Z)
- LaMPE: Length-aware Multi-grained Positional Encoding for Adaptive Long-context Scaling Without Training [45.74983991122073]
Large language models (LLMs) experience significant performance degradation when the input exceeds the pretraining context window. Recent studies mitigate this problem by remapping out-of-distribution (OOD) positions into the in-distribution range with fixed mapping strategies. We propose Length-aware Multi-grained Positional Encoding (LaMPE), a training-free method that fully utilizes the model's effective context window.
arXiv Detail & Related papers (2025-08-04T11:22:13Z)
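To make the remapping idea in the LaMPE entry concrete, here is a toy, training-free position remap in the same spirit: recent tokens keep unit resolution while far, out-of-window positions are compressed to fit the pretraining range. The split point and linear compression are our simplifications; LaMPE's actual multi-grained, length-aware mapping differs.

```python
import numpy as np

def remap_positions(pos, train_len, keep=1024):
    """Toy OOD-position remap: the last `keep` positions keep unit spacing,
    earlier positions are linearly compressed into the remaining slots,
    so every remapped position lies inside [0, train_len)."""
    pos = np.asarray(pos, dtype=np.float64)
    total = pos.max() + 1
    if total <= train_len:
        return pos                                  # already in-distribution
    near = pos > total - keep
    out = np.empty_like(pos)
    out[~near] = pos[~near] * (train_len - keep) / (total - keep)
    out[near] = pos[near] - (total - train_len)
    return out

p = np.arange(16384)                                # input longer than window
q = remap_positions(p, train_len=4096)
print(q.min(), q.max(), np.all(np.diff(q) > 0))     # monotone, within window
```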
- SeqPE: Transformer with Sequential Position Encoding [76.22159277300891]
SeqPE represents each $n$-dimensional position index as a symbolic sequence and employs a lightweight sequential position encoder to learn their embeddings. Experiments across language modeling, long-context question answering, and 2D image classification demonstrate that SeqPE not only surpasses strong baselines in perplexity, exact match (EM), and accuracy, but also enables seamless generalization to multi-dimensional inputs without requiring manual architectural redesign.
arXiv Detail & Related papers (2025-06-16T09:16:40Z)
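The core idea in the SeqPE entry can be sketched quickly: spell a position index out as a digit sequence and run a small learned encoder over it, so unseen or multi-dimensional indices still decompose into familiar symbols. The digit vocabulary, GRU encoder, and sizes below are our stand-ins for the paper's "lightweight sequential position encoder".

```python
import torch
import torch.nn as nn

class SeqPosEncoder(nn.Module):
    """Illustrative sequential position encoder: a position index is spelled
    as a fixed-length digit string and summarized by a small GRU."""
    def __init__(self, d_model=64, num_digits=8, base=10):
        super().__init__()
        self.num_digits, self.base = num_digits, base
        self.digit_emb = nn.Embedding(base, d_model)
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)

    def forward(self, positions):
        # positions: (N,) integer tensor -> digit ids, most significant first
        digits = torch.stack(
            [(positions // self.base**k) % self.base
             for k in reversed(range(self.num_digits))], dim=1)
        _, h = self.rnn(self.digit_emb(digits))     # h: (1, N, d_model)
        return h.squeeze(0)                         # one embedding per position

enc = SeqPosEncoder()
emb = enc(torch.arange(0, 100000, 9999))            # unseen magnitudes still encode
print(emb.shape)                                    # torch.Size([11, 64])
```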
- HoPE: A Novel Positional Encoding Without Long-Term Decay for Enhanced Context Awareness and Extrapolation [19.42279057349193]
Positional encodings (PEs) are designed to exhibit long-term decay, based on an entrenched and long-standing inductive opinion. We argue that long-term decay is outdated in the era of LLMs, as LLMs are now applied to tasks demanding precise retrieval of in-context information.
arXiv Detail & Related papers (2024-10-28T17:01:52Z)
- Scaling Laws of RoPE-based Extrapolation [103.33995311915864]
We propose the Scaling Laws of RoPE-based Extrapolation to describe the relationship between extrapolation performance and the base value.
We achieve extrapolation up to a 1 million token context length with only a 16K training length on LLaMA2 7B and 13B.
arXiv Detail & Related papers (2023-10-08T15:50:36Z)
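The base value's role in the entry above is easy to see numerically: it sets the per-dimension rotation periods of RoPE, and enlarging it slows the low-frequency dimensions that a long context must cover. A small check, with the base values below chosen only as examples:

```python
import numpy as np

def rope_periods(head_dim, base):
    """Per-pair rotation periods in tokens: 2*pi / theta_i,
    with theta_i = base^(-2i/head_dim)."""
    theta = base ** (-2.0 * np.arange(head_dim // 2) / head_dim)
    return 2.0 * np.pi / theta

# Larger base -> longer periods in the slow dimensions, which changes how
# far beyond the training length the positional signal stays unambiguous.
for base in (1e4, 5e5, 1e6):
    p = rope_periods(128, base)
    print(f"base={base:8.0f}: slowest period ~ {p.max():,.0f} tokens")
```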
- Deep Reinforcement Learning for IRS Phase Shift Design in Spatiotemporally Correlated Environments [93.30657979626858]
We propose a deep actor-critic algorithm that accounts for channel correlations and destination motion.
We show that, when channels are temporally correlated, the inclusion of the SNR in the state representation interacts with function approximation in ways that inhibit convergence.
arXiv Detail & Related papers (2022-11-02T22:07:36Z)