Fugu-MT Paper Translation (Abstract): Polynomial Context-Truncation Sensitivity in Autoregressive Language Models: Sequential Wyner-Ziv Bounds for KV Cache Compression

Paper summary: Polynomial Context-Truncation Sensitivity in Autoregressive Language Models: Sequential Wyner-Ziv Bounds for KV Cache Compression

arxiv url: http://arxiv.org/abs/2605.25085v1
Date: Sun, 24 May 2026 13:54:13 GMT
Status: Translation complete
Last updated in system: 2026-05-26 19:50:18.749134
Title: Polynomial Context-Truncation Sensitivity in Autoregressive Language Models: Sequential Wyner-Ziv Bounds for KV Cache Compression
Title (reference translation): Polynomial Context-Truncation Sensitivity in Autoregressive Language Models: Sequential Wyner-Ziv Bounds for KV Cache Compression
Authors: Munsik Kim
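Abstract summary: We study the rate-distortion limits of online KV cache compression in autoregressive language models. We find that the next-token distribution's sensitivity to context truncation decays polynomially rather than geometrically.
Reference score (independently computed attention score): 0.0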
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study the rate-distortion limits of online KV cache compression in autoregressive language models, formulating it as sequential Wyner-Ziv source coding on the filtration induced by the model, with the next-step query as decoder side information. Empirically, across four models spanning two families and $0.5$-$3$B parameters, we find that the next-token distribution's sensitivity to context truncation decays \emph{polynomially} rather than \emph{geometrically}: a power law improves on an exponential fit by an order of magnitude in extrapolation, the fitted exponent is recovered independently from a sink-plus-recent KL measurement, and the decay is verified to be free of positional-encoding artifacts by a position-preserving ablation. Under a corresponding \emph{polynomial truncation-sensitivity} assumption, our main result characterizes the per-token memory requirement of \emph{suffix-only} cache policies: a sliding-window scheme attains distortion $\varepsilon$ with window $w = O(\varepsilon^{-1/\alpha})$, and -- under an additional two-sided Bayes-risk condition -- a converse shows $w = \Omega(\varepsilon^{-1/\alpha})$ is necessary within this policy class, so the scaling is $\Theta(\varepsilon^{-1/\alpha})$ for suffix-only policies. Whether recurrent or propagating cache summaries can beat this scaling is left open. An explicit block-Markov scheme achieves the upper bound; its rate-of-convergence exponent matches the converse under additional forward-decay and regularity hypotheses (not implied by truncation sensitivity alone), and differs by a factor of two otherwise. Empirically, the polynomial law predicts the degradation curves of concrete cache policies: recency-based eviction (sliding, sink-plus-recent) suppresses distortion by roughly two orders of magnitude over random retention at equal budget, with a power-law decay in the budget.
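Abstract (reference translation): We study the rate-distortion limits of online KV cache compression in autoregressive language models, formulating it as sequential Wyner-Ziv source coding on the filtration induced by the model, with the next-step query as decoder side information. Empirically, across four models spanning two families and $0.5$-$3$B parameters, the next-token distribution's sensitivity to context truncation decays \emph{polynomially} rather than \emph{geometrically}. A sliding-window scheme attains distortion $\varepsilon$ with window $w = O(\varepsilon^{-1/\alpha})$, and, under an additional two-sided Bayes-risk condition, a converse shows that $w = \Omega(\varepsilon^{-1/\alpha})$ is necessary. Whether recurrent or propagating cache summaries can beat this scaling is left open. An explicit block-Markov scheme achieves the upper bound; its rate-of-convergence exponent matches the converse under additional forward-decay and regularity hypotheses (not implied by truncation sensitivity alone), and differs by a factor of two otherwise. The polynomial law predicts the degradation curves of concrete cache policies: recency-based eviction (sliding, sink-plus-recent) suppresses distortion by roughly two orders of magnitude over random retention at equal budget.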
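
The window scaling quoted in the abstract can be illustrated with a small sketch. It assumes, purely for illustration, that truncation sensitivity is bounded as $D(w) \le C\,w^{-\alpha}$ for a window of $w$ tokens; the abstract does not state the exact form of the assumption, and the constant `c`, the function names, and the index-based cache representation below are all hypothetical rather than taken from the paper. Solving $C\,w^{-\alpha} \le \varepsilon$ for $w$ gives $w \ge (C/\varepsilon)^{1/\alpha}$, which is the $O(\varepsilon^{-1/\alpha})$ behaviour.

```python
import math


def window_for_distortion(eps: float, alpha: float, c: float = 1.0) -> int:
    """Smallest integer window w with c * w**(-alpha) <= eps.

    Assumes the illustrative bound D(w) <= c * w**(-alpha); `c` is a
    hypothetical constant, not a value reported in the paper.
    """
    if eps <= 0 or alpha <= 0 or c <= 0:
        raise ValueError("eps, alpha and c must all be positive")
    w = max(1, math.ceil((c / eps) ** (1.0 / alpha)))
    # Guard against floating-point rounding in the closed-form expression.
    while c * w ** (-alpha) > eps:
        w += 1
    return w


def sliding_window_keep(seq_len: int, w: int) -> list[int]:
    """Positions retained by a sliding-window policy: the last w tokens."""
    return list(range(max(0, seq_len - w), seq_len))


def sink_plus_recent_keep(seq_len: int, n_sink: int, w: int) -> list[int]:
    """Positions retained by a sink-plus-recent policy: the first n_sink
    tokens plus the last w tokens, without duplicates."""
    sinks = set(range(min(n_sink, seq_len)))
    recent = set(range(max(0, seq_len - w), seq_len))
    return sorted(sinks | recent)


if __name__ == "__main__":
    # A smaller exponent alpha (slower decay) demands a much larger window.
    for alpha in (0.5, 1.0, 2.0):
        print(alpha, window_for_distortion(eps=0.03, alpha=alpha))
    print(sink_plus_recent_keep(seq_len=10, n_sink=2, w=3))
```

Under this assumed bound, halving the target distortion multiplies the required window by $2^{1/\alpha}$, so the per-token memory cost of a suffix-only policy grows quickly when the fitted exponent is small.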

Related papers