Fugu-MT Paper Translation (Abstract): CommonKV: Compressing KV Cache with Cross-layer Parameter Sharing

Paper overview: CommonKV: Compressing KV Cache with Cross-layer Parameter Sharing

arxiv url: http://arxiv.org/abs/2508.16134v1
Date: Fri, 22 Aug 2025 06:55:45 GMT
Status: Translation complete
Last updated in system: 2025-08-25 16:42:36.275601
Title: CommonKV: Compressing KV Cache with Cross-layer Parameter Sharing
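Title (reference translation): CommonKV: KV Cache Compression via Cross-layer Parameter Sharing
Authors: Yixuan Wang, Haoyu Qiao, Lujun Li, Qingfu Zhu, Wanxiang Che
Abstract summary: CommonKV is a training-free method for cross-layer KV cache compression through adjacent parameter sharing. The proposed method is shown to consistently outperform existing low-rank and cross-layer approaches at various compression ratios.
Reference score (independently computed attention score): 54.34080239841088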
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Large Language Models (LLMs) confront significant memory challenges due to the escalating KV cache with increasing sequence length. As a crucial technique, existing cross-layer KV cache sharing methods either necessitate modified model architectures with subsequent pre-training or incur significant performance degradation at high compression rates. To mitigate these challenges, we propose CommonKV, a training-free method for cross-layer KV cache compression through adjacent parameter sharing. Inspired by the high similarity observed in cross-layer hidden states, we utilize Singular Value Decomposition (SVD) to achieve weight sharing across adjacent parameters, resulting in a more easily mergeable latent KV cache. Furthermore, we also introduce an adaptive budget allocation strategy. It dynamically assigns compression budgets based on cosine similarity, ensuring that dissimilar caches are not over-compressed. Experiments across multiple backbone models and benchmarks including LongBench and Ruler demonstrate that the proposed method consistently outperforms existing low-rank and cross-layer approaches at various compression ratios. Moreover, we find that the benefits of CommonKV are orthogonal to other quantization and eviction methods. By integrating these approaches, we can ultimately achieve a 98% compression ratio without significant performance loss.
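Abstract (reference translation): Large language models (LLMs) face memory problems caused by the KV cache growing as sequence length increases. As an important technique, existing cross-layer KV cache sharing methods either require a modified model architecture followed by pre-training, or cause significant performance degradation at high compression rates. To mitigate these challenges, we propose CommonKV, a training-free method for cross-layer KV cache compression through adjacent parameter sharing. Inspired by the high similarity of cross-layer hidden states, we use Singular Value Decomposition (SVD) to share weights across adjacent parameters, yielding a latent KV cache that is easier to merge. We also introduce an adaptive budget allocation strategy, which dynamically assigns compression budgets based on cosine similarity so that dissimilar caches are not over-compressed. Experiments on multiple backbone models and benchmarks, including LongBench and Ruler, show that the proposed method consistently outperforms existing low-rank and cross-layer approaches at various compression ratios. Furthermore, the benefits of CommonKV were found to be orthogonal to other quantization and eviction methods. Integrating these methods ultimately achieves a 98% compression ratio without significant performance loss.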
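
Illustrative sketch (not from the paper): the abstract names two ingredients, SVD-based weight sharing across adjacent layers and a cosine-similarity-driven budget allocation, without giving details. The NumPy code below is one plausible reading under stated assumptions. The function names, the choice to concatenate adjacent layers' projection weights before the SVD, and the "merge the most similar groups first" rule are all hypothetical and should not be taken as the authors' implementation.

```python
import numpy as np


def share_projection(weights, rank):
    """Factor adjacent layers' K (or V) projection weights around one shared basis.

    ASSUMPTION: the weights are concatenated column-wise and factored by a
    truncated SVD, so that W_i ~= shared @ per_layer[i] for every layer i.

    weights: list of (d_model, d_kv) arrays, one per adjacent layer.
    Returns shared (d_model, rank) and a list of (rank, d_kv) arrays.
    """
    stacked = np.concatenate(weights, axis=1)            # (d_model, n_layers * d_kv)
    u, s, vt = np.linalg.svd(stacked, full_matrices=False)
    shared = u[:, :rank] * s[:rank]                      # shared down-projection
    per_layer = np.split(vt[:rank], len(weights), axis=1)
    return shared, per_layer


def cosine_similarity(a, b):
    """Cosine similarity between two arrays, flattened to vectors."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def choose_groups_to_merge(similarities, n_merge):
    """ASSUMPTION: merge only the n_merge layer groups whose latent caches are
    most similar, leaving dissimilar groups uncompressed."""
    order = np.argsort(similarities)[::-1]               # most similar first
    return sorted(int(i) for i in order[:n_merge])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model, d_kv, rank = 64, 16, 8
    w1 = rng.standard_normal((d_model, d_kv))
    w2 = rng.standard_normal((d_model, d_kv))
    shared, (b1, b2) = share_projection([w1, w2], rank)

    # Similar hidden states in adjacent layers give similar latent caches,
    # because both layers now project through the same `shared` matrix.
    h1 = rng.standard_normal((10, d_model))
    h2 = h1 + 0.05 * rng.standard_normal((10, d_model))
    z1, z2 = h1 @ shared, h2 @ shared
    print("latent cosine similarity:", cosine_similarity(z1, z2))

    # A merged latent (here a plain average) is stored once; each layer's
    # keys are then recovered approximately with its own up-projection.
    z_merged = 0.5 * (z1 + z2)
    k1_approx, k2_approx = z_merged @ b1, z_merged @ b2
    print("approximate key shapes:", k1_approx.shape, k2_approx.shape)
```

In this sketch the stored cache per merged group is the `rank`-dimensional latent `z_merged` rather than one full key tensor per layer, which is where the memory saving would come from. How the actual method forms the shared weights and sets per-group budgets is specified in the paper, not here.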

Related papers