Fugu-MT Paper Translation (Abstract): The Pitfalls of KV Cache Compression

Paper overview: The Pitfalls of KV Cache Compression

arxiv url: http://arxiv.org/abs/2510.00231v1
Date: Tue, 30 Sep 2025 19:55:26 GMT
Status: Translation complete
Last updated in system: 2025-10-03 16:59:20.233535
Title: The Pitfalls of KV Cache Compression
Title (reference translation): The Pitfalls of KV Cache Compression
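Authors: Alex Chen, Renato Geh, Aditya Grover, Guy Van den Broeck, Daniel Israel
Abstract summary: We show that certain instructions degrade much more rapidly under compression. We describe several factors that play a role in prompt leakage, including compression method, instruction order, and KV eviction bias.
Reference score (independently computed attention metric): 52.196873305708955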
License: http://creativecommons.org/licenses/by/4.0/
Abstract: KV cache compression promises increased throughput and efficiency with negligible loss in performance. While the gains in throughput are indisputable and recent literature has indeed shown minimal degradation on particular benchmarks, in general the consequences of compression in realistic scenarios such as multi-instruction prompting have been insufficiently studied. In this paper, we identify several pitfalls practitioners should be aware of when deploying KV cache compressed LLMs. Importantly, we show that certain instructions degrade much more rapidly with compression, effectively causing them to be completely ignored by the LLM. As a practical example of that, we highlight system prompt leakage as a case study, empirically showing the impact of compression on leakage and general instruction following. We show several factors that play a role in prompt leakage: compression method, instruction order, and KV eviction bias. We then propose simple changes to KV cache eviction policies that can reduce the impact of these factors and improve the overall performance in multi-instruction tasks.
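Abstract (reference translation): KV cache compression promises higher throughput and efficiency with negligible loss in performance. The throughput gains are indisputable, and recent literature reports minimal degradation on particular benchmarks, but the consequences of compression in realistic scenarios such as multi-instruction prompting have in general been insufficiently studied. This paper identifies several pitfalls that practitioners should be aware of when deploying KV cache compressed LLMs. Importantly, certain instructions degrade much more rapidly under compression, to the point that the LLM ignores them completely. As a practical example, we use system prompt leakage as a case study and empirically show the impact of compression on leakage and on general instruction following. We describe several factors that play a role in prompt leakage: compression method, instruction order, and KV eviction bias. We then propose simple changes to KV cache eviction policies that reduce the impact of these factors and improve overall performance on multi-instruction tasks.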

Related papers