Fugu-MT Paper Translation (Abstract): KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems

Paper overview: KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems

arxiv url: http://arxiv.org/abs/2510.12872v1
Date: Tue, 14 Oct 2025 18:00:01 GMT
Status: Translation complete
Last updated in system: 2025-10-16 20:13:28.37026
Title: KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems
Authors: Hancheng Ye, Zhengqi Gao, Mingyuan Ma, Qinsi Wang, Yuzhe Fu, Ming-Yu Chung, Yueqian Lin, Zhijian Liu, Jianyi Zhang, Danyang Zhuo, Yiran Chen
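Abstract summary: KVCOMM is a training-free framework that enables efficient prefilling in multi-agent inference. It estimates and adjusts the KV-caches of shared content by referencing a pool of cached examples, termed anchors. KVCOMM achieves a reuse rate of over 70% across diverse multi-agent workloads.
Reference score (independently computed attention score): 25.770173970846884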
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multi-agent large language model (LLM) systems are increasingly adopted for complex language processing tasks that require communication and coordination among agents. However, these systems often suffer substantial overhead from repeated reprocessing of overlapping contexts across agents. In typical pipelines, once an agent receives a message from its predecessor, the full context, including prior turns, must be reprocessed from scratch, leading to inefficient processing. While key-value (KV) caching is an effective solution for avoiding redundant computation in single-agent settings where prefixes remain unchanged, it cannot be directly reused in multi-agent scenarios due to diverging prefixes introduced by agent-specific context extensions. We identify that the core challenge lies in the offset variance of KV-caches across agents. To address this, we propose KVCOMM, a training-free framework that enables efficient prefilling in multi-agent inference by reusing KV-caches and aligning cache offsets of overlapping contexts under diverse prefix contexts. KVCOMM estimates and adjusts KV-caches for shared content by referencing a pool of cached examples, termed anchors, that store observed cache deviations under varying prefixes. The anchor pool is maintained and updated online, allowing dynamic adaptation to distinct user requests and context structures. KVCOMM achieves over 70% reuse rate across diverse multi-agent workloads, including retrieval-augmented generation, math reasoning, and collaborative coding tasks, all without quality degradation. In particular, when each fully-connected agent receives 1K input tokens with 512 prefix tokens and 512 output tokens under a five-agent setting, KVCOMM achieves up to 7.8x speedup compared to the standard prefill pipeline, reducing TTFT from ~430 ms to ~55 ms.
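Illustrative sketch (not from the paper): the abstract describes anchors that store observed cache deviations under varying prefixes and are used to estimate the KV-cache of shared content. The toy Python below shows one way such a pool could work. The class name `AnchorPool`, the use of a segment embedding as the lookup key, the distance-weighted averaging, and the fixed segment length are all assumptions made for illustration; the abstract does not specify the matching criterion, the weighting, or the cache layout.

```python
import numpy as np


class AnchorPool:
    """Hypothetical anchor pool (illustration only).

    Each anchor keeps a segment embedding and the deviation between the
    segment's KV-cache observed under some prefix and its base KV-cache.
    Assumes every segment has the same shape (tokens, d_kv).
    """

    def __init__(self):
        self.keys = []    # segment embeddings, each of shape (d,)
        self.deltas = []  # observed deviations, each of shape (tokens, d_kv)

    def add(self, embedding, kv_base, kv_observed):
        """Record the deviation observed for one segment (online update)."""
        self.keys.append(np.asarray(embedding, dtype=float))
        delta = np.asarray(kv_observed, dtype=float) - np.asarray(kv_base, dtype=float)
        self.deltas.append(delta)

    def estimate(self, embedding, kv_base, temperature=1.0):
        """Estimate the KV-cache of a segment under a new prefix.

        Returns None when the pool is empty, in which case the caller
        would fall back to a full prefill.
        """
        if not self.keys:
            return None
        keys = np.stack(self.keys)
        dists = np.linalg.norm(keys - np.asarray(embedding, dtype=float), axis=1)
        # Closer anchors get larger weights; subtracting the minimum
        # distance keeps the exponentials numerically stable.
        weights = np.exp(-(dists - dists.min()) / temperature)
        weights /= weights.sum()
        # Weighted average of stored deviations, shape (tokens, d_kv).
        delta = np.tensordot(weights, np.stack(self.deltas), axes=1)
        return np.asarray(kv_base, dtype=float) + delta


pool = AnchorPool()
base = np.zeros((4, 8))
pool.add(embedding=np.ones(3), kv_base=base, kv_observed=base + 0.5)
estimated = pool.estimate(np.ones(3), kv_base=np.ones((4, 8)))
```

With a single stored anchor, the estimate is simply the new base cache plus that anchor's deviation. A real system would also need a rule for deciding when no anchor is close enough to trust, which this sketch omits.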
