Fugu-MT paper translation (abstract): PiKV: KV Cache Management System for Mixture of Experts

Paper overview: PiKV: KV Cache Management System for Mixture of Experts

arxiv url: http://arxiv.org/abs/2508.06526v1
Date: Sat, 02 Aug 2025 03:50:14 GMT
Status: translation complete
Last updated in system: 2025-08-12 21:23:28.408009
Title: PiKV: KV Cache Management System for Mixture of Experts
Title (reference translation): PiKV: KV Cache Management System
Authors: Dong Liu, Yanxuan Yu, Ben Lengerich, Ying Nian Wu, Xuhong Wang
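Abstract summary: Key-value (KV) cache storage has become a major bottleneck in multi-GPU and multi-node inference. We introduce PiKV, a parallel and distributed KV cache serving framework tailored for MoE architectures. PiKV is still a living project that aims to become a comprehensive KV cache management system for MoE architectures.
Reference score (interest level, computed independently by the site): 35.172826570994815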
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As large language models continue to scale up in both size and context length, the memory and communication cost of key-value (KV) cache storage has become a major bottleneck in multi-GPU and multi-node inference. While MoE-based architectures sparsify computation across experts, the corresponding KV caches remain dense and globally synchronized, resulting in significant overhead. We introduce PiKV, a parallel and distributed KV cache serving framework tailored for MoE architectures. PiKV leverages expert-sharded KV storage to partition caches across GPUs, PiKV routing to reduce token-to-KV access, and PiKV Scheduling to adaptively retain query-relevant entries. To further reduce memory usage, PiKV integrates PiKV Compression modules into the caching pipeline for acceleration. PiKV was recently made publicly available as an open-source software library (https://github.com/NoakLiu/PiKV). Experimental details are recorded in the repository (https://github.com/NoakLiu/PiKV/blob/main/downstream_tasks/README.md). PiKV has also been integrated with NVIDIA kvpress for acceleration; for details, see https://github.com/NoakLiu/PiKVpress. PiKV is still a living project, aiming to become a comprehensive KV cache management system for MoE architectures.
Abstract (reference translation): As large language models keep growing in both size and context length, the memory and communication cost of key-value (KV) cache storage has become a major bottleneck in multi-GPU and multi-node inference. MoE-based architectures make computation sparse across experts, but the corresponding KV caches stay dense and globally synchronized, which causes significant overhead. We introduce PiKV, a parallel and distributed KV cache serving framework suited to MoE architectures. PiKV uses expert-sharded KV storage to partition caches across GPUs, PiKV routing to reduce token-to-KV access, and PiKV Scheduling to adaptively retain query-relevant entries. To reduce memory usage further, PiKV integrates PiKV Compression modules into the caching pipeline for acceleration. PiKV was recently released as an open-source software library (https://github.com/NoakLiu/PiKV). Experimental details are available in the repository (https://github.com/NoakLiu/PiKV/blob/main/downstream_tasks/README.md). PiKV has also been integrated with NVIDIA kvpress for acceleration; see https://github.com/NoakLiu/PiKVpress for details. PiKV remains a living project that aims to become a comprehensive KV cache management system for MoE architectures.
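
The idea of expert-sharded KV storage with selective retention can be illustrated with a toy example. The sketch below is not PiKV's API: the class `ExpertShardedKVCache`, its methods, and the `score` argument are hypothetical names chosen for illustration, and evicting the lowest-scoring entry is only a stand-in for whatever retention policy PiKV Scheduling actually applies. It assumes each token has already been routed to a single expert.

```python
from collections import defaultdict


class ExpertShardedKVCache:
    """Toy cache with one KV shard per expert and a fixed capacity per shard.

    Hypothetical illustration only; not the PiKV library interface.
    """

    def __init__(self, num_experts, capacity_per_expert):
        self.num_experts = num_experts
        self.capacity = capacity_per_expert
        # expert id -> list of (score, token_id, key, value) entries
        self.shards = defaultdict(list)

    def put(self, expert_id, token_id, key, value, score):
        """Store a token's KV pair in the shard of the expert it was routed to."""
        if not 0 <= expert_id < self.num_experts:
            raise ValueError(f"unknown expert id: {expert_id}")
        shard = self.shards[expert_id]
        shard.append((score, token_id, key, value))
        if len(shard) > self.capacity:
            # Assumed policy: drop the entry with the lowest relevance score.
            shard.remove(min(shard, key=lambda entry: entry[0]))

    def get(self, expert_id):
        """Return the token ids currently held by one shard, in insertion order."""
        return [token_id for _, token_id, _, _ in self.shards[expert_id]]


cache = ExpertShardedKVCache(num_experts=2, capacity_per_expert=2)
cache.put(0, token_id=0, key=[0.1], value=[1.0], score=0.9)
cache.put(0, token_id=1, key=[0.2], value=[2.0], score=0.1)
cache.put(1, token_id=2, key=[0.3], value=[3.0], score=0.5)
cache.put(0, token_id=3, key=[0.4], value=[4.0], score=0.7)  # shard 0 is full, so token 1 is evicted

print(cache.get(0))  # [0, 3]
print(cache.get(1))  # [2]
```

Because each shard holds only the entries of tokens routed to its expert, a lookup touches one shard rather than a globally synchronized cache, which is the kind of saving the abstract attributes to sharding and routing.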

Related papers