Fugu-MT paper translation (abstract): Can I Buy Your KV Cache?

Paper abstract: Can I Buy Your KV Cache?

arxiv url: http://arxiv.org/abs/2606.13361v1
Date: Thu, 11 Jun 2026 13:47:33 GMT
Status: translation complete
Last updated in system: 2026-06-12 15:55:27.829081
Title: Can I Buy Your KV Cache?
Title (reference translation): Can I Buy Your KV Cache?
Authors: Luoyuan Zhang
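Abstract summary: We make a proposal that is almost offensively simple: let a publisher precompute a document's KV cache, and let every other agent buy the right to load it and skip prefill. On Qwen3-4B, reuse is 9-50x cheaper in compute than prefill.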
Reference score (independently computed attention score): 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Right now, across the world, AI agents are repeating the same absurd act: to read one document, they each recompute it from scratch. Every agent re-runs prefill, the most compute-intensive step a large model takes, over identical text, only to rebuild a key-value (KV) cache identical to the one the agent before it just built. The same answer, computed a million times. We make a proposal that is almost offensively simple: compute it once. Let a publisher precompute a document's KV cache, and let every other agent buy the right to load it and skip prefill. It works, and it is token-exact: loading a precomputed KV and continuing matches prefilling from scratch (24/24 greedy tokens, and at the logits level), with no accuracy cost. On Qwen3-4B, reuse is 9-50x cheaper in compute than prefill, and the gap widens with length (prefill's attention scales with L^2), so a single reuse already pays it back. Then the part that matters: where the KV lives. Shipping it fails, because KV is nearly incompressible, so per-load egress costs more than the prefill it saves. Hosting it provider-side, exactly as production prompt-caching works, removes egress entirely. The size of the prize is set by our measured compute saving: serving one hot 3774-token document to 80M agents costs ~$1.5M to re-prefill but only ~$0.03M of reuse compute (49.7x less). The 0.1x cache-read tariff APIs charge passes a 10x discount to users while sitting inside this measured envelope, so the 10x is a floor that the measured ~50x compute saving clears, and the gap to the physical ~50x is provider margin: millions of dollars per popular document. We frame the resulting agent-native prefill CDN and leave lossless KV compression and a cross-party payment layer as the open problems.
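The dollar figures in the abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above (80M agents, ~$1.5M to re-prefill, a 49.7x compute ratio, a 0.1x cache-read tariff). The function and variable names are illustrative, not taken from the paper, and the "headroom" figure is simply the ratio between the measured saving and the 10x user discount.

```python
# Figures quoted in the abstract; names are illustrative, not the paper's.
AGENTS = 80_000_000            # agents loading the same hot document
PREFILL_TOTAL_USD = 1.5e6      # "~$1.5M to re-prefill"
COMPUTE_RATIO = 49.7           # prefill compute / reuse compute
CACHE_READ_TARIFF = 0.1        # cache-read price relative to normal input price


def reuse_economics(agents, prefill_total_usd, compute_ratio, tariff):
    """Return aggregate and per-load costs implied by the quoted figures."""
    reuse_total_usd = prefill_total_usd / compute_ratio
    return {
        "prefill_per_load_usd": prefill_total_usd / agents,
        "reuse_per_load_usd": reuse_total_usd / agents,
        "reuse_total_usd": reuse_total_usd,
        "compute_saved_usd": prefill_total_usd - reuse_total_usd,
        # How far the measured compute saving exceeds the 10x user discount.
        "headroom_over_tariff": compute_ratio * tariff,
    }


result = reuse_economics(AGENTS, PREFILL_TOTAL_USD, COMPUTE_RATIO, CACHE_READ_TARIFF)
for name, value in result.items():
    print(f"{name}: {value:,.6f}")
```

Under these inputs the reuse compute comes to roughly $0.03M in total, consistent with the abstract, and the measured saving exceeds the 10x tariff discount by a factor of about 5.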
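Abstract (reference translation): Right now, AI agents around the world are repeating the same absurd act: to read a single document, each one recomputes it from scratch. Every agent re-runs prefill, the most compute-intensive step a large model performs, over identical text, only to rebuild a key-value (KV) cache identical to the one the previous agent just built. The same answer is computed a million times. We make a proposal that is almost offensively simple: compute it once. Let a publisher precompute a document's KV cache, and let every other agent buy the right to load it and skip prefill. This works and is token-exact: loading a precomputed KV cache and continuing matches prefilling from scratch (24/24 greedy tokens, and at the logits level), with no accuracy cost. On Qwen3-4B, reuse is 9-50x cheaper in compute than prefill, and the gap widens with length (prefill's attention scales with L^2), so a single reuse already pays it back. What matters next is where the KV cache lives. Shipping it fails: KV is nearly incompressible, so the egress cost per load exceeds the prefill cost it saves. Hosting it on the provider side, exactly as production prompt caching works, removes egress entirely. Serving one hot 3774-token document to 80M agents costs ~$1.5M to re-prefill but only ~$0.03M in reuse compute (49.7x less). The 0.1x cache-read tariff that APIs charge passes a 10x discount to users while staying inside this measured envelope, so the 10x is a floor that the measured ~50x compute saving clears, and the remaining gap is provider margin. We frame the resulting agent-native prefill CDN and leave lossless KV compression and a cross-party payment layer as open problems.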
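The claim that shipping the KV cache costs more than the prefill it saves can be illustrated with a back-of-the-envelope size estimate. This is a minimal sketch under loudly stated assumptions: the model shape (36 layers, 8 KV heads, head dimension 128), the 2-byte storage type, and the $0.09/GB egress price are NOT given in the abstract and should be checked against the model's configuration and an actual cloud price list. Only the token count and the per-load prefill cost are derived from the abstract.

```python
# ASSUMPTIONS (not from the abstract): model shape, dtype, and egress price.
NUM_LAYERS = 36            # assumed Qwen3-4B layer count
NUM_KV_HEADS = 8           # assumed grouped-query KV heads
HEAD_DIM = 128             # assumed per-head dimension
BYTES_PER_VALUE = 2        # assumed bf16/fp16 storage
EGRESS_USD_PER_GB = 0.09   # hypothetical cloud egress price

# Derived from figures in the abstract.
DOC_TOKENS = 3774
PREFILL_PER_LOAD_USD = 1.5e6 / 80_000_000


def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value):
    """Uncompressed KV cache size for one sequence."""
    # Factor 2 covers one key tensor and one value tensor per layer.
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


kv_bytes = kv_cache_bytes(DOC_TOKENS, NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, BYTES_PER_VALUE)
egress_per_load_usd = kv_bytes / 1e9 * EGRESS_USD_PER_GB
shipping_loses = egress_per_load_usd > PREFILL_PER_LOAD_USD

print(f"KV cache size: {kv_bytes / 1e6:.1f} MB")
print(f"egress per load: ${egress_per_load_usd:.4f}")
print(f"prefill per load: ${PREFILL_PER_LOAD_USD:.5f}")
print(f"shipping costs more than prefill: {shipping_loses}")
```

With these assumed values the cache is a little over half a gigabyte, and the assumed egress charge per load is several times the per-load prefill cost, which is the direction the abstract describes. A different egress price or cache precision would change the magnitude, so the numbers here are illustrative only.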
