Fugu-MT paper translation (abstract): Generative Caching for Structurally Similar Prompts and Responses

Paper overview: Generative Caching for Structurally Similar Prompts and Responses

arxiv url: http://arxiv.org/abs/2511.17565v1
Date: Fri, 14 Nov 2025 00:22:00 GMT
Status: translation complete
Last updated in system: 2025-12-07 19:06:32.282536
Title: Generative Caching for Structurally Similar Prompts and Responses
Authors: Sarthak Chakraborty, Suman Nath, Xuchao Zhang, Chetan Bansal, Indranil Gupta
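Abstract summary: Large language models (LLMs) are increasingly used to plan, reason, and execute tasks across diverse scenarios. In use cases such as repeatable workflows and agentic settings, prompts are often reused with minor variations. We introduce \ourmethod{}, a generative cache that produces variation-aware responses for structurally similar prompts.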
Reference score (independently computed attention score): 15.50345473013337
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) are increasingly being used to plan, reason, and execute tasks across diverse scenarios. In use cases like repeatable workflows and agentic settings, prompts are often reused with minor variations while having a similar structure for recurring tasks. This opens up opportunities for caching. However, exact prompt matching fails on such structurally similar prompts, while semantic caching may produce incorrect responses by ignoring critical differences. To address this, we introduce \ourmethod{}, a generative cache that produces variation-aware responses for structurally similar prompts. \ourmethod{} identifies reusable response patterns across similar prompt structures and synthesizes customized outputs for new requests. We show that \ourmethod{} achieves 83% cache hit rate, while having minimal incorrect hits on datasets without prompt repetition. In agentic workflows, it improves cache hit rate by ~20% and reduces end-to-end execution latency by ~34% compared to standard prompt matching.
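The contrast the abstract draws between exact prompt matching and a variation-aware cache can be illustrated with a toy sketch. This is not the paper's algorithm: the class `ToyStructuralCache`, its methods, the whitespace tokenization, and the `max_diff` threshold are all hypothetical choices made only for illustration. The sketch treats two prompts as structurally similar when they have the same number of tokens and differ in at most `max_diff` positions, and it rewrites the cached response only when every changed token also appears in that response.

```python
from typing import Optional


class ToyStructuralCache:
    """Toy illustration only; NOT the method described in the paper."""

    def __init__(self, max_diff: int = 1) -> None:
        # Hypothetical threshold: how many token positions may differ.
        self.max_diff = max_diff
        self.entries = []  # list of (prompt_tokens, response) pairs

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((prompt.split(), response))

    def get_exact(self, prompt: str) -> Optional[str]:
        """Standard exact matching: hit only on an identical prompt."""
        tokens = prompt.split()
        for cached_tokens, response in self.entries:
            if cached_tokens == tokens:
                return response
        return None

    def get_generative(self, prompt: str) -> Optional[str]:
        """Reuse a cached response, substituting the tokens that varied."""
        tokens = prompt.split()
        for cached_tokens, response in self.entries:
            if len(cached_tokens) != len(tokens):
                continue  # different structure under this toy definition
            diffs = [(old, new)
                     for old, new in zip(cached_tokens, tokens) if old != new]
            if len(diffs) > self.max_diff:
                continue
            response_tokens = response.split()
            # If a changed token never appears in the response, we cannot
            # tell how the variation affects the answer, so report a miss.
            if any(old not in response_tokens for old, _ in diffs):
                continue
            mapping = {}
            consistent = True
            for old, new in diffs:
                if mapping.setdefault(old, new) != new:
                    consistent = False  # same old token mapped two ways
            if not consistent:
                continue
            return " ".join(mapping.get(tok, tok) for tok in response_tokens)
        return None


cache = ToyStructuralCache()
cache.put("restart service auth-api", "Restarting auth-api now")
print(cache.get_exact("restart service billing-api"))
print(cache.get_generative("restart service billing-api"))
```

In this sketch the exact lookup misses on the second prompt because one token changed, while the generative lookup reuses the cached response with the changed token substituted. Falling back to a miss when the variation cannot be traced into the response is a deliberately conservative choice, intended to avoid the incorrect hits that the abstract attributes to semantic caching.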


Related papers