Fugu-MT Paper Translation (Abstract): Statistical Independence Aware Caching for LLM Workflows

Paper summary: Statistical Independence Aware Caching for LLM Workflows

arxiv url: http://arxiv.org/abs/2511.22118v1
Date: Thu, 27 Nov 2025 05:16:28 GMT
Status: Translation complete
Last updated in system: 2025-12-01 19:47:55.405893
Title: Statistical Independence Aware Caching for LLM Workflows
Title (reference translation): Statistical Independence-Aware Caching for LLM Workflows
Authors: Yihan Dai, Dimitrios Stamatios Bouras, Haoxiang Jia, Sergey Mechtaev
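Abstract summary: Local caching of responses offers a practical way to reduce the cost and latency of large language model (LLM) inference. Existing LLM caching systems have no way to enforce statistical independence constraints. The authors introduce Mnimi, a cache design pattern that supports modular LLM workflows while ensuring statistical integrity at the component level.
Reference score (independently computed attention score): 3.700239041804401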
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language model (LLM) inference is both expensive and slow. Local caching of responses offers a practical solution to reduce the cost and latency of LLM queries. In research contexts, caching also enhances reproducibility and provides flexibility for experimentation. However, naive reuse of cached responses compromises statistical independence, a critical property for probabilistic workflows. In applications of LLMs to code, statistical independence underpins performance metrics such as Pass@k and uncertainty estimation, as well as algorithms like program repair loops and retries. Existing LLM caching systems lack ways to enforce statistical independence constraints. To address this, we introduce Mnimi, a cache design pattern that supports modular LLM workflows while ensuring statistical integrity at the component level. Its core innovation lies in encapsulating statistical constraints within the type of LLM references, allowing users to manage and transform these types according to the scope and requirements of their algorithm. We implemented this design pattern in Python using a combination of decorators and iterators over infinite sequences. A case study on SpecFix, a recent automated program specification repair system, highlights how Mnimi improves reproducibility, ease of debugging, and time and cost efficiency while preserving statistical correctness.
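Abstract (reference translation): Large language model (LLM) inference is expensive and slow. Local caching of responses provides a practical solution for reducing the cost and latency of LLM queries. In research settings, caching also improves reproducibility and gives flexibility for experimentation. However, naively reusing cached responses undermines statistical independence, an important property for probabilistic workflows. In applications of LLMs to code, this property underlies performance metrics such as Pass@k and uncertainty estimation, as well as algorithms such as program repair loops and retries. Existing LLM caching systems have no way to enforce statistical independence constraints. To solve this, the authors introduce Mnimi, a cache design pattern that supports modular LLM workflows and guarantees statistical integrity at the component level. Its core innovation is to encapsulate statistical constraints in the type of LLM references, so that users can manage and transform these types according to the scope and requirements of their algorithm. The design pattern was implemented in Python using a combination of decorators and iterators over infinite sequences. A case study on SpecFix, a recent automated program specification repair system, highlights how Mnimi improves reproducibility, ease of debugging, and time and cost efficiency while maintaining statistical correctness.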
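The abstract does not show Mnimi's API, so the following is only a minimal sketch of the general idea of caching through iterators over infinite sequences. It is not the authors' implementation. The class `SampleCache`, its methods `independent` and `repeatable`, and the stand-in model from `make_fake_model` are all hypothetical names introduced for illustration. The assumption is that each prompt maps to an ordered list of cached draws: one stream never serves the same draw twice, so samples within a run stay independent, while a fresh stream replays the same sequence, so a rerun is reproducible and costs no new model calls.

```python
import itertools
from typing import Callable, Dict, Iterator, List


class SampleCache:
    """Hypothetical cache: stores responses per prompt as an ordered list of draws."""

    def __init__(self, model: Callable[[str], str]) -> None:
        self._model = model                      # assumed LLM call: prompt -> response
        self._store: Dict[str, List[str]] = {}   # prompt -> draws made so far
        self.model_calls = 0                     # counts real (uncached) model calls

    def _draw(self, prompt: str, index: int) -> str:
        draws = self._store.setdefault(prompt, [])
        while len(draws) <= index:               # query the model only on a cache miss
            draws.append(self._model(prompt))
            self.model_calls += 1
        return draws[index]

    def independent(self, prompt: str) -> Iterator[str]:
        """Infinite stream whose i-th item is always the i-th cached draw.

        Within one stream no draw is served twice; a new stream replays
        the same sequence from the start.
        """
        return (self._draw(prompt, i) for i in itertools.count())

    def repeatable(self, prompt: str) -> str:
        """Always returns draw 0; appropriate only when a single sample is needed."""
        return self._draw(prompt, 0)


def make_fake_model() -> Callable[[str], str]:
    """Stand-in for a sampling LLM: every call yields a distinct response."""
    counter = itertools.count()
    return lambda prompt: f"{prompt}-response-{next(counter)}"


cache = SampleCache(make_fake_model())
first_run = list(itertools.islice(cache.independent("fix bug"), 3))
second_run = list(itertools.islice(cache.independent("fix bug"), 3))
```

Under these assumptions, a Pass@k-style evaluation would take its k samples from a single `independent` stream, whereas calling `repeatable` k times would return the same response k times, which is the naive reuse the abstract warns about.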

List of related papers