Fugu-MT Paper Translation (Abstract): Cost-Aware Speculative Execution for LLM-Agent Workflows: An Integrated Five-Dimension Method

Paper overview: Cost-Aware Speculative Execution for LLM-Agent Workflows: An Integrated Five-Dimension Method

arxiv url: http://arxiv.org/abs/2606.07846v1
Date: Fri, 05 Jun 2026 21:13:47 GMT
Status: Translation complete
Last updated in system: 2026-06-09 14:42:05.484122
Title: Cost-Aware Speculative Execution for LLM-Agent Workflows: An Integrated Five-Dimension Method
Authors: Faisal Fareed
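Abstract summary: Speculative execution can reclaim idle time by launching a downstream operation with a predicted upstream input. Here, however, each speculation costs real money (per-token billing), and its success probability is hard to estimate and drifts over time. This paper presents: (D1) starting a downstream operation before its upstream completes; (D2) pricing each speculation at separate input and output rates; (D3) exposing a single operator dial for latency versus cost; (D4) deciding via an expected-value rule with a failure-weighted cost term and a preference-adjusted threshold; [...]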
Reference score (independently computed attention score): 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: LLM-agent workflows chain model calls and tool invocations, and spend most of their wall-clock time waiting on upstream operations before downstream ones can start. Speculative execution can reclaim that idle time by launching a downstream operation with a predicted upstream input, but here each speculation costs real money (per-token billing) and its success probability is hard to estimate and drifts over time. This paper presents a method organized around five design decisions: (D1) start a downstream operation before its upstream completes; (D2) price each speculation in real dollars at separate input and output rates; (D3) expose a single operator dial for latency versus cost; (D4) decide via an expected-value rule with a failure-weighted cost term and a preference-adjusted threshold; and (D5) estimate the success probability with a Bayesian Beta-Binomial posterior whose prior is keyed to a dependency-type taxonomy. Variants of these ideas appear in recent work; the combination, with every decision logged in dollars, is what is new. The rule fires only on edges passing an admissibility precondition (side-effect-free, idempotent, or stageable behind a commit barrier), since a wrong speculation is rolled back by re-execution, which refunds tokens but cannot un-send an irreversible side effect. We specify the runtime mechanics, a closed-form result that the rule self-limits as the upstream branching factor grows, a five-stage calibration pipeline (offline replay, shadow, canary, online calibration, drift-triggered kill-switch), and a workload-fit rubric over eight production archetypes. Contrast tables against the four closest published systems (DSP, Speculative Actions v2, Sherlock, B-PASTE) show differentiators on every dimension, and a synthetic validation suite confirms the predicted decision boundary, probability threshold, posterior recovery, and streaming-cancellation behavior.
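
The abstract names the ingredients of the decision (D2 pricing, D4 expected-value rule, D5 Beta-Binomial posterior, and the admissibility precondition) but does not give formulas. The following is a minimal Python sketch of how such pieces might fit together, not the paper's actual method. Everything concrete here is an assumption made for illustration: the names `BetaPosterior`, `PRIORS`, `speculation_cost_usd`, and `should_speculate`, the dependency-type labels and prior pseudo-counts, the conversion of saved latency into dollars via `usd_per_second`, and the form of the rule (expected gain compared against a failure-weighted cost scaled by a hypothetical `cost_weight` dial).

```python
from dataclasses import dataclass


@dataclass
class BetaPosterior:
    """Beta posterior over the success probability of a speculation (D5)."""
    alpha: float  # pseudo-count of successes (prior plus observed)
    beta: float   # pseudo-count of failures (prior plus observed)

    def update(self, success: bool) -> None:
        # Conjugate Beta-Binomial update: one observation adds one count.
        if success:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


# Hypothetical priors keyed by dependency type; labels and numbers are
# illustrative assumptions, not values from the paper.
PRIORS = {
    "deterministic_lookup": (8.0, 2.0),
    "free_text_generation": (1.0, 4.0),
}


def speculation_cost_usd(input_tokens: int, output_tokens: int,
                         usd_per_input_token: float,
                         usd_per_output_token: float) -> float:
    """Price a speculation in dollars at separate input/output rates (D2)."""
    return (input_tokens * usd_per_input_token
            + output_tokens * usd_per_output_token)


def should_speculate(p_success: float, latency_saved_s: float,
                     usd_per_second: float, cost_usd: float,
                     admissible: bool, cost_weight: float = 1.0) -> bool:
    """Assumed expected-value rule (D4), gated by admissibility.

    Speculate only when the expected dollar value of the saved latency
    exceeds the failure-weighted cost, scaled by the operator dial.
    """
    if not admissible:
        # Edges with irreversible side effects are never speculated on.
        return False
    expected_gain = p_success * latency_saved_s * usd_per_second
    failure_weighted_cost = (1.0 - p_success) * cost_usd
    return expected_gain > cost_weight * failure_weighted_cost
```

In this sketch, raising `cost_weight` makes the rule more conservative (fewer speculations, lower spend), while lowering it favors latency. A caller would seed a `BetaPosterior` from `PRIORS` for the edge's dependency type, call `update` after each speculation resolves, and pass `mean` as `p_success`.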

Related papers