Fugu-MT Paper Translation (Abstract): Sherlock: Reliable and Efficient Agentic Workflow Execution

Paper summary: Sherlock: Reliable and Efficient Agentic Workflow Execution

arxiv url: http://arxiv.org/abs/2511.00330v1
Date: Sat, 01 Nov 2025 00:17:57 GMT
Status: Translation complete
Last updated in system: 2025-11-05 16:37:26.720378
Title: Sherlock: Reliable and Efficient Agentic Workflow Execution
Authors: Yeonju Ro, Haoran Qiu, Íñigo Goiri, Rodrigo Fonseca, Ricardo Bianchini, Aditya Akella, Zhangyang Wang, Mattan Erez, Esha Choukse
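Abstract summary: Agentic workflows built on large language models (LLMs) are increasingly replacing traditional applications. Incorrect or partially correct output at one step can propagate or be amplified through subsequent stages. Verifying every step incurs significant latency and cost overhead. The proposed method, Sherlock, uses counterfactual analysis of agentic workflows to identify error-prone nodes and selectively attaches cost-optimal verifiers to them.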
Reference score (independently computed attention score): 44.30588192569476
License: http://creativecommons.org/licenses/by/4.0/
Abstract: With the increasing adoption of large language models (LLMs), agentic workflows, which compose multiple LLM calls with tools, retrieval, and reasoning steps, are increasingly replacing traditional applications. However, such workflows are inherently error-prone: incorrect or partially correct output at one step can propagate or even amplify through subsequent stages, compounding the impact on the final output. Recent work proposes integrating verifiers that validate LLM output or actions, such as self-reflection, debate, or LLM-as-a-judge mechanisms. Yet verifying every step introduces significant latency and cost overheads. In this work, we seek to answer three key questions: which nodes in a workflow are most error-prone and thus deserve costly verification, how to select the most appropriate verifier for each node, and how to use verification with minimal impact on latency. Our solution, Sherlock, addresses these by using counterfactual analysis on agentic workflows to identify error-prone nodes and by selectively attaching cost-optimal verifiers only where necessary. At runtime, Sherlock speculatively executes downstream tasks to reduce latency overhead while verification runs in the background. If verification fails, execution is rolled back to the last verified output. Compared to the non-verifying baseline, Sherlock delivers an 18.3% accuracy gain on average across benchmarks. Sherlock reduces workflow execution time by up to 48.7% over non-speculative execution and lowers verification cost by 26.0% compared to the Monte Carlo search-based method, demonstrating that principled, fault-aware verification effectively balances efficiency and reliability in agentic workflows.
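
The runtime behavior described in the abstract (speculative downstream execution, background verification, rollback on failure) can be illustrated with a small sketch. This is not Sherlock's implementation or API: `run_workflow`, `steps`, `verifiers`, and `repairs` are hypothetical names, the workflow is assumed to be a simple linear chain, and the output of a repair step is assumed to be trusted without re-verification. How error-prone nodes and verifiers are chosen (the counterfactual analysis) is not shown; the `verifiers` mapping is simply taken as given.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Step = Callable[[str], str]       # one workflow node (e.g. an LLM call)
Verifier = Callable[[str], bool]  # returns True if the node output is accepted


def run_workflow(steps: List[Step], verifiers: Dict[int, Verifier],
                 repairs: Dict[int, Step], initial: str) -> Tuple[str, int]:
    """Run a linear workflow speculatively; return (output, rollback count)."""
    rollbacks = 0
    value, i = initial, 0
    pending = []  # (future, node index, node input), in submission order
    with ThreadPoolExecutor(max_workers=4) as pool:
        while True:
            finished = i >= len(steps)
            failed, remaining = None, []
            for fut, idx, node_input in pending:
                if failed is not None:
                    continue  # speculative work past a failure is discarded
                if finished or fut.done():
                    # fut.result() only blocks once the workflow has finished
                    if not fut.result():
                        failed = (idx, node_input)
                else:
                    remaining.append((fut, idx, node_input))
            pending = remaining
            if failed is not None:
                # Roll back to the input of the failed node and redo it
                # with the (hypothetical) repair step, then resume downstream.
                idx, node_input = failed
                value = repairs[idx](node_input)
                i = idx + 1
                rollbacks += 1
                continue
            if finished:
                return value, rollbacks
            node_input = value
            value = steps[i](node_input)
            if i in verifiers:
                # Verification runs in the background; execution continues.
                pending.append((pool.submit(verifiers[i], value), i, node_input))
            i += 1


# Toy example: node 1 is faulty and is the only node with a verifier attached.
demo_steps = [lambda s: s + "a", lambda s: s + "X", lambda s: s + "c"]
demo_verifiers = {1: lambda out: out.endswith("b")}
demo_repairs = {1: lambda s: s + "b"}
result = run_workflow(demo_steps, demo_verifiers, demo_repairs, "")
```

In this toy run the faulty output of node 1 is rejected, execution rolls back to the output of node 0, and the final result is `("abc", 1)` whether the verifier finishes before or after node 2 has run speculatively.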
