Fugu-MT Paper Translation (Abstract): Evoflux: Inference-Time Evolution of Executable Tool Workflows for Compact Agents

Paper abstract: Evoflux: Inference-Time Evolution of Executable Tool Workflows for Compact Agents

arxiv url: http://arxiv.org/abs/2606.12674v1
Date: Wed, 10 Jun 2026 21:01:06 GMT
Status: Translation complete
System update date: 2026-06-12 15:55:27.460034
Title: Evoflux: Inference-Time Evolution of Executable Tool Workflows for Compact Agents
Authors: Kushal Raj Bhandari, Ling Yue, Ching-Yun Ko, Dhaval Patel, Shaowu Pan, Pin-Yu Chen, Jianxi Gao
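Abstract summary: Small planners often generate plausible workflow graphs that fail under tool resolution, parameter validation, dependency tracking, or execution. A few hundred teacher traces can teach workflow format, but rarely cover the recovery behavior needed to repair failed plans over changing tool catalogs. The paper introduces Evoflux, an inference-time evolutionary search method that treats compact tool use as the repair of executable tool workflows.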
Reference score (independently computed attention score): 41.53691975342536
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Compact language models (LMs) reduce cost, latency, and deployment risk for tool agents. Yet MCP-style tool use requires more than isolated function calling: an agent must discover tools from live catalogs, satisfy schemas, preserve dependencies across intermediate outputs, and ground final responses in executed evidence. Small planners often generate plausible workflow graphs that fail under tool resolution, parameter validation, dependency tracking, or execution. We argue that this failure mode is poorly handled by small-corpus distillation. A few hundred teacher traces can teach workflow format, but rarely cover the recovery behavior needed to repair failed plans over changing tool catalogs. We introduce Evoflux, an inference-time evolutionary search method that treats compact tool use as the repair of executable tool workflows. It evolves typed workflow graphs through structured edits, execution feedback, adaptive intensity, meta-guided redesign, and diversity pruning. On held-out MCP-Bench tasks spanning live MCP servers and 250 tools, Evoflux raises execution feasibility from roughly 3% to 17-24% across small planners. In contrast, SFT and SFT+DPO on the same search-mined data match, underperform, or collapse below zero-shot performance; ReAct reaches higher peaks, but with higher variance and token cost. These results show that execution-grounded search is more reliable under scarce teacher-trace budgets.
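As a rough illustration of the idea described in the abstract (not the authors' implementation), the following Python sketch repairs a broken two-step workflow by applying structured edits and scoring each candidate by how far it gets when executed. The tool catalog, the workflow encoding, the three edit types, the scoring scheme, and every function name here are assumptions made for this example. Adaptive intensity and meta-guided redesign are not modeled, and simple duplicate removal stands in only crudely for diversity pruning.

```python
from typing import Callable, Dict, Iterator, Tuple

# Hypothetical tool catalog: name -> (required parameter names, implementation).
CATALOG: Dict[str, Tuple[Tuple[str, ...], Callable[..., object]]] = {
    "search": (("query",), lambda query: [f"doc about {query}"]),
    "summarize": (("docs",), lambda docs: f"summary of {len(docs)} doc(s)"),
}

Step = Tuple[str, Tuple[Tuple[str, str], ...]]  # (tool, ((param, value), ...))
Workflow = Tuple[Step, ...]


def execute(workflow: Workflow) -> Tuple[float, str]:
    """Run steps in order; return (progress score in [0, 1], first error)."""
    outputs = []
    n = len(workflow)
    for i, (tool, args) in enumerate(workflow):
        if tool not in CATALOG:  # tool resolution
            return i / n, f"step {i}: unknown tool {tool!r}"
        required, fn = CATALOG[tool]
        kwargs = dict(args)
        if set(kwargs) != set(required):  # parameter validation
            return (i + 1 / 3) / n, f"step {i}: bad parameters {sorted(kwargs)}"
        for key, value in list(kwargs.items()):
            if value.startswith("$"):  # "$j" refers to the output of step j
                ref = int(value[1:])
                if ref >= i:  # dependency tracking
                    return (i + 2 / 3) / n, f"step {i}: bad dependency {value}"
                kwargs[key] = outputs[ref]
        outputs.append(fn(**kwargs))
    return 1.0, ""


def replace_at(items: tuple, index: int, item) -> tuple:
    """Return a copy of `items` with position `index` replaced by `item`."""
    return items[:index] + (item,) + items[index + 1:]


def neighbors(workflow: Workflow) -> Iterator[Workflow]:
    """Yield every workflow reachable by one structured edit."""
    for i, (tool, args) in enumerate(workflow):
        for name in CATALOG:  # edit 1: swap in a tool from the catalog
            yield replace_at(workflow, i, (name, args))
        for k, (param, value) in enumerate(args):
            if tool in CATALOG:  # edit 2: rename a parameter to a required one
                for required in CATALOG[tool][0]:
                    new_args = replace_at(args, k, (required, value))
                    yield replace_at(workflow, i, (tool, new_args))
            for j in range(i):  # edit 3: rewire a value to an earlier output
                new_args = replace_at(args, k, (param, f"${j}"))
                yield replace_at(workflow, i, (tool, new_args))


def evolve(seed: Workflow, generations: int = 6, population_size: int = 4) -> Workflow:
    """Mutate, execute, and keep the best distinct candidates each generation."""
    population = [seed]
    for _ in range(generations):
        candidates = set(population)  # a set removes duplicate workflows
        for workflow in population:
            candidates.update(neighbors(workflow))
        # Rank by execution progress; repr() breaks ties deterministically.
        ranked = sorted(candidates, key=lambda wf: (-execute(wf)[0], repr(wf)))
        population = ranked[:population_size]
        if execute(population[0])[0] == 1.0:
            break
    return population[0]


# A plausible-looking plan that fails: unknown tool, wrong parameter name,
# and a step that depends on its own output.
BROKEN: Workflow = (
    ("web_search", (("query", "tool agents"),)),
    ("summarize", (("text", "$1"),)),
)

repaired = evolve(BROKEN)
print(execute(BROKEN))
print(repaired, execute(repaired))
```

The score rewards partial progress (tool resolved, parameters valid, dependencies resolved), so the search has a gradient to follow even when no candidate runs end to end. In this sketch the error string is only returned, not used; a real system would presumably pass such feedback back to the planner when proposing edits.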


Related papers