Fugu-MT Paper Translation (Abstract): JudgeFlow: Agentic Workflow Optimization via Block Judge

Paper summary: JudgeFlow: Agentic Workflow Optimization via Block Judge

arxiv url: http://arxiv.org/abs/2601.07477v1
Date: Mon, 12 Jan 2026 12:30:14 GMT
Status: Translation complete
Last updated in system: 2026-01-13 19:08:01.380855
Title: JudgeFlow: Agentic Workflow Optimization via Block Judge
Title (reference translation): JudgeFlow: Agentic Workflow Optimization via Block Judge
Authors: Zihan Ma, Zhikai Zhao, Chuanbo Hua, Federico Berto, Jinkyoo Park
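Abstract summary: Current methods rely on coarse, end-to-end evaluation signals and lack fine-grained signals on where to refine, often resulting in inefficient or low-impact modifications. This paper proposes an Evaluation-Judge-Optimization-Update pipeline that captures fundamental forms of logic in reusable blocks and assigns rank-based responsibility scores to problematic blocks. The proposed approach improves sample efficiency, enhances interpretability through block-level diagnostics, and provides a scalable foundation for automating increasingly complex agentic workflows.
Reference score (independently computed attention score): 25.427646436735312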
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Optimizing LLM-based agentic workflows is challenging for scaling AI capabilities. Current methods rely on coarse, end-to-end evaluation signals and lack fine-grained signals on where to refine, often resulting in inefficient or low-impact modifications. To address these limitations, we propose JudgeFlow, an Evaluation-Judge-Optimization-Update pipeline. We incorporate reusable, configurable logic blocks into agentic workflows to capture fundamental forms of logic. On top of this abstraction, we design a dedicated Judge module that inspects execution traces -- particularly failed runs -- and assigns rank-based responsibility scores to problematic blocks. These fine-grained diagnostic signals are then leveraged by an LLM-based optimizer, which focuses modifications on the most problematic block in the workflow. Our approach improves sample efficiency, enhances interpretability through block-level diagnostics, and provides a scalable foundation for automating increasingly complex agentic workflows. We evaluate JudgeFlow on mathematical reasoning and code generation benchmarks, where JudgeFlow achieves superior performance and efficiency compared to existing methods. The source code is publicly available at https://github.com/ma-zihan/JudgeFlow.
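Abstract (reference translation): Optimizing LLM-based agentic workflows is difficult when scaling up AI capabilities. Current methods rely on coarse, end-to-end evaluation signals and lack fine-grained signals on where to refine, often resulting in inefficient or low-impact modifications. To address these limitations, the authors propose JudgeFlow, an Evaluation-Judge-Optimization-Update pipeline. Reusable, configurable logic blocks are incorporated into agentic workflows to capture fundamental forms of logic. On top of this abstraction, a dedicated Judge module inspects execution traces (particularly failed runs) and assigns rank-based responsibility scores to problematic blocks. These fine-grained diagnostic signals are used by an LLM-based optimizer, which focuses modifications on the most problematic block in the workflow. The proposed approach improves sample efficiency, enhances interpretability through block-level diagnostics, and provides a scalable foundation for automating increasingly complex agentic workflows. JudgeFlow is evaluated on mathematical reasoning and code generation benchmarks, where it achieves better performance and efficiency than existing methods. The source code is available at https://github.com/ma-zihan/JudgeFlow.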
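Illustrative sketch (not from the paper): the abstract says the Judge assigns rank-based responsibility scores to blocks and the optimizer targets the most problematic one, but it does not give the scoring rule. The Python below assumes a simple Borda-style count over per-failure rankings. The function names `responsibility_scores` and `most_problematic_block`, the block names, and the input format are all hypothetical.

```python
from collections import defaultdict


def responsibility_scores(rankings):
    """Aggregate per-failure block rankings into rank-based scores.

    rankings: list of lists. Each inner list orders block names from most
    to least responsible for one failed run (assumed judge output format).
    Assumption: Borda-style scoring, where position i (0-based) among n
    ranked blocks earns n - i points. The paper's actual rule may differ.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, block in enumerate(ranking):
            scores[block] += n - position
    return dict(scores)


def most_problematic_block(rankings):
    """Return the block with the highest aggregate score.

    Ties are broken alphabetically so the result is deterministic.
    """
    scores = responsibility_scores(rankings)
    return max(sorted(scores), key=lambda block: scores[block])


# Hypothetical judge output for three failed runs of a three-block workflow.
rankings = [
    ["plan", "solve", "verify"],
    ["solve", "plan", "verify"],
    ["solve", "verify", "plan"],
]
target = most_problematic_block(rankings)
```

Under these assumptions, `target` names the single block an optimizer would modify in the next update step, which is the block-level focus the abstract describes.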


Related papers