Fugu-MT Paper Translation (Abstract): SkillFlow: Flow-Driven Recursive Skill Evolution for Agentic Orchestration

Paper abstract: SkillFlow: Flow-Driven Recursive Skill Evolution for Agentic Orchestration

arxiv url: http://arxiv.org/abs/2605.14089v1
Date: Wed, 13 May 2026 20:14:44 GMT
Status: Translation complete
Last updated in system: 2026-05-15 21:45:34.493584
Title: SkillFlow: Flow-Driven Recursive Skill Evolution for Agentic Orchestration
Title (reference translation): SkillFlow: Flow-Driven Recursive Skill Evolution for Agentic Orchestration
Authors: Mingda Zhang, Tiesunlong Shen, Haoran Luo, Wenjin Liu, Zikai Xiao, Erik Cambria, Xiaoying Tang
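Abstract summary: SkillFlow is a flow-based framework that pairs a trainable Supervisor, acting as the agent, with a structured environment that supports dynamic skill orchestration. Building on flow diagnostics, its skill evolution mechanism determines when to evolve, which skills to create, and where decision gaps lie.
Reference score (independently calculated attention score): 40.79922760459963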
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In recent years, a variety of powerful LLM-based agentic systems have been applied to automate complex tasks through task orchestration. However, existing orchestration methods still face key challenges, including strategy collapse under reward maximization, high gradient variance with opaque credit assignment, and unguided skill evolution whose decisions are typically made by directly prompting an LLM to judge rather than derived from principled training signals. To address these challenges, we propose SkillFlow, a flow-based framework that takes a trainable Supervisor as the agent and a structured environment with dynamic skill library and frozen executor, automating task orchestration through multi-turn interaction. SkillFlow employs Tempered Trajectory Balance (TTB), a regression-based flow-matching loss that samples trajectories proportional to reward, preserving diverse orchestration strategies rather than collapsing to a single mode. The same flow objective yields a jointly learned backward policy that provides transparent per-step credit assignment at zero additional inference cost. Building on these flow diagnostics, a recursive skill evolution mechanism determines when to evolve, what skills to create or prune, and where decision gaps lie -- closing the loop from training signal to autonomous capability growth. Experimental results on 14 datasets show that SkillFlow significantly outperforms baselines across question answering, mathematical reasoning, code generation, and real-world interactive decision making tasks. Our code is available at https://anonymous.4open.science/r/SkillFlow-E850.
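Abstract (reference translation): In recent years, powerful LLM-based agentic systems have been applied to automating complex tasks through task orchestration. However, existing orchestration methods face major challenges, including strategy collapse under reward maximization, high gradient variance with opaque credit assignment, and unguided skill evolution whose decisions are made by directly prompting an LLM to judge rather than derived from principled training signals. To address these challenges, we propose SkillFlow, a flow-based framework that takes a trainable Supervisor as the agent and a structured environment with a dynamic skill library and a frozen executor, automating task orchestration through multi-turn interaction. SkillFlow employs Tempered Trajectory Balance (TTB), a regression-based flow-matching loss, preserving diverse orchestration strategies rather than collapsing to a single mode. The same flow objective yields a jointly learned backward policy that provides transparent per-step credit assignment at zero additional inference cost. Building on these flow diagnostics, a recursive skill evolution mechanism determines when to evolve, which skills to create, and where decision gaps lie. Experimental results on 14 datasets show that SkillFlow significantly outperforms baselines on question answering, mathematical reasoning, code generation, and real-world interactive decision-making tasks. Our code is available at https://anonymous.4open.science/r/SkillFlow-E850.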
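
The abstract describes TTB as a regression-based flow-matching loss but does not give its formula. As a rough illustration only, the sketch below computes the standard trajectory balance residual from the GFlowNet literature for a single trajectory, with the reward raised to an exponent `beta`. The exponent form of tempering, the function name `tempered_tb_loss`, and all parameter names are assumptions made for this example and are not taken from the paper.

```python
import math


def tempered_tb_loss(log_z, log_pf_steps, log_pb_steps, reward, beta=1.0):
    """Squared trajectory-balance residual for one trajectory.

    Standard trajectory balance asks that, for every complete trajectory,
        Z * prod_t P_F(step t) == R(x) * prod_t P_B(step t).
    Taking logs and squaring the difference gives a regression loss.

    ASSUMPTION: tempering is modeled here as R(x) ** beta. The paper's
    actual TTB definition may differ.
    """
    if reward <= 0:
        raise ValueError("reward must be positive to take its logarithm")
    # Forward side: log partition estimate plus forward-policy log-probs.
    forward = log_z + sum(log_pf_steps)
    # Backward side: tempered log-reward plus backward-policy log-probs.
    backward = beta * math.log(reward) + sum(log_pb_steps)
    return (forward - backward) ** 2


# A two-step trajectory whose flows balance exactly, so the loss is ~0.
balanced = tempered_tb_loss(
    log_z=math.log(8.0),
    log_pf_steps=[math.log(0.5), math.log(0.5)],
    log_pb_steps=[0.0, 0.0],
    reward=2.0,
)

# With no steps, log_z = 0 and reward = e, the residual is -1, so the loss is ~1.
unbalanced = tempered_tb_loss(0.0, [], [], math.e)
```

Because the loss is a squared residual rather than a reward-weighted policy gradient, minimizing it pushes the sampler toward visiting trajectories in proportion to their (tempered) reward, which is the property the abstract credits with avoiding collapse to a single strategy. The per-step backward log-probabilities in `log_pb_steps` are the kind of quantity a learned backward policy would supply.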
