Fugu-MT Paper Translation (Summary): LEMON: Learning Executable Multi-Agent Orchestration via Counterfactual Reinforcement Learning

Paper summary: LEMON: Learning Executable Multi-Agent Orchestration via Counterfactual Reinforcement Learning

arxiv url: http://arxiv.org/abs/2605.14483v1
Date: Thu, 14 May 2026 07:24:09 GMT
Status: Translation complete
Last updated in system: 2026-05-15 21:45:34.68311
Title: LEMON: Learning Executable Multi-Agent Orchestration via Counterfactual Reinforcement Learning
Authors: Xudong Chen, Yixin Liu, Hua Wei, Kaize Ding
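Abstract summary: Large language models (LLMs) have become a strong foundation for multi-agent systems, but their effectiveness depends heavily on orchestration design. We propose LEMON, an orchestrator that generates executable orchestration specifications. Experiments on six reasoning and coding benchmarks, including MMLU, GSM8K, AQuA, MultiArith, SVAMP, and HumanEval, show that LEMON achieves state-of-the-art performance among the evaluated multi-agent orchestration methods.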
Reference score (attention score computed by the site): 31.870185345616733
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) have become a strong foundation for multi-agent systems, but their effectiveness depends heavily on orchestration design. Across different tasks, role design, capacity assignment, and dependency construction jointly affect both solution quality and execution efficiency. Existing approaches automate parts of this design process, yet they often optimize these decisions partially or sequentially, and rely on execution-level feedback that provides limited credit assignment for local orchestration decisions. We propose LEMON (Learning Executable Multi-agent Orchestration via Counterfactual Reinforcement Learning), an LLM-based orchestrator that generates an executable orchestration specification. The specification integrates task-specific roles, customized duties, capacity levels, and dependency structure into a single deployable system. To train the orchestrator, we augment the orchestration-level GRPO objective with a localized counterfactual signal that edits role, capacity, or dependency fields and applies the resulting reward contrast only to the edited spans. Experiments on six reasoning and coding benchmarks, including MMLU, GSM8K, AQuA, MultiArith, SVAMP, and HumanEval, show that LEMON achieves state-of-the-art performance among the evaluated multi-agent orchestration methods. Our code is available at https://anonymous.4open.science/r/LEMON-B23C.
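The abstract describes the orchestration specification only by its contents (task-specific roles, customized duties, capacity levels, dependency structure). As a minimal sketch, assuming a spec with hypothetical field names `role`, `duty`, `capacity`, and `depends_on` and an assumed three-level capacity scale, the Python below checks that a spec is "executable" in a narrow sense: every dependency names a defined role and the dependency graph is acyclic. It then derives a run order. This is an illustration only, not the authors' schema or validator.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter, CycleError

# Hypothetical capacity scale; the abstract does not list LEMON's actual levels.
CAPACITY_LEVELS = ("small", "medium", "large")


@dataclass
class AgentSpec:
    role: str                 # task-specific role name (hypothetical field)
    duty: str                 # customized duty / instruction for this agent
    capacity: str             # one of CAPACITY_LEVELS
    depends_on: list[str] = field(default_factory=list)  # roles whose outputs this agent consumes


@dataclass
class OrchestrationSpec:
    agents: list[AgentSpec]

    def validate(self) -> list[str]:
        """Check the spec is well-formed and return an execution order of roles."""
        names = [a.role for a in self.agents]
        if len(set(names)) != len(names):
            raise ValueError("duplicate role names")
        graph: dict[str, set[str]] = {}
        for a in self.agents:
            if a.capacity not in CAPACITY_LEVELS:
                raise ValueError(f"unknown capacity {a.capacity!r} for {a.role}")
            missing = [d for d in a.depends_on if d not in names]
            if missing:
                raise ValueError(f"{a.role} depends on undefined roles {missing}")
            graph[a.role] = set(a.depends_on)  # node -> its predecessors
        # static_order() raises CycleError if the dependencies contain a cycle.
        return list(TopologicalSorter(graph).static_order())


spec = OrchestrationSpec([
    AgentSpec("solver", "Solve the problem step by step", "large"),
    AgentSpec("checker", "Verify the solver's arithmetic", "small", ["solver"]),
    AgentSpec("aggregator", "Produce the final answer", "medium", ["solver", "checker"]),
])
order = spec.validate()
print(order)  # ['solver', 'checker', 'aggregator']
```

Here the only valid order runs `solver`, then `checker`, then `aggregator`; a spec in which two roles depend on each other is rejected with `CycleError` instead of being deployed.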
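The training objective in the abstract combines an orchestration-level GRPO signal with a localized counterfactual reward contrast applied only to edited spans. A minimal sketch of how such localized credit could look, assuming the standard GRPO group-relative advantage and assuming (hypothetically) that the contrast, original reward minus edited-spec reward, is added with a weight `lam` to the per-token advantages inside the edited span `[start, end)`. The combination rule, `lam`, and the span representation are assumptions rather than the paper's formulation, and the rest of the GRPO loss (clipped policy ratio, KL term) is omitted.

```python
import statistics


def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages: standardize rewards within one sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def token_advantages(num_tokens: int, group_adv: float,
                     edited_span: tuple[int, int], cf_contrast: float,
                     lam: float = 0.5) -> list[float]:
    """Per-token advantages for one sampled specification.

    Every token receives the sequence-level GRPO advantage. Tokens inside the
    edited span [start, end) additionally receive lam * cf_contrast, so the
    counterfactual credit touches only the edited role/capacity/dependency field.
    (Hypothetical combination rule, for illustration.)
    """
    start, end = edited_span
    adv = [group_adv] * num_tokens
    for t in range(max(0, start), min(end, num_tokens)):
        adv[t] += lam * cf_contrast
    return adv


# Four sampled specs for one task, scored by (hypothetical) execution reward.
adv = grpo_advantages([1.0, 0.0, 1.0, 0.0])
# Counterfactual: editing spec 0's capacity field (tokens 2..3) drops reward 1.0 -> 0.0.
cf_contrast = 1.0 - 0.0
tok = token_advantages(6, adv[0], (2, 4), cf_contrast, lam=0.5)
```

With rewards `[1, 0, 1, 0]` the group advantages are approximately `[1, -1, 1, -1]`, and in `tok` only positions 2 and 3, the edited span, get the extra `0.5`. The design point this illustrates is that a sequence-level reward alone cannot say which field mattered, while the counterfactual contrast assigns extra credit to the specific field whose edit changed the outcome.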


Related papers