Fugu-MT Paper Translation (Abstract): Divide and Cooperate: Role-Decomposed Multi-Agent LLM Training with Cross-Agent Learning Signals

Paper abstract: Divide and Cooperate: Role-Decomposed Multi-Agent LLM Training with Cross-Agent Learning Signals

arxiv url: http://arxiv.org/abs/2606.10684v1
Date: Tue, 09 Jun 2026 10:40:55 GMT
Status: Translation complete
Last updated in system: 2026-06-10 15:40:58.4493
Title: Divide and Cooperate: Role-Decomposed Multi-Agent LLM Training with Cross-Agent Learning Signals
Title (reference translation): Divide and Cooperate: Role-Decomposed Multi-Agent LLM Training with Cross-Agent Learning Signals
Authors: Jaewan Park, Solbee Cho, Jay-Yoon Lee,
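Abstract summary: Existing approaches couple evidence acquisition and answer generation within a single policy. We propose DAC (Divide and Cooperate), a role-decomposed multi-agent training framework that divides agentic search into two subtasks. The generator serves a dual role as both an answer producer and an evidence sufficiency verifier, abstaining when the retrieved evidence is insufficient.
Reference score (independently computed attention score): 10.378290102256534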
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Modern language agents which perform multi-step reasoning have shown strong performance in knowledge-intensive question answering. However, existing approaches typically couple evidence acquisition and answer generation within a single policy. This forces a single model to play multiple potentially conflicting roles, inducing a combinatorial explosion in the policy space and hindering efficient exploration. It also introduces a credit assignment problem during training: a search action that retrieves sufficient evidence may still be penalized when generation fails, and vice versa. We propose DAC (Divide and Cooperate), a role-decomposed multi-agent training framework that divides agentic search into two cooperative subtasks, each handled by a dedicated agent trained with role-specific learning signals. The generator serves a dual role as both an answer producer and an evidence sufficiency verifier, abstaining when retrieved evidence is insufficient. This abstention signal is incorporated into the search agent's reward, providing structured cross-agent learning signals that improve credit assignment. Conversely, the searcher exposes the generator to diverse and challenging evidence environments by hard-positive evidence augmentation, improving its robustness. Experiments on general and multi-hop QA benchmarks show that DAC, implemented via parameter-efficient LoRA modules over a shared backbone, achieves strong performance against prior baselines that rely on full fine-tuning of monolithic models.
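Abstract (reference translation): Modern language agents that perform multi-step reasoning have shown strong performance in knowledge-intensive question answering. However, existing approaches typically couple evidence acquisition and answer generation within a single policy. As a result, a single model is forced to take on multiple, potentially conflicting roles, which causes a combinatorial explosion in the policy space and hinders efficient exploration. It also creates a credit assignment problem during training: a search action that retrieves sufficient evidence may still be penalized when generation fails, and vice versa. DAC (Divide and Cooperate) is a role-decomposed multi-agent training framework that divides agentic search into two cooperative subtasks, each handled by a dedicated agent trained with role-specific learning signals. The generator serves a dual role as both an answer producer and an evidence sufficiency verifier, abstaining when the retrieved evidence is insufficient. This abstention signal is incorporated into the search agent's reward, providing structured cross-agent learning signals that improve credit assignment. Conversely, the searcher exposes the generator to diverse and challenging evidence environments through hard-positive evidence augmentation, improving its robustness. In experiments on general and multi-hop QA benchmarks, DAC, implemented via parameter-efficient LoRA modules over a shared backbone, achieves strong performance against prior baselines that rely on full fine-tuning of monolithic models.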
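
The abstract states that the generator's abstention signal feeds into the search agent's reward, but it does not give the reward formula. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation: the names `GeneratorOutput`, `searcher_reward`, and `generator_reward`, the reward values, and the exact-match scoring are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeneratorOutput:
    """Hypothetical container: answer is None when the generator abstains."""
    answer: Optional[str]


def searcher_reward(out: GeneratorOutput,
                    sufficient_reward: float = 1.0,
                    abstain_reward: float = 0.0) -> float:
    """Assumed form: the searcher is scored only on evidence sufficiency,
    as judged by the generator, not on final answer correctness."""
    return abstain_reward if out.answer is None else sufficient_reward


def generator_reward(out: GeneratorOutput, gold: str,
                     evidence_sufficient: bool) -> float:
    """Assumed form: reward correct answers, and reward abstention only
    when the evidence really is insufficient."""
    if out.answer is None:
        return 0.0 if evidence_sufficient else 1.0
    return 1.0 if out.answer.strip().lower() == gold.strip().lower() else 0.0


# The generator answers wrongly even though the evidence was sufficient.
wrong = GeneratorOutput(answer="Lyon")
print(searcher_reward(wrong))                  # 1.0
print(generator_reward(wrong, "Paris", True))  # 0.0
```

In this toy setup the searcher keeps its reward when the generator fails on sufficient evidence, which is the credit assignment separation the abstract describes. How DAC actually combines these signals is not specified in the abstract.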


Related papers