Fugu-MT Paper Translation (Abstract): Can AI Models Direct Each Other? Organizational Structure as a Probe into Training Limitations

Paper abstract: Can AI Models Direct Each Other? Organizational Structure as a Probe into Training Limitations

arxiv url: http://arxiv.org/abs/2603.26458v1
Date: Fri, 27 Mar 2026 14:27:45 GMT
Status: Translation complete
Last updated in system: 2026-03-30 21:49:48.53819
Title: Can AI Models Direct Each Other? Organizational Structure as a Probe into Training Limitations
Title (reference translation): Can AI Models Direct Each Other? Organizational Structure as a Probe into Training Limitations
Authors: Rui Liu
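Abstract summary: ManagerWorker is a two-agent pipeline in which an expensive "manager" model analyzes issues, dispatches exploration tasks, and reviews implementations, while a cheap "worker" model executes code changes. It is evaluated on 200 instances from SWE-bench Lite across five configurations that vary the manager-worker relationship, pipeline complexity, and model pairing.
Reference score (independently computed attention score): 3.303408763887703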
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Can an expensive AI model effectively direct a cheap one to solve software engineering tasks? We study this question by introducing ManagerWorker, a two-agent pipeline where an expensive "manager" model (text-only, no code execution) analyzes issues, dispatches exploration tasks, and reviews implementations, while a cheap "worker" model (with full repo access) executes code changes. We evaluate on 200 instances from SWE-bench Lite across five configurations that vary the manager-worker relationship, pipeline complexity, and model pairing. Our findings reveal both the promise and the limits of multi-agent direction: (1) a strong manager directing a weak worker (62%) matches a strong single agent (60%) at a fraction of the strong-model token usage, showing that expensive reasoning can substitute for expensive execution; (2) a weak manager directing a weak worker (42%) performs worse than the weak agent alone (44%), demonstrating that the directing relationship requires a genuine capability gap--structure without substance is pure overhead; (3) the manager's value lies in directing, not merely reviewing--a minimal review-only loop adds just 2pp over the baseline, while structured exploration and planning add 11pp, showing that active direction is what makes the capability gap productive; and (4) these behaviors trace to a single root cause: current models are trained as monolithic agents, and splitting them into director/worker roles fights their training distribution. The pipeline succeeds by designing around this mismatch--keeping each model close to its trained mode (text generation for the manager, tool use for the worker) and externalizing organizational structure to code. This diagnosis points to concrete training gaps: delegation, scoped execution, and mode switching are skills absent from current training data.
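Abstract (reference translation): Can an expensive AI model effectively direct a cheap one to solve software engineering tasks? In ManagerWorker, an expensive "manager" model (text-only, no code execution) analyzes issues, dispatches exploration tasks, and reviews implementations, while a cheap "worker" model (with full repository access) executes code changes. The pipeline is evaluated on 200 instances from SWE-bench Lite across five configurations that vary the manager-worker relationship, pipeline complexity, and model pairing. The results show that (1) a strong manager directing a weak worker (62%) matches a strong single agent (60%) while using a fraction of the strong-model tokens, so expensive reasoning can substitute for expensive execution; (2) a weak manager directing a weak worker (42%) performs worse than the weak agent alone (44%), so the directing relationship requires a genuine capability gap, and structure without substance is pure overhead; (3) the manager's value lies in directing rather than merely reviewing: a minimal review-only loop adds just 2pp over the baseline, while structured exploration and planning add 11pp; and (4) these behaviors trace to a single root cause, namely that current models are trained as monolithic agents and splitting them into director and worker roles fights their training distribution. The pipeline succeeds by designing around this mismatch, keeping each model close to its trained mode (text generation for the manager, tool use for the worker) and externalizing organizational structure to code. This diagnosis points to concrete training gaps: delegation, scoped execution, and mode switching are skills absent from current training data.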
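
The division of labor described in the abstract (manager analyzes, dispatches exploration, plans, and reviews; worker explores and edits code) can be sketched roughly as below. This is only an illustrative sketch, not the authors' implementation: the abstract specifies no API, so `run_pipeline`, `RunLog`, the `ANALYZE`/`EXPLORE`/`PLAN`/`IMPLEMENT`/`REVIEW`/`REVISE` prompt prefixes, the `APPROVE` verdict convention, and both stub models are assumptions made for illustration. The one idea taken from the source is that the organizational structure lives in ordinary code while the manager only produces text and the worker only executes instructions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical interfaces (not from the paper): both models are plain
# text-in/text-out callables. The manager never touches the repository.
ManagerFn = Callable[[str], str]
WorkerFn = Callable[[str], str]


@dataclass
class RunLog:
    """Counts calls per role so strong-model usage can be inspected."""
    manager_calls: int = 0
    worker_calls: int = 0
    transcript: List[str] = field(default_factory=list)


def run_pipeline(issue: str, manager: ManagerFn, worker: WorkerFn,
                 max_review_rounds: int = 2) -> Tuple[str, RunLog]:
    """Externalize the manager/worker structure as ordinary control flow."""
    log = RunLog()

    def ask_manager(prompt: str) -> str:
        log.manager_calls += 1
        reply = manager(prompt)
        log.transcript.append(f"manager: {reply}")
        return reply

    def ask_worker(instruction: str) -> str:
        log.worker_calls += 1
        result = worker(instruction)
        log.transcript.append(f"worker: {result}")
        return result

    # 1. Manager analyzes the issue and lists exploration tasks, one per line.
    tasks = [t for t in ask_manager(f"ANALYZE\n{issue}").splitlines() if t.strip()]
    # 2. Worker carries out each exploration task.
    findings = [ask_worker(f"EXPLORE\n{t}") for t in tasks]
    # 3. Manager turns the findings into a plan.
    plan = ask_manager("PLAN\n" + "\n".join(findings))
    # 4. Worker implements; manager reviews until approval or the round limit.
    #    If the limit is reached, the last revision is returned unreviewed.
    patch = ask_worker(f"IMPLEMENT\n{plan}")
    for _ in range(max_review_rounds):
        verdict = ask_manager(f"REVIEW\n{patch}")
        if verdict.startswith("APPROVE"):
            break
        patch = ask_worker(f"REVISE\n{verdict}")
    return patch, log


# Stub models standing in for real LLM calls (purely illustrative).
def stub_manager(prompt: str) -> str:
    kind = prompt.split("\n", 1)[0]
    if kind == "ANALYZE":
        return "find failing test\nlocate parser module"
    if kind == "PLAN":
        return "guard against empty input in parse()"
    # REVIEW: approve only patches that mention the guard.
    return "APPROVE" if "guard" in prompt else "REVISE: add the empty-input guard"


def stub_worker(instruction: str) -> str:
    kind, body = instruction.split("\n", 1)
    if kind == "EXPLORE":
        return f"explored: {body}"
    if kind == "IMPLEMENT":
        return "patch v1"
    return "patch v2 with guard"


if __name__ == "__main__":
    final_patch, run_log = run_pipeline(
        "parse() crashes on empty input", stub_manager, stub_worker)
    print(final_patch, run_log.manager_calls, run_log.worker_calls)
```

Counting calls per role is one simple way to reason about the abstract's cost claim: the expensive model is invoked only for short text exchanges, while repository-level work stays with the cheap model. Setting `max_review_rounds=0` and dropping the exploration step would approximate an unassisted worker for comparison, though the paper's actual configurations may differ.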
