Fugu-MT Paper Translation (Abstract): Coordinated Diffusion: Generating Multi-Agent Behavior Without Multi-Agent Demonstrations

Paper overview: Coordinated Diffusion: Generating Multi-Agent Behavior Without Multi-Agent Demonstrations

arxiv url: http://arxiv.org/abs/2605.11485v2
Date: Thu, 14 May 2026 22:04:54 GMT
Status: Translation complete
Last updated in system: 2026-05-18 17:44:16.239046
Title: Coordinated Diffusion: Generating Multi-Agent Behavior Without Multi-Agent Demonstrations
Title (reference translation): Coordinated Diffusion: Generating Multi-Agent Behavior Without Multi-Agent Demonstrations
Authors: Lasse Peters, Laura Ferranti, Andrea Bajcsy, Javier Alonso-Mora
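Abstract summary: Coordinated Diffusion (CoDi) is a framework that couples independently trained single-agent diffusion policies through a user-defined multi-agent cost function. Its cost-driven guidance term can be estimated in a gradient-free manner, which makes CoDi applicable to black-box, non-differentiable cost functions. Results from simulation and hardware experiments on a two-arm manipulation task show that CoDi discovers robust coordinated behavior from single-agent data.
Reference score (independently computed attention score): 24.24743540676481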
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Imitation learning powered by generative models has proven effective for modeling complex single-agent behaviors. However, teaching multi-agent systems, like multiple arms or vehicles, to coordinate through imitation learning is hindered by a fundamental data bottleneck: as the joint state-action space grows exponentially with the number of agents, collecting a sufficient amount of coordinated multi-agent demonstrations becomes extremely costly. In this work, we ask: how can we leverage single-agent demonstration data to learn multi-agent policies? We present Coordinated Diffusion (CoDi), a framework that couples independently trained single-agent diffusion policies through a user-defined multi-agent cost function, without requiring any coordinated demonstrations. We derive a new diffusion-based sampling scheme wherein the diffusion score function decomposes into independent, single-agent pre-trained base policies plus a cost-driven guidance term that coordinates these base policies into cohesive multi-agent behavior. We show that this guidance term can be estimated in a gradient-free manner, making CoDi applicable to black-box, non-differentiable cost functions without additional training. Theoretically and empirically, we analyze the conditions under which this composition can faithfully approximate a target multi-agent behavior. We find a complementary role for demonstration data versus the cost function: single-agent demonstrations must cover the support of the desired multi-agent behavior, while the cost function must promote desired behavior from this product of single-agent policies. Our results in simulation and hardware experiments of a two-arm manipulation task show that CoDi discovers robust coordinated behavior from single-agent data, is more data-efficient than multi-agent baselines, and highlights the importance of joint guidance, base policy support, and cost design.
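Abstract (reference translation): Imitation learning powered by generative models has proven effective for modeling complex single-agent behaviors. However, teaching multi-agent systems, such as multiple arms or vehicles, to coordinate through imitation learning is hindered by a fundamental data bottleneck: because the joint state-action space grows exponentially with the number of agents, collecting enough coordinated multi-agent demonstrations becomes extremely costly. In this work, we ask how single-agent demonstration data can be leveraged to learn multi-agent policies. We present Coordinated Diffusion (CoDi), a framework that couples independently trained single-agent diffusion policies through a user-defined multi-agent cost function, without requiring any coordinated demonstrations. We derive a new diffusion-based sampling scheme in which the diffusion score function decomposes into independent, pre-trained single-agent base policies plus a cost-driven guidance term that coordinates these base policies into cohesive multi-agent behavior. We show that this guidance term can be estimated in a gradient-free manner, so CoDi applies to black-box, non-differentiable cost functions without additional training. Theoretically and empirically, we analyze the conditions under which this composition can faithfully approximate a target multi-agent behavior. Demonstration data and the cost function play complementary roles: single-agent demonstrations must cover the support of the desired multi-agent behavior, while the cost function must promote the desired behavior from the product of single-agent policies. Results from simulation and hardware experiments on a two-arm manipulation task show that CoDi discovers robust coordinated behavior from single-agent data and is more data-efficient than multi-agent baselines, and they highlight the importance of joint guidance, base policy support, and cost design.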
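
Illustrative sketch (not from the paper): the abstract says the joint score splits into independent single-agent base scores plus a cost-driven guidance term that can be estimated without gradients of the cost. The toy below shows one way such a split could look. It rests on several assumptions of its own: each agent's clean action is a 1-D standard Gaussian (so the base score is known in closed form instead of coming from a trained diffusion policy), noising is `x_t = x_0 + sigma * eps`, and the guidance term is taken to be the gradient of `log E[exp(-cost(x_0)) | x_t]`, estimated by self-normalized importance sampling. The function names `base_score`, `guidance_estimate`, `guided_score`, and `collision_cost` are hypothetical, and the paper's actual estimator may differ.

```python
import numpy as np


def base_score(x_t, sigma):
    """Per-agent score of the noised base policy.

    Toy assumption: clean actions are N(0, 1) per agent and
    x_t = x_0 + sigma * eps, so the noised marginal is N(0, 1 + sigma^2).
    """
    return -x_t / (1.0 + sigma**2)


def guidance_estimate(x_t, sigma, cost, num_samples, rng):
    """Monte Carlo estimate of grad_{x_t} log E[exp(-cost(x_0)) | x_t].

    Only cost values are used; the cost itself is never differentiated.
    """
    a = 1.0 / (1.0 + sigma**2)
    mean = a * x_t                    # E[x_0 | x_t] for each agent
    std = np.sqrt(sigma**2 * a)       # posterior std for each agent
    # Sample joint clean actions from the product of single-agent posteriors.
    x0 = mean + std * rng.standard_normal((num_samples, x_t.shape[0]))
    costs = np.array([cost(sample) for sample in x0])  # black-box calls
    log_w = -costs
    w = np.exp(log_w - log_w.max())   # subtract max for numerical stability
    w /= w.sum()
    # In this toy model, grad_{x_t} log p(x_0 | x_t) = (x_0 - mean) / sigma^2.
    return (w[:, None] * (x0 - mean)).sum(axis=0) / sigma**2


def guided_score(x_t, sigma, cost, num_samples, rng):
    """Independent base scores plus the joint, cost-driven guidance term."""
    return base_score(x_t, sigma) + guidance_estimate(
        x_t, sigma, cost, num_samples, rng
    )


def collision_cost(joint_action):
    """Hypothetical non-differentiable cost: penalize near-identical actions."""
    return 5.0 if abs(joint_action[0] - joint_action[1]) < 0.5 else 0.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_t = np.array([0.5, -0.5])
    print(guided_score(x_t, 1.0, collision_cost, 20_000, rng))
```

In this sketch the cost is the only term that couples the agents: with a constant cost the importance weights are uniform, and the guidance estimate reduces to zero-mean sampling noise around the independent base scores.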

Related papers