Fugu-MT Paper Translation (Abstract): Learning an Adversarial World Model for Automated Curriculum Generation in MARL

Paper overview: Learning an Adversarial World Model for Automated Curriculum Generation in MARL

arxiv url: http://arxiv.org/abs/2509.03771v1
Date: Wed, 03 Sep 2025 23:32:39 GMT
Status: Translation complete
Last updated in system: 2025-09-05 20:21:09.99617
Title: Learning an Adversarial World Model for Automated Curriculum Generation in MARL
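Authors: Brennen Hill
Abstract summary: World models that infer and predict environmental dynamics are foundational to embodied intelligence. To develop truly generalizable and robust agents, we need environments that scale in complexity alongside the agents learning within them. We propose a system where a generative **Attacker** agent learns an implicit world model to synthesize increasingly difficult challenges for a team of cooperative **Defender** agents.
Reference score (independently computed attention measure): 0.0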
License: http://creativecommons.org/licenses/by/4.0/
Abstract: World models that infer and predict environmental dynamics are foundational to embodied intelligence. However, their potential is often limited by the finite complexity and implicit biases of hand-crafted training environments. To develop truly generalizable and robust agents, we need environments that scale in complexity alongside the agents learning within them. In this work, we reframe the challenge of environment generation as the problem of learning a goal-conditioned, generative world model. We propose a system where a generative **Attacker** agent learns an implicit world model to synthesize increasingly difficult challenges for a team of cooperative **Defender** agents. The Attacker's objective is not passive prediction, but active, goal-driven interaction: it models and generates world states (i.e., configurations of enemy units) specifically to exploit the Defenders' weaknesses. Concurrently, the embodied Defender team learns a cooperative policy to overcome these generated worlds. This co-evolutionary dynamic creates a self-scaling curriculum where the world model continuously adapts to challenge the decision-making policy of the agents, providing an effectively infinite stream of novel and relevant training scenarios. We demonstrate that this framework leads to the emergence of complex behaviors, such as the world model learning to generate flanking and shielding formations, and the defenders learning coordinated focus-fire and spreading tactics. Our findings position adversarial co-evolution as a powerful method for learning instrumental world models that drive agents toward greater strategic depth and robustness.


Related papers