Fugu-MT Paper Translation (Abstract): Co-Evolving Latent Action World Models

Paper overview: Co-Evolving Latent Action World Models

arxiv url: http://arxiv.org/abs/2510.26433v1
Date: Thu, 30 Oct 2025 12:28:40 GMT
Status: Translation complete
Last updated in system: 2025-10-31 16:05:09.807724
Title: Co-Evolving Latent Action World Models
Title (reference translation): Co-Evolving Latent Action World Models
Authors: Yucen Wang, Fengming Zhang, De-Chuan Zhan, Li Zhao, Kaixin Wang, Jiang Bian
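Abstract summary: Adapting pre-trained video models into controllable world models via latent actions is a promising step toward creating generalist world models. This paper proposes CoLA-World, the first method to successfully realize the synergistic paradigm of training the latent action model (LAM) and the world model jointly. The world model acts as a knowledgeable tutor, providing gradients that shape a high-quality LAM.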
Reference score (independently computed attention score): 57.48921576959243
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Adapting pre-trained video generation models into controllable world models via latent actions is a promising step towards creating generalist world models. The dominant paradigm adopts a two-stage approach that trains the latent action model (LAM) and the world model separately, resulting in redundant training and limiting their potential for co-adaptation. A conceptually simple and appealing idea is to directly replace the forward dynamics model in the LAM with a powerful world model and train the two jointly, but doing so is non-trivial and prone to representational collapse. In this work, we propose CoLA-World, which for the first time successfully realizes this synergistic paradigm, resolving the core challenge in joint learning through a critical warm-up phase that effectively aligns the representations of the from-scratch LAM with the pre-trained world model. This unlocks a co-evolution cycle: the world model acts as a knowledgeable tutor, providing gradients to shape a high-quality LAM, while the LAM offers a more precise and adaptable control interface to the world model. Empirically, CoLA-World matches or outperforms prior two-stage methods in both video simulation quality and downstream visual planning, establishing a robust and efficient new paradigm for the field.
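Abstract (reference translation): Adapting pre-trained video generation models into controllable world models via latent actions is a promising step toward creating generalist world models. The dominant paradigm takes a two-stage approach that trains the latent action model (LAM) and the world model separately, which results in redundant training and limits their potential for co-adaptation. A conceptually simple and appealing idea is to directly replace the forward dynamics model in the LAM with a powerful world model and train the two jointly, but this is non-trivial and prone to representational collapse. This work proposes CoLA-World, the first method to successfully realize this synergistic paradigm. It resolves the core challenge in joint learning through a critical warm-up phase that effectively aligns the representations of the LAM with those of the pre-trained world model. The world model acts as a knowledgeable tutor, providing gradients that shape a high-quality LAM, while the LAM offers the world model a more precise and adaptable control interface. Empirically, CoLA-World matches or outperforms prior two-stage methods in both video simulation quality and downstream visual planning, establishing a robust and efficient new paradigm for the field.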

