Co-Evolving Latent Action World Models
- URL: http://arxiv.org/abs/2510.26433v1
- Date: Thu, 30 Oct 2025 12:28:40 GMT
- Title: Co-Evolving Latent Action World Models
- Authors: Yucen Wang, Fengming Zhang, De-Chuan Zhan, Li Zhao, Kaixin Wang, Jiang Bian
- Abstract summary: Adapting pre-trained video models into controllable world models via latent actions is a promising step towards creating generalist world models. We propose CoLA-World, which for the first time successfully realizes this synergistic paradigm. This unlocks a co-evolution cycle: the world model acts as a knowledgeable tutor, providing gradients to shape a high-quality LAM.
- Score: 57.48921576959243
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adapting pre-trained video generation models into controllable world models via latent actions is a promising step towards creating generalist world models. The dominant paradigm adopts a two-stage approach that trains the latent action model (LAM) and the world model separately, resulting in redundant training and limiting their potential for co-adaptation. A conceptually simple and appealing idea is to directly replace the forward dynamics model in the LAM with a powerful world model and train the two jointly, but doing so is non-trivial and prone to representational collapse. In this work, we propose CoLA-World, which for the first time successfully realizes this synergistic paradigm, resolving the core challenge in joint learning through a critical warm-up phase that effectively aligns the representations of the from-scratch LAM with the pre-trained world model. This unlocks a co-evolution cycle: the world model acts as a knowledgeable tutor, providing gradients to shape a high-quality LAM, while the LAM offers a more precise and adaptable control interface to the world model. Empirically, CoLA-World matches or outperforms prior two-stage methods in both video simulation quality and downstream visual planning, establishing a robust and efficient new paradigm for the field.
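The warm-up-then-joint schedule described in the abstract can be pictured with a toy sketch. This is not the CoLA-World implementation: the scalar "world model" weight `w`, the scalar "LAM" weight `e`, the dataset, and the choice to freeze the world model during warm-up are all assumptions made for illustration, since the abstract does not specify how the warm-up phase is carried out. The toy also does not reproduce representational collapse; it only shows the two-phase update schedule.

```python
def train(warmup_steps=50, joint_steps=50, lr=0.1):
    """Toy two-phase schedule: warm-up (LAM only), then joint updates.

    All names and values here are hypothetical, chosen for illustration.
    """
    w = 2.0  # assumed "pre-trained" world-model weight
    e = 0.0  # assumed from-scratch latent action model (LAM) weight
    # Toy transitions (x, x_next); the "action" is implicit in the change.
    data = [(0.0, 1.0), (1.0, 0.5), (-1.0, 0.0), (2.0, 2.5)]
    n = len(data)
    history = []

    for step in range(warmup_steps + joint_steps):
        joint = step >= warmup_steps
        grad_e = grad_w = loss = 0.0
        for x, x_next in data:
            z = e * (x_next - x)   # LAM: infer a latent action from the pair
            pred = x + w * z       # world model: predict next state from z
            err = pred - x_next
            loss += err ** 2
            grad_e += 2 * err * w * (x_next - x)  # d(err^2)/de
            grad_w += 2 * err * z                 # d(err^2)/dw

        # The LAM always receives gradients that flow through the world model.
        e -= lr * grad_e / n
        # Assumption: the world model stays frozen until warm-up ends.
        if joint:
            w -= lr * grad_w / n
        history.append((step, joint, w, e, loss / n))

    return w, e, history


if __name__ == "__main__":
    w, e, history = train()
    print("final w*e:", w * e, "final loss:", history[-1][4])
```

In this toy the prediction is exact when `w * e` equals 1, so the warm-up phase moves `e` toward the value that the fixed `w` expects before both weights are allowed to change together.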
Related papers
- The Trinity of Consistency as a Defining Principle for General World Models [106.16462830681452]
General World Models are capable of learning, simulating, and reasoning about objective physical laws. We propose a principled theoretical framework that defines the essential properties requisite for a General World Model. Our work establishes a principled pathway toward general world models, clarifying both the limitations of current systems and the architectural requirements for future progress.
arXiv Detail & Related papers (2026-02-26T16:15:55Z)
- Walk through Paintings: Egocentric World Models from Internet Priors [65.30611174953958]
We present the Egocentric World Model (EgoWM), a simple, architecture-agnostic method that transforms any pretrained video diffusion model into an action-conditioned world model. Rather than training from scratch, we repurpose the rich world priors of Internet-scale video models and inject motor commands through lightweight conditioning layers. Our approach scales naturally across embodiments and action spaces, ranging from 3-DoF mobile robots to 25-DoF humanoids.
arXiv Detail & Related papers (2026-01-21T18:59:32Z)
- ReWorld: Multi-Dimensional Reward Modeling for Embodied World Models [27.729654985554372]
ReWorld is a framework that employs reinforcement learning to align video-based embodied world models with physical realism, task completion capability, embodiment plausibility, and visual quality. We show that ReWorld significantly improves the physical fidelity, logical coherence, embodiment and visual quality of generated rollouts, outperforming previous methods.
arXiv Detail & Related papers (2026-01-18T14:27:10Z)
- From Forecasting to Planning: Policy World Model for Collaborative State-Action Prediction [57.56072009935036]
We introduce a new driving paradigm named Policy World Model (PWM). PWM integrates world modeling and trajectory planning within a unified architecture. Our method matches or exceeds state-of-the-art approaches that rely on multi-view and multi-modal inputs.
arXiv Detail & Related papers (2025-10-22T14:57:51Z)
- Enter the Void - Planning to Seek Entropy When Reward is Scarce [6.208654695856247]
We propose a novel approach that anticipates and actively seeks out high-entropy states using short-horizon latent predictions. We present a hierarchical planner that dynamically decides when to replan, planning horizon length, and the weighting between reward and entropy. Our method finishes the Miniworld procedurally generated mazes 50% faster than base Dreamer at convergence and the policy trained in imagination converges in only 60% of the environment steps that base Dreamer needs.
arXiv Detail & Related papers (2025-05-22T15:28:50Z)
- AdaWorld: Learning Adaptable World Models with Latent Actions [76.50869178593733]
We propose AdaWorld, an innovative world model learning approach that enables efficient adaptation. The key idea is to incorporate action information during the pretraining of world models. We then develop an autoregressive world model that conditions on these latent actions.
arXiv Detail & Related papers (2025-03-24T17:58:15Z)
- Pre-Trained Video Generative Models as World Simulators [59.546627730477454]
We propose Dynamic World Simulation (DWS) to transform pre-trained video generative models into controllable world simulators. To achieve precise alignment between conditioned actions and generated visual changes, we introduce a lightweight, universal action-conditioned module. Experiments demonstrate that DWS can be versatilely applied to both diffusion and autoregressive transformer models.
arXiv Detail & Related papers (2025-02-10T14:49:09Z)
- Masked Generative Priors Improve World Models Sequence Modelling Capabilities [23.48066383072968]
Masked Generative Modelling has emerged as a more efficient and superior inductive bias for modelling. GIT-STORM demonstrates substantial performance gains in RL tasks on the Atari 100k benchmark. We apply Transformer-based World Models to continuous action environments for the first time, addressing a significant gap in prior research.
arXiv Detail & Related papers (2024-10-10T11:52:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.