Inter-environmental world modeling for continuous and compositional dynamics
- URL: http://arxiv.org/abs/2503.09911v1
- Date: Thu, 13 Mar 2025 00:02:54 GMT
- Title: Inter-environmental world modeling for continuous and compositional dynamics
- Authors: Kohei Hayashi, Masanori Koyama, Julian Jorge Andrade Guerreiro,
- Abstract summary: We introduce World modeling through Lie Action (WLA), an unsupervised framework that learns continuous latent action representations to simulate across environments. We demonstrate that WLA can be trained using only video frames and, with minimal or no action labels, can quickly adapt to new environments with novel action sets.
- Score: 7.01176359680407
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Various world model frameworks are being developed today on top of autoregressive architectures that rely on discrete representations of actions and observations, and these frameworks have succeeded in constructing interactive generative models for a target environment of interest. Humans, meanwhile, demonstrate a remarkable ability to generalize, combining experiences from multiple environments to mentally simulate and learn to control agents in diverse settings. Inspired by this human capability, we introduce World modeling through Lie Action (WLA), an unsupervised framework that learns continuous latent action representations to simulate across environments. WLA learns a control interface with high controllability and predictive ability by simultaneously modeling the dynamics of multiple environments using Lie group theory and an object-centric autoencoder. On synthetic benchmarks and real-world datasets, we demonstrate that WLA can be trained using only video frames and, with minimal or no action labels, can quickly adapt to new environments with novel action sets.
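As a rough, hypothetical sketch of the continuous latent Lie-action idea described in the abstract, the following PyTorch snippet encodes a pair of frames, infers a continuous latent action, and applies it to the latent state through a matrix exponential of a learned Lie-algebra element. The class names, network sizes, and the flat (non-object-centric) latent are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' WLA code): a continuous latent action is a
# coordinate vector over learned Lie-algebra generators and acts on the latent state
# through a matrix exponential. Trained from video frames alone, without action labels.
import torch
import torch.nn as nn


class LatentLieActionModel(nn.Module):
    def __init__(self, frame_dim=64 * 64 * 3, latent_dim=16, action_dim=4):
        super().__init__()
        # Frame encoder/decoder; the paper's object-centric autoencoder is simplified
        # here to plain MLPs over flattened frames.
        self.encoder = nn.Sequential(nn.Linear(frame_dim, 256), nn.ReLU(), nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, frame_dim))
        # Inverse model: infer a continuous latent action from a pair of latents.
        self.action_net = nn.Sequential(nn.Linear(2 * latent_dim, 128), nn.ReLU(), nn.Linear(128, action_dim))
        # Learned basis generators of the Lie algebra.
        self.generators = nn.Parameter(0.01 * torch.randn(action_dim, latent_dim, latent_dim))

    def apply_action(self, z_t, action):
        # A(a) = sum_k a_k G_k ;  z_{t+1} = expm(A(a)) z_t
        A = torch.einsum("bk,kij->bij", action, self.generators)
        return torch.einsum("bij,bj->bi", torch.linalg.matrix_exp(A), z_t)

    def forward(self, frame_t, frame_next):
        z_t, z_next = self.encoder(frame_t), self.encoder(frame_next)
        action = self.action_net(torch.cat([z_t, z_next], dim=-1))  # unsupervised latent action
        recon = self.decoder(self.apply_action(z_t, action))
        return recon, action


model = LatentLieActionModel()
x_t, x_next = torch.randn(2, 64 * 64 * 3), torch.randn(2, 64 * 64 * 3)
recon, action = model(x_t, x_next)
loss = nn.functional.mse_loss(recon, x_next)  # video-frame reconstruction, no action labels
```

In the paper's setting, such a model is additionally trained jointly across multiple environments, and the abstract indicates it can adapt quickly to a new environment with a novel action set using minimal or no action labels.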
Related papers
- SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement [81.30121762971473]
SynWorld is a framework that allows agents to autonomously explore environments and to optimize and enhance their understanding of actions.
Our experiments demonstrate that SynWorld is an effective and general approach to learning action knowledge in new environments.
arXiv Detail & Related papers (2025-04-04T16:10:57Z)
- Exploration-Driven Generative Interactive Environments [53.05314852577144]
We focus on using many virtual environments for inexpensive, automatically collected interaction data.
We propose a training framework that uses only a random agent in virtual environments.
Our agent is fully independent of environment-specific rewards and thus adapts easily to new environments.
arXiv Detail & Related papers (2025-04-03T12:01:41Z)
- Gen-C: Populating Virtual Worlds with Generative Crowds [1.5293427903448022]
We introduce Gen-C, a generative model to automate the task of authoring high-level crowd behaviors.
Gen-C bypasses the labor-intensive and challenging task of collecting and annotating real crowd video data.
We demonstrate the effectiveness of our approach in two scenarios, a University Campus and a Train Station.
arXiv Detail & Related papers (2025-04-02T17:33:53Z)
- AdaWorld: Learning Adaptable World Models with Latent Actions [76.50869178593733]
We propose AdaWorld, an innovative world model learning approach that enables efficient adaptation.
The key idea is to incorporate action information during the pretraining of world models.
We then develop an autoregressive world model that conditions on these latent actions.
arXiv Detail & Related papers (2025-03-24T17:58:15Z)
- Multimodal Dreaming: A Global Workspace Approach to World Model-Based Reinforcement Learning [2.5749046466046903]
In Reinforcement Learning (RL), world models aim to capture how the environment evolves in response to the agent's actions.
We show that performing the dreaming process inside the latent space allows for training with fewer environment steps.
We conclude that the combination of GW with World Models holds great potential for improving decision-making in RL agents.
arXiv Detail & Related papers (2025-02-28T15:24:17Z)
- Pre-Trained Video Generative Models as World Simulators [59.546627730477454]
We propose Dynamic World Simulation (DWS) to transform pre-trained video generative models into controllable world simulators.
To achieve precise alignment between conditioned actions and generated visual changes, we introduce a lightweight, universal action-conditioned module.
Experiments demonstrate that DWS can be versatilely applied to both diffusion and autoregressive transformer models.
arXiv Detail & Related papers (2025-02-10T14:49:09Z)
- One-shot World Models Using a Transformer Trained on a Synthetic Prior [37.027893127637036]
One-Shot World Model (OSWM) is a transformer world model that is learned in an in-context learning fashion from purely synthetic data.
OSWM is able to quickly adapt to the dynamics of a simple grid world, as well as the CartPole gym and a custom control environment.
arXiv Detail & Related papers (2024-09-21T09:39:32Z)
- iVideoGPT: Interactive VideoGPTs are Scalable World Models [70.02290687442624]
World models empower model-based agents to interactively explore, reason, and plan within imagined environments for real-world decision-making.
This work introduces Interactive VideoGPT, a scalable autoregressive transformer framework that integrates multimodal signals--visual observations, actions, and rewards--into a sequence of tokens.
iVideoGPT features a novel compressive tokenization technique that efficiently discretizes high-dimensional visual observations.
arXiv Detail & Related papers (2024-05-24T05:29:12Z)
- Leveraging World Model Disentanglement in Value-Based Multi-Agent Reinforcement Learning [18.651307543537655]
We propose a novel model-based multi-agent reinforcement learning approach named Value Decomposition Framework with Disentangled World Model.
We present experimental results in Easy, Hard, and Super-Hard StarCraft II micro-management challenges to demonstrate that our method achieves high sample efficiency and exhibits superior performance in defeating the enemy armies compared to other baselines.
arXiv Detail & Related papers (2023-09-08T22:12:43Z)
- Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning [54.67880602409801]
In this paper, we study the problem of pre-training world models with abundant in-the-wild videos for efficient learning of visual control tasks.
We introduce Contextualized World Models (ContextWM) that explicitly separate context and dynamics modeling.
Our experiments show that in-the-wild video pre-training equipped with ContextWM can significantly improve the sample efficiency of model-based reinforcement learning.
arXiv Detail & Related papers (2023-05-29T14:29:12Z)
- Predictive Experience Replay for Continual Visual Control and Forecasting [62.06183102362871]
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose the mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting.
Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks.
arXiv Detail & Related papers (2023-03-12T05:08:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.