Dreaming of Many Worlds: Learning Contextual World Models Aids Zero-Shot Generalization
- URL: http://arxiv.org/abs/2403.10967v2
- Date: Sat, 3 Aug 2024 14:25:42 GMT
- Title: Dreaming of Many Worlds: Learning Contextual World Models Aids Zero-Shot Generalization
- Authors: Sai Prasanna, Karim Farid, Raghu Rajan, André Biedenkapp,
- Abstract summary: We address the challenge of zero-shot generalization (ZSG) to unseen dynamics in embodied systems.
We propose the contextual recurrent state-space model (cRSSM), which introduces changes to the world model of Dreamer.
Our experiments show that such systematic incorporation of the context improves the ZSG of the policies trained on the "dreams" of the world model.
- Score: 4.93398395228038
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Zero-shot generalization (ZSG) to unseen dynamics is a major challenge for creating generally capable embodied agents. To address the broader challenge, we start with the simpler setting of contextual reinforcement learning (cRL), assuming observability of the context values that parameterize the variation in the system's dynamics, such as the mass or dimensions of a robot, without making further simplifying assumptions about the observability of the Markovian state. Toward the goal of ZSG to unseen variation in context, we propose the contextual recurrent state-space model (cRSSM), which introduces changes to the world model of Dreamer (v3) (Hafner et al., 2023). This allows the world model to incorporate context for inferring latent Markovian states from the observations and modeling the latent dynamics. Our approach is evaluated on two tasks from the CARL benchmark suite, which is tailored to study contextual RL. Our experiments show that such systematic incorporation of the context improves the ZSG of the policies trained on the "dreams" of the world model. We further find qualitatively that our approach allows Dreamer to disentangle the latent state from context, allowing it to extrapolate its dreams to the many worlds of unseen contexts. The code for all our experiments is available at https://github.com/sai-prasanna/dreaming_of_many_worlds.
Related papers
- Multimodal Dreaming: A Global Workspace Approach to World Model-Based Reinforcement Learning [2.5749046466046903]
In Reinforcement Learning (RL), world models aim to capture how the environment evolves in response to the agent's actions.
We show that performing the dreaming process inside the latent space allows for training with fewer environment steps.
We conclude that the combination of GW with World Models holds great potential for improving decision-making in RL agents.
arXiv Detail & Related papers (2025-02-28T15:24:17Z) - Learning World Models for Unconstrained Goal Navigation [4.549550797148707]
We introduce a goal-directed exploration algorithm, MUN, for learning world models.
MUN is capable of modeling state transitions between arbitrary subgoal states in the replay buffer.
Results demonstrate that MUN strengthens the reliability of world models and significantly improves the policy's capacity to generalize.
arXiv Detail & Related papers (2024-11-03T01:35:06Z) - LangSuitE: Planning, Controlling and Interacting with Large Language Models in Embodied Text Environments [70.91258869156353]
We introduce LangSuitE, a versatile and simulation-free testbed featuring 6 representative embodied tasks in textual embodied worlds.
Compared with previous LLM-based testbeds, LangSuitE offers adaptability to diverse environments without multiple simulation engines.
We devise a novel chain-of-thought (CoT) schema, EmMem, which summarizes embodied states w.r.t. history information.
arXiv Detail & Related papers (2024-06-24T03:36:29Z) - Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond [101.15395503285804]
General world models represent a crucial pathway toward achieving Artificial General Intelligence (AGI)
In this survey, we embark on a comprehensive exploration of the latest advancements in world models.
We examine challenges and limitations of world models, and discuss their potential future directions.
arXiv Detail & Related papers (2024-05-06T14:37:07Z) - WorldGPT: Empowering LLM as Multimodal World Model [51.243464216500975]
We introduce WorldGPT, a generalist world model built upon Multimodal Large Language Model (MLLM)
WorldGPT acquires an understanding of world dynamics through analyzing millions of videos across various domains.
We conduct evaluations on WorldNet, a multimodal state transition prediction benchmark.
arXiv Detail & Related papers (2024-04-28T14:42:02Z) - Zero-shot Safety Prediction for Autonomous Robots with Foundation World Models [0.12499537119440243]
A world model creates a surrogate world to train a controller and predict safety violations by learning the internal dynamic model of systems.
We propose foundation world models that embed observations into meaningful and causally latent representations.
This enables the surrogate dynamics to directly predict causal future states by leveraging a training-free large language model.
arXiv Detail & Related papers (2024-03-30T20:03:49Z) - Imagine the Unseen World: A Benchmark for Systematic Generalization in
Visual World Models [21.043565956630957]
We introduce the Systematic Visual Imagination Benchmark (SVIB), the first benchmark designed to address this problem head-on.
SVIB offers a novel framework for a minimal world modeling problem, where models are evaluated based on their ability to generate one-step image-to-image transformations under a latent world dynamics.
We provide a comprehensive evaluation of various baseline models on SVIB, offering insight into the current state-of-the-art in systematic visual imagination.
arXiv Detail & Related papers (2023-11-15T16:02:13Z) - Kosmos-2: Grounding Multimodal Large Language Models to the World [107.27280175398089]
We introduce Kosmos-2, a Multimodal Large Language Model (MLLM)
It enables new capabilities of perceiving object descriptions (e.g., bounding boxes) and grounding text to the visual world.
Code and pretrained models are available at https://aka.ms/kosmos-2.
arXiv Detail & Related papers (2023-06-26T16:32:47Z) - Evolutionary Planning in Latent Space [7.863826008567604]
Planning is a powerful approach to reinforcement learning with several desirable properties.
We learn a world model that enables Evolutionary Planning in Latent Space.
We show how to build a model of the world by bootstrapping it with rollouts from a random policy and iteratively refining it with rollouts from an increasingly accurate planning policy.
arXiv Detail & Related papers (2020-11-23T09:21:30Z) - Context-aware Dynamics Model for Generalization in Model-Based
Reinforcement Learning [124.9856253431878]
We decompose the task of learning a global dynamics model into two stages: (a) learning a context latent vector that captures the local dynamics, then (b) predicting the next state conditioned on it.
In order to encode dynamics-specific information into the context latent vector, we introduce a novel loss function that encourages the context latent vector to be useful for predicting both forward and backward dynamics.
The proposed method achieves superior generalization ability across various simulated robotics and control tasks, compared to existing RL schemes.
arXiv Detail & Related papers (2020-05-14T08:10:54Z) - Emergent Communication with World Models [80.55287578801008]
We introduce Language World Models, a class of language-conditional generative model which interpret natural language messages.
We incorporate this "observation" into a persistent memory state, and allow the listening agent's policy to condition on it.
We show this improves effective communication and task success in 2D gridworld speaker-listener navigation tasks.
arXiv Detail & Related papers (2020-02-22T02:34:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.