Evolutionary Planning in Latent Space
- URL: http://arxiv.org/abs/2011.11293v1
- Date: Mon, 23 Nov 2020 09:21:30 GMT
- Title: Evolutionary Planning in Latent Space
- Authors: Thor V.A.N. Olesen, Dennis T.T. Nguyen, Rasmus Berg Palm, Sebastian
Risi
- Abstract summary: Planning is a powerful approach to reinforcement learning with several desirable properties.
We learn a world model that enables Evolutionary Planning in Latent Space.
We show how to build a model of the world by bootstrapping it with rollouts from a random policy and iteratively refining it with rollouts from an increasingly accurate planning policy.
- Score: 7.863826008567604
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Planning is a powerful approach to reinforcement learning with several
desirable properties. However, it requires a model of the world, which is not
readily available in many real-life problems. In this paper, we propose to
learn a world model that enables Evolutionary Planning in Latent Space (EPLS).
We use a Variational Auto Encoder (VAE) to learn a compressed latent
representation of individual observations and extend a Mixture Density
Recurrent Neural Network (MDRNN) to learn a stochastic, multi-modal forward
model of the world that can be used for planning. We use Random Mutation
Hill Climbing (RMHC) to find a sequence of actions that maximizes the expected
reward in this learned model of the world. We demonstrate how to build a model
of the world by bootstrapping it with rollouts from a random policy and
iteratively refining it with rollouts from an increasingly accurate planning
policy using the learned world model. After a few iterations of this
refinement, our planning agents are better than standard model-free
reinforcement learning approaches, demonstrating the viability of our approach.
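The core planning step can be sketched as follows. This is a minimal, illustrative Random Mutation Hill Climbing loop over a fixed-horizon action sequence, scored by averaging several stochastic rollouts of a learned latent model; the `dummy_latent_step` function, the hyperparameters, and all names are stand-ins for the trained VAE+MDRNN rather than the authors' implementation.

```python
# Minimal sketch of RMHC planning in a learned latent world model, assuming a
# step(z, a) -> (z_next, reward) interface for the learned forward model.
# Everything here is an illustrative placeholder, not the paper's code.
import numpy as np

rng = np.random.default_rng(0)

N_ACTIONS = 3      # discrete action space (e.g. steer left / straight / right)
LATENT_DIM = 32    # size of the VAE latent vector z
HORIZON = 20       # planning horizon (actions per candidate plan)
N_ROLLOUTS = 5     # rollouts per plan to average over the stochastic model
ITERATIONS = 200   # hill-climbing iterations per planning step


def dummy_latent_step(z, action):
    """Stand-in for the learned MDRNN: returns a sampled next latent and a reward.

    The real model would sample the next latent from a mixture of Gaussians
    conditioned on (z, action) and predict the reward; here we just add noise.
    """
    z_next = z + 0.01 * rng.standard_normal(LATENT_DIM)
    reward = float(-np.linalg.norm(z_next)) + 0.1 * action
    return z_next, reward


def expected_return(z0, plan, step_fn, n_rollouts=N_ROLLOUTS):
    """Average return of an action sequence over several stochastic rollouts."""
    totals = []
    for _ in range(n_rollouts):
        z, total = z0, 0.0
        for action in plan:
            z, r = step_fn(z, action)
            total += r
        totals.append(total)
    return float(np.mean(totals))


def rmhc_plan(z0, step_fn, horizon=HORIZON, iterations=ITERATIONS):
    """Random Mutation Hill Climbing over a fixed-horizon action sequence."""
    best = rng.integers(N_ACTIONS, size=horizon)           # random initial plan
    best_score = expected_return(z0, best, step_fn)
    for _ in range(iterations):
        candidate = best.copy()
        idx = rng.integers(horizon)                         # mutate one action
        candidate[idx] = rng.integers(N_ACTIONS)
        score = expected_return(z0, candidate, step_fn)
        if score >= best_score:                             # keep if not worse
            best, best_score = candidate, score
    return best, best_score


if __name__ == "__main__":
    z_current = rng.standard_normal(LATENT_DIM)             # encoded observation
    plan, score = rmhc_plan(z_current, dummy_latent_step)
    print("first action:", plan[0], "expected return:", round(score, 3))
```

In the full pipeline, the planner's rollouts in the real environment would be added to the training data and the VAE and MDRNN retrained, giving the iterative bootstrapping-and-refinement loop described in the abstract.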
Related papers
- Adaptive World Models: Learning Behaviors by Latent Imagination Under Non-Stationarity [16.15952351162363]
We introduce a new formalism, the Hidden Parameter POMDP, designed for control with adaptive world models.
We demonstrate that this approach enables learning robust behaviors across a variety of non-stationary RL benchmarks.
arXiv Detail & Related papers (2024-11-02T19:09:56Z)
- Forecaster: Towards Temporally Abstract Tree-Search Planning from Pixels [42.275164872809746]
We introduce Forecaster, a deep hierarchical reinforcement learning approach which plans over high-level goals.
Forecaster learns an abstract model of its environment by modelling the transition dynamics at an abstract level.
It then uses this world model to choose optimal high-level goals through a tree-search planning procedure.
arXiv Detail & Related papers (2023-10-16T01:13:26Z)
- COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically for Model-Based RL [50.385005413810084]
Dyna-style model-based reinforcement learning contains two phases: model rollouts to generate samples for policy learning, and real environment exploration (a bare-bones version of this loop is sketched below).
COPlanner is a planning-driven framework for model-based methods that addresses the problem of an inaccurately learned dynamics model.
arXiv Detail & Related papers (2023-10-11T06:10:07Z)
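For readers unfamiliar with the Dyna setup referenced above, here is a bare-bones, self-contained illustration of the two phases on a toy chain MDP; it is not COPlanner's algorithm, and the environment, tabular model, and Q-learning update are arbitrary stand-ins.

```python
# Illustration only: a generic Dyna-style loop, not COPlanner itself.
# Phase 1 interacts with the real environment and fits a (tabular) model;
# phase 2 replays model rollouts as extra samples for policy learning.
import random

N_STATES, N_ACTIONS, GOAL = 8, 2, 7           # tiny chain MDP
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
model = {}                                     # (s, a) -> (reward, s'), learned from real data


def env_step(s, a):
    """Real environment: move left/right on a chain, reward 1 at the goal."""
    s_next = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    return (1.0 if s_next == GOAL else 0.0), s_next


def q_update(s, a, r, s_next, alpha=0.1, gamma=0.95):
    Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])


def act(s, eps=0.1):
    if random.random() < eps:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[s][a])


for episode in range(50):
    s = 0
    for _ in range(30):
        # Phase 1: real environment exploration; also fit the model.
        a = act(s)
        r, s_next = env_step(s, a)
        model[(s, a)] = (r, s_next)
        q_update(s, a, r, s_next)
        s = s_next

        # Phase 2: model rollouts generate extra samples for policy learning.
        for _ in range(10):
            ms, ma = random.choice(list(model.keys()))
            mr, ms_next = model[(ms, ma)]
            q_update(ms, ma, mr, ms_next)

print("greedy action from start state:", max(range(N_ACTIONS), key=lambda a: Q[0][a]))
```

The point of the example is the split inside the inner loop: real interaction trains the policy and fits the model, while cheap model rollouts provide additional updates. COPlanner's contribution concerns how to plan with such a model when it is inaccurate, which this sketch does not attempt to capture.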
- Predictive Experience Replay for Continual Visual Control and Forecasting [62.06183102362871]
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose a mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting.
Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks.
arXiv Detail & Related papers (2023-03-12T05:08:03Z)
- Predictive World Models from Real-World Partial Observations [66.80340484148931]
We present a framework for learning a probabilistic predictive world model for real-world road environments.
While prior methods require complete states as ground truth for learning, we present a novel sequential training method to allow HVAEs to learn to predict complete states from partially observed states only.
arXiv Detail & Related papers (2023-01-12T02:07:26Z)
- The Effectiveness of World Models for Continual Reinforcement Learning [19.796589322975017]
We study how different selective experience replay methods affect performance, forgetting, and transfer.
Continual-Dreamer is sample efficient and outperforms state-of-the-art task-agnostic continual reinforcement learning methods on Minigrid and Minihack benchmarks.
arXiv Detail & Related papers (2022-11-29T05:56:51Z)
- World Model as a Graph: Learning Latent Landmarks for Planning [12.239590266108115]
Planning is a hallmark of human intelligence.
One prominent framework, Model-Based RL, learns a world model and plans using step-by-step virtual rollouts.
We propose to learn graph-structured world models composed of sparse, multi-step transitions.
arXiv Detail & Related papers (2020-11-25T02:49:21Z)
- Bridging Imagination and Reality for Model-Based Deep Reinforcement Learning [72.18725551199842]
We propose a novel model-based reinforcement learning algorithm, called BrIdging Reality and Dream (BIRD).
It maximizes the mutual information between imaginary and real trajectories so that the policy improvement learned from imaginary trajectories can be easily generalized to real trajectories.
We demonstrate that our approach improves sample efficiency of model-based planning, and achieves state-of-the-art performance on challenging visual control benchmarks.
arXiv Detail & Related papers (2020-10-23T03:22:01Z)
- Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
arXiv Detail & Related papers (2020-07-14T16:42:59Z)
- Context-aware Dynamics Model for Generalization in Model-Based Reinforcement Learning [124.9856253431878]
We decompose the task of learning a global dynamics model into two stages: (a) learning a context latent vector that captures the local dynamics, then (b) predicting the next state conditioned on it.
In order to encode dynamics-specific information into the context latent vector, we introduce a novel loss function that encourages the context latent vector to be useful for predicting both forward and backward dynamics (a minimal sketch of this setup follows this entry).
The proposed method achieves superior generalization ability across various simulated robotics and control tasks, compared to existing RL schemes.
arXiv Detail & Related papers (2020-05-14T08:10:54Z)
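As a rough illustration of the two-stage idea above (not the paper's architecture or loss), the following sketch trains a context encoder together with forward and backward dynamics heads; all dimensions, module names, and the random placeholder data are assumptions.

```python
# Illustration only: a generic context-conditioned dynamics model with a
# forward + backward prediction loss, in the spirit of the two-stage setup
# described above. Sizes and names are arbitrary stand-ins.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, CONTEXT_DIM, K = 4, 2, 8, 5   # K = transitions per context window

class ContextEncoder(nn.Module):
    """Stage (a): summarise K recent transitions into a context vector c."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(K * (2 * STATE_DIM + ACTION_DIM), 64), nn.ReLU(),
            nn.Linear(64, CONTEXT_DIM),
        )
    def forward(self, window):                         # window: (B, K, 2*S + A)
        return self.net(window.flatten(1))

class Dynamics(nn.Module):
    """Stage (b): predict a state from (state, action, context)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM + CONTEXT_DIM, 64), nn.ReLU(),
            nn.Linear(64, STATE_DIM),
        )
    def forward(self, s, a, c):
        return self.net(torch.cat([s, a, c], dim=-1))

encoder, forward_model, backward_model = ContextEncoder(), Dynamics(), Dynamics()
params = list(encoder.parameters()) + list(forward_model.parameters()) + list(backward_model.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

# One training step on a random placeholder batch.
B = 32
window = torch.randn(B, K, 2 * STATE_DIM + ACTION_DIM)   # recent (s, a, s') transitions
s, a, s_next = torch.randn(B, STATE_DIM), torch.randn(B, ACTION_DIM), torch.randn(B, STATE_DIM)

c = encoder(window)
loss_fwd = nn.functional.mse_loss(forward_model(s, a, c), s_next)   # predict s' from (s, a, c)
loss_bwd = nn.functional.mse_loss(backward_model(s_next, a, c), s)  # predict s from (s', a, c)
loss = loss_fwd + loss_bwd                                          # both directions shape c

opt.zero_grad()
loss.backward()
opt.step()
print("forward loss:", float(loss_fwd), "backward loss:", float(loss_bwd))
```

Using both prediction directions to shape the context vector is the generic idea being illustrated; the paper's actual loss and training details may differ.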
- World Programs for Model-Based Learning and Planning in Compositional State and Action Spaces [4.9023704104715256]
We propose a formalism where the learner induces a world program by learning a dynamics model and the actions in graph-based compositional environments.
We highlight a recent application, and propose a challenge for the community to assess world program-based planning.
arXiv Detail & Related papers (2019-12-30T17:03:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.