Evolutionary Planning in Latent Space
- URL: http://arxiv.org/abs/2011.11293v1
- Date: Mon, 23 Nov 2020 09:21:30 GMT
- Title: Evolutionary Planning in Latent Space
- Authors: Thor V.A.N. Olesen, Dennis T.T. Nguyen, Rasmus Berg Palm, Sebastian
Risi
- Abstract summary: Planning is a powerful approach to reinforcement learning with several desirable properties.
We learn a world model that enables Evolutionary Planning in Latent Space.
We show how to build a model of the world by bootstrapping it with rollouts from a random policy and iteratively refining it with rollouts from an increasingly accurate planning policy.
- Score: 7.863826008567604
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Planning is a powerful approach to reinforcement learning with several
desirable properties. However, it requires a model of the world, which is not
readily available in many real-life problems. In this paper, we propose to
learn a world model that enables Evolutionary Planning in Latent Space (EPLS).
We use a Variational Auto Encoder (VAE) to learn a compressed latent
representation of individual observations and extend a Mixture Density
Recurrent Neural Network (MDRNN) to learn a stochastic, multi-modal forward
model of the world that can be used for planning. We use Random Mutation
Hill Climbing (RMHC) to find a sequence of actions that maximizes the expected
reward in this learned model of the world. We demonstrate how to build a model
of the world by bootstrapping it with rollouts from a random policy and
iteratively refining it with rollouts from an increasingly accurate planning
policy using the learned world model. After a few iterations of this
refinement, our planning agents are better than standard model-free
reinforcement learning approaches, demonstrating the viability of our approach.
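The core planning step can be sketched as follows. This is a minimal, illustrative Random Mutation Hill Climbing loop over a fixed-horizon action sequence, scored by averaging several stochastic rollouts of a learned latent model; the `dummy_latent_step` function, the hyperparameters, and all names are stand-ins for the trained VAE+MDRNN rather than the authors' implementation.

```python
# Minimal sketch of RMHC planning in a learned latent world model, assuming a
# step(z, a) -> (z_next, reward) interface for the learned forward model.
# Everything here is an illustrative placeholder, not the paper's code.
import numpy as np

rng = np.random.default_rng(0)

N_ACTIONS = 3      # discrete action space (e.g. steer left / straight / right)
LATENT_DIM = 32    # size of the VAE latent vector z
HORIZON = 20       # planning horizon (actions per candidate plan)
N_ROLLOUTS = 5     # rollouts per plan to average over the stochastic model
ITERATIONS = 200   # hill-climbing iterations per planning step


def dummy_latent_step(z, action):
    """Stand-in for the learned MDRNN: returns a sampled next latent and a reward.

    The real model would sample the next latent from a mixture of Gaussians
    conditioned on (z, action) and predict the reward; here we just add noise.
    """
    z_next = z + 0.01 * rng.standard_normal(LATENT_DIM)
    reward = float(-np.linalg.norm(z_next)) + 0.1 * action
    return z_next, reward


def expected_return(z0, plan, step_fn, n_rollouts=N_ROLLOUTS):
    """Average return of an action sequence over several stochastic rollouts."""
    totals = []
    for _ in range(n_rollouts):
        z, total = z0, 0.0
        for action in plan:
            z, r = step_fn(z, action)
            total += r
        totals.append(total)
    return float(np.mean(totals))


def rmhc_plan(z0, step_fn, horizon=HORIZON, iterations=ITERATIONS):
    """Random Mutation Hill Climbing over a fixed-horizon action sequence."""
    best = rng.integers(N_ACTIONS, size=horizon)           # random initial plan
    best_score = expected_return(z0, best, step_fn)
    for _ in range(iterations):
        candidate = best.copy()
        idx = rng.integers(horizon)                         # mutate one action
        candidate[idx] = rng.integers(N_ACTIONS)
        score = expected_return(z0, candidate, step_fn)
        if score >= best_score:                             # keep if not worse
            best, best_score = candidate, score
    return best, best_score


if __name__ == "__main__":
    z_current = rng.standard_normal(LATENT_DIM)             # encoded observation
    plan, score = rmhc_plan(z_current, dummy_latent_step)
    print("first action:", plan[0], "expected return:", round(score, 3))
```

In the full pipeline, the planner's rollouts in the real environment would be added to the training data and the VAE and MDRNN retrained, giving the iterative bootstrapping-and-refinement loop described in the abstract.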
Related papers
- Adaptive World Models: Learning Behaviors by Latent Imagination Under Non-Stationarity [16.15952351162363]
We introduce a new formalism, the Hidden Parameter POMDP, designed for control with adaptive world models.
We demonstrate that this approach enables learning robust behaviors across a variety of non-stationary RL benchmarks.
arXiv Detail & Related papers (2024-11-02T19:09:56Z)
- Forecaster: Towards Temporally Abstract Tree-Search Planning from Pixels [42.275164872809746]
We introduce Forecaster, a deep hierarchical reinforcement learning approach which plans over high-level goals.
Forecaster learns an abstract model of its environment by modelling the transition dynamics at an abstract level.
It then uses this world model to choose optimal high-level goals through a tree-search planning procedure.
arXiv Detail & Related papers (2023-10-16T01:13:26Z)
- COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically for Model-Based RL [50.385005413810084]
Dyna-style model-based reinforcement learning contains two phases: model rollouts to generate samples for policy learning, and real environment exploration (a bare-bones version of this loop is sketched below).
COPlanner is a planning-driven framework for model-based methods that addresses the problem of an inaccurately learned dynamics model.
arXiv Detail & Related papers (2023-10-11T06:10:07Z)
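For readers unfamiliar with the Dyna setup referenced above, here is a bare-bones, self-contained illustration of the two phases on a toy chain MDP; it is not COPlanner's algorithm, and the environment, tabular model, and Q-learning update are arbitrary stand-ins.

```python
# Illustration only: a generic Dyna-style loop, not COPlanner itself.
# Phase 1 interacts with the real environment and fits a (tabular) model;
# phase 2 replays model rollouts as extra samples for policy learning.
import random

N_STATES, N_ACTIONS, GOAL = 8, 2, 7           # tiny chain MDP
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
model = {}                                     # (s, a) -> (reward, s'), learned from real data


def env_step(s, a):
    """Real environment: move left/right on a chain, reward 1 at the goal."""
    s_next = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    return (1.0 if s_next == GOAL else 0.0), s_next


def q_update(s, a, r, s_next, alpha=0.1, gamma=0.95):
    Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])


def act(s, eps=0.1):
    if random.random() < eps:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[s][a])


for episode in range(50):
    s = 0
    for _ in range(30):
        # Phase 1: real environment exploration; also fit the model.
        a = act(s)
        r, s_next = env_step(s, a)
        model[(s, a)] = (r, s_next)
        q_update(s, a, r, s_next)
        s = s_next

        # Phase 2: model rollouts generate extra samples for policy learning.
        for _ in range(10):
            ms, ma = random.choice(list(model.keys()))
            mr, ms_next = model[(ms, ma)]
            q_update(ms, ma, mr, ms_next)

print("greedy action from start state:", max(range(N_ACTIONS), key=lambda a: Q[0][a]))
```

The point of the example is the split inside the inner loop: real interaction trains the policy and fits the model, while cheap model rollouts provide additional updates. COPlanner's contribution concerns how to plan with such a model when it is inaccurate, which this sketch does not attempt to capture.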
- Predictive Experience Replay for Continual Visual Control and Forecasting [62.06183102362871]
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose a mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting.
Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks.
arXiv Detail & Related papers (2023-03-12T05:08:03Z)
- Predictive World Models from Real-World Partial Observations [66.80340484148931]
We present a framework for learning a probabilistic predictive world model for real-world road environments.
While prior methods require complete states as ground truth for learning, we present a novel sequential training method to allow HVAEs to learn to predict complete states from partially observed states only.
arXiv Detail & Related papers (2023-01-12T02:07:26Z)
- The Effectiveness of World Models for Continual Reinforcement Learning [19.796589322975017]
We study how different selective experience replay methods affect performance, forgetting, and transfer.
Continual-Dreamer is sample efficient and outperforms state-of-the-art task-agnostic continual reinforcement learning methods on Minigrid and Minihack benchmarks.
arXiv Detail & Related papers (2022-11-29T05:56:51Z)
- World Model as a Graph: Learning Latent Landmarks for Planning [12.239590266108115]
Planning is a hallmark of human intelligence.
One prominent framework, Model-Based RL, learns a world model and plans using step-by-step virtual rollouts.
We propose to learn graph-structured world models composed of sparse, multi-step transitions.
arXiv Detail & Related papers (2020-11-25T02:49:21Z)
- Bridging Imagination and Reality for Model-Based Deep Reinforcement Learning [72.18725551199842]
We propose a novel model-based reinforcement learning algorithm, called BrIdging Reality and Dream (BIRD).
It maximizes the mutual information between imaginary and real trajectories so that the policy improvement learned from imaginary trajectories can be easily generalized to real trajectories.
We demonstrate that our approach improves sample efficiency of model-based planning, and achieves state-of-the-art performance on challenging visual control benchmarks.
arXiv Detail & Related papers (2020-10-23T03:22:01Z)
- Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
arXiv Detail & Related papers (2020-07-14T16:42:59Z)
- Context-aware Dynamics Model for Generalization in Model-Based Reinforcement Learning [124.9856253431878]
We decompose the task of learning a global dynamics model into two stages: (a) learning a context latent vector that captures the local dynamics, then (b) predicting the next state conditioned on it.
In order to encode dynamics-specific information into the context latent vector, we introduce a novel loss function that encourages the context latent vector to be useful for predicting both forward and backward dynamics (a minimal sketch of this setup follows this entry).
The proposed method achieves superior generalization ability across various simulated robotics and control tasks, compared to existing RL schemes.
arXiv Detail & Related papers (2020-05-14T08:10:54Z)
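As a rough illustration of the two-stage idea above (not the paper's architecture or loss), the following sketch trains a context encoder together with forward and backward dynamics heads; all dimensions, module names, and the random placeholder data are assumptions.

```python
# Illustration only: a generic context-conditioned dynamics model with a
# forward + backward prediction loss, in the spirit of the two-stage setup
# described above. Sizes and names are arbitrary stand-ins.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, CONTEXT_DIM, K = 4, 2, 8, 5   # K = transitions per context window

class ContextEncoder(nn.Module):
    """Stage (a): summarise K recent transitions into a context vector c."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(K * (2 * STATE_DIM + ACTION_DIM), 64), nn.ReLU(),
            nn.Linear(64, CONTEXT_DIM),
        )
    def forward(self, window):                         # window: (B, K, 2*S + A)
        return self.net(window.flatten(1))

class Dynamics(nn.Module):
    """Stage (b): predict a state from (state, action, context)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM + CONTEXT_DIM, 64), nn.ReLU(),
            nn.Linear(64, STATE_DIM),
        )
    def forward(self, s, a, c):
        return self.net(torch.cat([s, a, c], dim=-1))

encoder, forward_model, backward_model = ContextEncoder(), Dynamics(), Dynamics()
params = list(encoder.parameters()) + list(forward_model.parameters()) + list(backward_model.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

# One training step on a random placeholder batch.
B = 32
window = torch.randn(B, K, 2 * STATE_DIM + ACTION_DIM)   # recent (s, a, s') transitions
s, a, s_next = torch.randn(B, STATE_DIM), torch.randn(B, ACTION_DIM), torch.randn(B, STATE_DIM)

c = encoder(window)
loss_fwd = nn.functional.mse_loss(forward_model(s, a, c), s_next)   # predict s' from (s, a, c)
loss_bwd = nn.functional.mse_loss(backward_model(s_next, a, c), s)  # predict s from (s', a, c)
loss = loss_fwd + loss_bwd                                          # both directions shape c

opt.zero_grad()
loss.backward()
opt.step()
print("forward loss:", float(loss_fwd), "backward loss:", float(loss_bwd))
```

Using both prediction directions to shape the context vector is the generic idea being illustrated; the paper's actual loss and training details may differ.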
- World Programs for Model-Based Learning and Planning in Compositional State and Action Spaces [4.9023704104715256]
We propose a formalism where the learner induces a world program by learning a dynamics model and the actions in graph-based compositional environments.
We highlight a recent application, and propose a challenge for the community to assess world program-based planning.
arXiv Detail & Related papers (2019-12-30T17:03:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.