Improving Model-Based Reinforcement Learning with Internal State
Representations through Self-Supervision
- URL: http://arxiv.org/abs/2102.05599v1
- Date: Wed, 10 Feb 2021 17:55:04 GMT
- Title: Improving Model-Based Reinforcement Learning with Internal State
Representations through Self-Supervision
- Authors: Julien Scholz, Cornelius Weber, Muhammad Burhan Hafez and Stefan
Wermter
- Abstract summary: Using a model of the environment, reinforcement learning agents can plan their future moves and achieve superhuman performance in board games like Chess, Shogi, and Go.
We show that the environment model can even be learned dynamically, generalizing the agent to many more tasks while at the same time achieving state-of-the-art performance.
Our modifications also enable self-supervised pretraining for MuZero, so the algorithm can learn about environment dynamics before a goal is made available.
- Score: 19.37841173522973
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Using a model of the environment, reinforcement learning agents can plan
their future moves and achieve superhuman performance in board games like
Chess, Shogi, and Go, while remaining relatively sample-efficient. As
demonstrated by the MuZero Algorithm, the environment model can even be learned
dynamically, generalizing the agent to many more tasks while at the same time
achieving state-of-the-art performance. Notably, MuZero uses internal state
representations derived from real environment states for its predictions. In
this paper, we bind the model's predicted internal state representation to the
environment state via two additional terms: a reconstruction model loss and a
simpler consistency loss, both of which work independently and unsupervised,
acting as constraints to stabilize the learning process. Our experiments show
that this new integration of reconstruction model loss and simpler consistency
loss provides a significant performance increase in OpenAI Gym environments. Our
modifications also enable self-supervised pretraining for MuZero, so the
algorithm can learn about environment dynamics before a goal is made available.
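The following is a minimal sketch, in PyTorch, of how the two auxiliary terms described in the abstract could be wired up: a reconstruction model loss that decodes an internal state back into observation space, and a simpler consistency loss that pulls the dynamics model's predicted next internal state toward the representation of the real next observation. This is not the authors' implementation; the network sizes, the MSE distance, the detached target, the continuous action encoding, and which state is decoded are all illustrative assumptions.

```python
# Hedged sketch of the reconstruction and consistency losses (assumed details, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, act_dim, latent_dim = 8, 2, 32

# Representation, dynamics, and reconstruction networks (sizes are arbitrary assumptions).
representation = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
dynamics = nn.Sequential(nn.Linear(latent_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
reconstruction = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, obs_dim))

def auxiliary_losses(obs, action, next_obs):
    s = representation(obs)                                  # internal state from the real observation
    s_next_pred = dynamics(torch.cat([s, action], dim=-1))   # predicted next internal state
    s_next_target = representation(next_obs).detach()        # target from the real next observation (detaching is an assumption)

    recon_loss = F.mse_loss(reconstruction(s_next_pred), next_obs)  # reconstruction model loss
    consistency_loss = F.mse_loss(s_next_pred, s_next_target)       # simpler consistency loss
    return recon_loss, consistency_loss

# Toy usage with random data; in a MuZero-style setup these terms would be added to the
# usual value/policy/reward losses, or used alone to pretrain the representation and
# dynamics networks in a self-supervised way before any reward signal is available.
obs = torch.randn(16, obs_dim)
action = torch.randn(16, act_dim)
next_obs = torch.randn(16, obs_dim)
recon_loss, consistency_loss = auxiliary_losses(obs, action, next_obs)
(recon_loss + consistency_loss).backward()
```

Because both terms depend only on observed transitions, they can act as unsupervised constraints on the internal state without requiring labels or rewards, which is what makes the pretraining use case possible.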
Related papers
- Mitigating Covariate Shift in Imitation Learning for Autonomous Vehicles Using Latent Space Generative World Models [60.87795376541144]
A world model is a neural network capable of predicting an agent's next state given past states and actions (a generic sketch of this idea appears after this list).
During end-to-end training, our policy learns how to recover from errors by aligning with states observed in human demonstrations.
We present qualitative and quantitative results, demonstrating significant improvements upon prior state of the art in closed-loop testing.
arXiv Detail & Related papers (2024-09-25T06:48:25Z) - ReCoRe: Regularized Contrastive Representation Learning of World Model [21.29132219042405]
We present a world model that learns invariant features using contrastive unsupervised learning and an intervention-invariant regularizer.
Our method outperforms current state-of-the-art model-based and model-free RL methods and significantly improves on out-of-distribution point navigation tasks evaluated on the iGibson benchmark.
arXiv Detail & Related papers (2023-12-14T15:53:07Z) - Model-Based Reinforcement Learning with Isolated Imaginations [61.67183143982074]
We propose Iso-Dream++, a model-based reinforcement learning approach.
We perform policy optimization based on the decoupled latent imaginations.
This enables long-horizon visuomotor control tasks to benefit from isolating mixed dynamics sources in the wild.
arXiv Detail & Related papers (2023-03-27T02:55:56Z) - Concept-modulated model-based offline reinforcement learning for rapid
generalization [5.512991103610139]
We propose a solution that self-generates simulated scenarios constrained by environmental concepts and dynamics learned in an unsupervised manner.
In particular, an internal model of the agent's environment is conditioned on low-dimensional concept representations of the input space that are sensitive to the agent's actions.
We show dramatic improvements in one-shot generalization to different instances of specified failure cases as well as zero-shot generalization to similar variations compared to model-based and model-free approaches.
arXiv Detail & Related papers (2022-09-07T15:06:38Z) - Efficient Model-based Multi-agent Reinforcement Learning via Optimistic
Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z) - Dream to Explore: Adaptive Simulations for Autonomous Systems [3.0664963196464448]
We tackle the problem of learning to control dynamical systems by applying Bayesian nonparametric methods.
By employing Gaussian processes to discover latent world dynamics, we mitigate common data efficiency issues observed in reinforcement learning.
Our algorithm jointly learns a world model and policy by optimizing a variational lower bound of a log-likelihood.
arXiv Detail & Related papers (2021-10-27T04:27:28Z) - Online reinforcement learning with sparse rewards through an active
inference capsule [62.997667081978825]
This paper introduces an active inference agent which minimizes the novel free energy of the expected future.
Our model is capable of solving sparse-reward problems with a very high sample efficiency.
We also introduce a novel method for approximating the prior model from the reward function, which simplifies the expression of complex objectives.
arXiv Detail & Related papers (2021-06-04T10:03:36Z) - Visualizing MuZero Models [0.23624125155742054]
MuZero, a model-based reinforcement learning algorithm, achieved state-of-the-art performance in Chess, Shogi and the game of Go.
We visualize the latent representation of MuZero agents.
We propose two regularization techniques to stabilize MuZero's performance.
arXiv Detail & Related papers (2021-02-25T15:25:17Z) - Model-Invariant State Abstractions for Model-Based Reinforcement
Learning [54.616645151708994]
We introduce a new type of state abstraction called model-invariance.
This allows for generalization to novel combinations of unseen values of state variables.
We prove that an optimal policy can be learned over this model-invariance state abstraction.
arXiv Detail & Related papers (2021-02-19T10:37:54Z) - Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
arXiv Detail & Related papers (2020-07-14T16:42:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.