INFOrmation Prioritization through EmPOWERment in Visual Model-Based RL
- URL: http://arxiv.org/abs/2204.08585v1
- Date: Mon, 18 Apr 2022 23:09:23 GMT
- Title: INFOrmation Prioritization through EmPOWERment in Visual Model-Based RL
- Authors: Homanga Bharadhwaj, Mohammad Babaeizadeh, Dumitru Erhan, Sergey Levine
- Abstract summary: We propose a modified objective for model-based reinforcement learning (RL).
We integrate a term inspired by variational empowerment into a state-space model based on mutual information.
We evaluate the approach on a suite of vision-based robot control tasks with natural video backgrounds.
- Score: 90.06845886194235
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model-based reinforcement learning (RL) algorithms designed for handling
complex visual observations typically learn some sort of latent state
representation, either explicitly or implicitly. Standard methods of this sort
do not distinguish between functionally relevant aspects of the state and
irrelevant distractors, instead aiming to represent all available information
equally. We propose a modified objective for model-based RL that, in
combination with mutual information maximization, allows us to learn
representations and dynamics for visual model-based RL without reconstruction
in a way that explicitly prioritizes functionally relevant factors. The key
principle behind our design is to integrate a term inspired by variational
empowerment into a state-space model based on mutual information. This term
prioritizes information that is correlated with action, thus ensuring that
functionally relevant factors are captured first. Furthermore, the same
empowerment term also promotes faster exploration during the RL process,
especially for sparse-reward tasks where the reward signal is insufficient to
drive exploration in the early stages of learning. We evaluate the approach on
a suite of vision-based robot control tasks with natural video backgrounds, and
show that the proposed prioritized information objective outperforms
state-of-the-art model-based RL approaches with higher sample efficiency and
episodic returns. https://sites.google.com/view/information-empowerment
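The abstract's claim that the empowerment-inspired term "prioritizes information that is correlated with action" can be illustrated with a small tabular sketch. This is not the paper's objective, which operates on learned latent states with neural networks; it is a toy, assuming a uniform binary action and two hypothetical one-bit features: one that follows the action 90% of the time and one pure distractor. The function names and the toy distributions are illustrative assumptions. The lower bound used is the standard variational (Barber-Agakov) bound on mutual information.

```python
import math
from collections import defaultdict


def mutual_information(joint):
    """Exact I(A; Z') in nats from a dict {(a, z_next): probability}."""
    p_a, p_z = defaultdict(float), defaultdict(float)
    for (a, z), p in joint.items():
        p_a[a] += p
        p_z[z] += p
    return sum(
        p * math.log(p / (p_a[a] * p_z[z]))
        for (a, z), p in joint.items()
        if p > 0
    )


def variational_lower_bound(joint, q):
    """Variational bound E[log q(a | z')] + H(A), which never exceeds I(A; Z')."""
    p_a = defaultdict(float)
    for (a, _), p in joint.items():
        p_a[a] += p
    entropy_a = -sum(p * math.log(p) for p in p_a.values() if p > 0)
    expected_log_q = sum(
        p * math.log(q[(a, z)]) for (a, z), p in joint.items() if p > 0
    )
    return expected_log_q + entropy_a


# Hypothetical controllable feature: matches the action with probability 0.9.
controllable = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
# Hypothetical distractor feature: independent of the action.
distractor = {(a, z): 0.25 for a in (0, 1) for z in (0, 1)}

mi_controllable = mutual_information(controllable)
mi_distractor = mutual_information(distractor)

# A rough action predictor q(a | z') gives a loose bound; the exact posterior is tight.
q_rough = {(a, z): 0.8 if a == z else 0.2 for a in (0, 1) for z in (0, 1)}
q_exact = {(a, z): 0.9 if a == z else 0.1 for a in (0, 1) for z in (0, 1)}
bound_rough = variational_lower_bound(controllable, q_rough)
bound_exact = variational_lower_bound(controllable, q_exact)

print(f"I(A; controllable) = {mi_controllable:.4f} nats")
print(f"I(A; distractor)   = {mi_distractor:.4f} nats")
print(f"rough bound = {bound_rough:.4f}, exact-posterior bound = {bound_exact:.4f}")
```

In this toy, the action-dependent feature carries positive mutual information with the action while the distractor carries none, so an objective that rewards this quantity would favor encoding the former. Improving the action predictor tightens the bound, which is the general reason such terms are optimized variationally.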
Related papers
- Enhancing data efficiency in reinforcement learning: a novel imagination
mechanism based on mesh information propagation [0.3729614006275886]
We introduce a novel mesh information propagation mechanism, termed the 'Imagination Mechanism (IM)'.
IM enables information generated by a single sample to be effectively broadcast to different states across episodes.
To promote versatility, we extend the IM to function as a plug-and-play module that can be seamlessly and fluidly integrated into other widely adopted RL algorithms.
arXiv Detail & Related papers (2023-09-25T16:03:08Z)
- Learning a model is paramount for sample efficiency in reinforcement
learning control of PDEs [5.488334211013093]
We show that learning an actuated model in parallel to training the RL agent significantly reduces the total amount of required data sampled from the real system.
We also show that iteratively updating the model is of major importance to avoid biases in the RL training.
arXiv Detail & Related papers (2023-02-14T16:14:39Z)
- Representation Learning in Deep RL via Discrete Information Bottleneck [39.375822469572434]
We study how information bottlenecks can be used to construct latent states efficiently in the presence of task-irrelevant information.
We propose architectures that utilize variational and discrete information bottlenecks, coined as RepDIB, to learn structured factorized representations.
arXiv Detail & Related papers (2022-12-28T14:38:12Z)
- Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of the latent variable models for state-action value functions, which allows both a tractable variational learning algorithm and an effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z)
- Agent-Controller Representations: Principled Offline RL with Rich
Exogenous Information [49.06422815335159]
Learning to control an agent from data collected offline is vital for real-world applications of reinforcement learning (RL).
This paper introduces offline RL benchmarks offering the ability to study this problem.
We find that contemporary representation learning techniques can fail on datasets where the noise is a complex and time-dependent process.
arXiv Detail & Related papers (2022-10-31T22:12:48Z)
- Simplifying Model-based RL: Learning Representations, Latent-space
Models, and Policies with One Objective [142.36200080384145]
We propose a single objective which jointly optimizes a latent-space model and policy to achieve high returns while remaining self-consistent.
We demonstrate that the resulting algorithm matches or improves the sample-efficiency of the best prior model-based and model-free RL methods.
arXiv Detail & Related papers (2022-09-18T03:51:58Z)
- Exploratory State Representation Learning [63.942632088208505]
We propose a new approach called XSRL (eXploratory State Representation Learning) to solve the problems of exploration and SRL in parallel.
On one hand, it jointly learns compact state representations and a state transition estimator which is used to remove unexploitable information from the representations.
On the other hand, it continuously trains an inverse model, and adds to the prediction error of this model a $k$-step learning progress bonus to form the objective of a discovery policy.
arXiv Detail & Related papers (2021-09-28T10:11:07Z)
- Offline Reinforcement Learning from Images with Latent Space Models [60.69745540036375]
Offline reinforcement learning (RL) refers to the problem of learning policies from a static dataset of environment interactions.
We build on recent advances in model-based algorithms for offline RL, and extend them to high-dimensional visual observation spaces.
Our approach is both tractable in practice and corresponds to maximizing a lower bound of the ELBO in the unknown POMDP.
arXiv Detail & Related papers (2020-12-21T18:28:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.