Predictive Information Accelerates Learning in RL
- URL: http://arxiv.org/abs/2007.12401v2
- Date: Mon, 26 Oct 2020 00:27:00 GMT
- Title: Predictive Information Accelerates Learning in RL
- Authors: Kuang-Huei Lee, Ian Fischer, Anthony Liu, Yijie Guo, Honglak Lee, John
Canny, Sergio Guadarrama
- Abstract summary: We train Soft Actor-Critic (SAC) agents from pixels with an auxiliary task that learns a compressed representation of the predictive information of the RL environment dynamics.
We show that PI-SAC agents can substantially improve sample efficiency over challenging baselines on tasks from the DM Control suite of continuous control environments.
- Score: 50.52439807008805
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Predictive Information is the mutual information between the past and the
future, I(X_past; X_future). We hypothesize that capturing the predictive
information is useful in RL, since the ability to model what will happen next
is necessary for success on many tasks. To test our hypothesis, we train Soft
Actor-Critic (SAC) agents from pixels with an auxiliary task that learns a
compressed representation of the predictive information of the RL environment
dynamics using a contrastive version of the Conditional Entropy Bottleneck
(CEB) objective. We refer to these as Predictive Information SAC (PI-SAC)
agents. We show that PI-SAC agents can substantially improve sample efficiency
over challenging baselines on tasks from the DM Control suite of continuous
control environments. We evaluate PI-SAC agents by comparing against
uncompressed PI-SAC agents, other compressed and uncompressed agents, and SAC
agents directly trained from pixels. Our implementation is given on GitHub.
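As a rough illustration of the auxiliary objective described in the abstract, the sketch below implements a contrastive, CEB-style loss that encourages a representation of past observations to be predictive of future observations while penalizing residual information. It is a minimal sketch, assuming flat observations, unit-variance Gaussian encoders, and PyTorch; the module and parameter names (past_enc, future_enc, beta) are illustrative assumptions and do not reproduce the released PI-SAC implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ContrastiveCEBAuxiliary(nn.Module):
        """Hypothetical contrastive CEB-style auxiliary loss (not the PI-SAC code)."""

        def __init__(self, obs_dim: int, z_dim: int = 64, beta: float = 0.01):
            super().__init__()
            # Forward encoder e(z | past): mean of a unit-variance Gaussian.
            self.past_enc = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                          nn.Linear(256, z_dim))
            # Backward encoder b(z | future): scores futures in the contrastive term.
            self.future_enc = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                            nn.Linear(256, z_dim))
            self.beta = beta

        def forward(self, past: torch.Tensor, future: torch.Tensor) -> torch.Tensor:
            mu_x = self.past_enc(past)          # e(z|x) mean, shape [B, z_dim]
            mu_y = self.future_enc(future)      # b(z|y) means, shape [B, z_dim]
            z = mu_x + torch.randn_like(mu_x)   # sample z ~ e(z|x)

            # Log-densities up to constants: log N(z; mu, I) = -0.5 * ||z - mu||^2 + c.
            log_e = -0.5 * ((z - mu_x) ** 2).sum(-1)      # log e(z_i | x_i)
            log_b_all = -0.5 * torch.cdist(z, mu_y) ** 2  # log b(z_i | y_k), [B, B]
            log_b_pos = torch.diagonal(log_b_all)         # matching futures

            # Compression (rate) term: beta * (log e(z|x) - log b(z|y)).
            rate = self.beta * (log_e - log_b_pos)

            # Contrastive lower bound on I(future; z): classify the true future
            # among the batch (InfoNCE with log b(z|y_k) as logits).
            labels = torch.arange(z.size(0), device=z.device)
            nce = F.cross_entropy(log_b_all, labels, reduction='none')

            return (rate + nce).mean()

In a PI-SAC-style setup such a loss would be minimized jointly with the usual SAC actor and critic updates, with gradients flowing into a shared image encoder; the flat observations and MLP encoders above only keep the sketch short.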
Related papers
- A Zero-Shot approach to the Conversational Tree Search Task [28.392036110582723]
Conversational Tree Search (CTS) provides a graph-based framework for controllable task-oriented dialog in sensitive domains.
The goal of this paper is to eliminate the need for training CTS agents altogether.
We show that zero-shot, controllable CTS agents significantly outperform state-of-the-art CTS agents in simulation.
arXiv Detail & Related papers (2024-10-08T08:51:44Z)
- RLSAC: Reinforcement Learning enhanced Sample Consensus for End-to-End Robust Estimation [74.47709320443998]
We propose RLSAC, a novel Reinforcement Learning enhanced SAmple Consensus framework for end-to-end robust estimation.
RLSAC employs a graph neural network to utilize both data and memory features to guide exploring directions for sampling the next minimum set.
Our experimental results demonstrate that RLSAC can learn from features to gradually explore a better hypothesis.
arXiv Detail & Related papers (2023-08-10T03:14:19Z)
- Unsupervised Dense Retrieval with Relevance-Aware Contrastive Pre-Training [81.3781338418574]
We propose relevance-aware contrastive learning.
We consistently improve the SOTA unsupervised Contriever model on the BEIR and open-domain QA retrieval benchmarks.
Our method can not only beat BM25 after further pre-training on the target corpus but also serves as a good few-shot learner.
arXiv Detail & Related papers (2023-06-05T18:20:27Z)
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interactions between the agent and the environment.
We propose a new method that uses unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
- Frustratingly Easy Regularization on Representation Can Boost Deep Reinforcement Learning [9.072416458330268]
In this work, we demonstrate that the learned representation of the $Q$-network and its target $Q$-network should, in theory, satisfy a favorable distinguishable representation property.
We propose Policy Evaluation with Easy Regularization on Representation (PEER), which aims to maintain the distinguishable representation property via explicit regularization on internal representations.
PEER achieves state-of-the-art performance on all 4 environments on PyBullet, 9 out of 12 tasks on DMControl, and 19 out of 26 games on Atari.
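A minimal sketch, assuming a critic API that exposes its internal representation, of how such an explicit regularizer on representations could be folded into a TD critic update; the inner-product penalty, the coefficient beta, and the two-output network interface are assumptions for illustration, not the paper's exact PEER formulation.

    import torch

    def critic_loss_with_representation_regularizer(q_net, target_q_net, batch,
                                                    gamma=0.99, beta=5e-4):
        # Assumed interface: each network returns (q_value, internal_representation).
        s, a, r, s_next, a_next = batch
        q, phi = q_net(s, a)
        with torch.no_grad():
            q_next, phi_target = target_q_net(s_next, a_next)
            td_target = r + gamma * q_next

        # Standard TD error on the Q-values.
        td_loss = ((q - td_target) ** 2).mean()

        # Illustrative penalty: discourage the Q-network's representation from
        # collapsing onto the target network's representation of the successor.
        reg = (phi * phi_target).sum(-1).mean()

        return td_loss + beta * reg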
arXiv Detail & Related papers (2022-05-29T02:29:32Z)
- Retrieval-Augmented Reinforcement Learning [63.32076191982944]
We train a network to map a dataset of past experiences to optimal behavior.
The retrieval process is trained to retrieve information from the dataset that may be useful in the current context.
We show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
arXiv Detail & Related papers (2022-02-17T02:44:05Z)
- Improved Soft Actor-Critic: Mixing Prioritized Off-Policy Samples with On-Policy Experience [9.06635747612495]
Soft Actor-Critic (SAC) is an off-policy actor-critic reinforcement learning algorithm.
SAC trains a policy by maximizing the trade-off between expected return and entropy.
It has achieved state-of-the-art performance on a range of continuous-control benchmark tasks.
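Spelled out, the trade-off referred to above is the standard maximum-entropy objective (written in LaTeX, with temperature \alpha weighting the entropy bonus):

    J(\pi) = \sum_t \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]

Larger \alpha favors more stochastic policies; SAC typically adjusts \alpha automatically to match a target entropy.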
arXiv Detail & Related papers (2021-09-24T06:46:28Z)
- Robust Predictable Control [149.71263296079388]
We show that our method achieves much tighter compression than prior methods, achieving up to 5x higher reward than a standard information bottleneck.
We also demonstrate that our method learns policies that are more robust and generalize better to new tasks.
arXiv Detail & Related papers (2021-09-07T17:29:34Z)
- Automatic Data Augmentation for Generalization in Deep Reinforcement Learning [39.477038093585726]
Deep reinforcement learning (RL) agents often fail to generalize to unseen scenarios.
Data augmentation has recently been shown to improve the sample efficiency and generalization of RL agents.
We show that our agent learns policies and representations that are more robust to changes in the environment that do not affect the agent.
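As a concrete example of the kind of pixel-level augmentation used in this setting, the sketch below applies a random shift (replication-pad then crop) to a batch of image observations; the choice of shift and the 4-pixel padding are assumptions for illustration, not necessarily the transformation the paper's automatic procedure would select.

    import torch
    import torch.nn.functional as F

    def random_shift(obs: torch.Tensor, pad: int = 4) -> torch.Tensor:
        # obs: float tensor of pixel observations, shape [B, C, H, W].
        b, c, h, w = obs.shape
        padded = F.pad(obs, (pad, pad, pad, pad), mode='replicate')
        out = torch.empty_like(obs)
        for i in range(b):
            # Crop a random H x W window per image (shift of up to `pad` pixels).
            top = torch.randint(0, 2 * pad + 1, (1,)).item()
            left = torch.randint(0, 2 * pad + 1, (1,)).item()
            out[i] = padded[i, :, top:top + h, left:left + w]
        return out

Applying the same augmentation to the observations fed to both the actor and critic updates is the usual way such transformations are used in pixel-based RL.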
arXiv Detail & Related papers (2020-06-23T09:50:22Z)