Robust Predictable Control
- URL: http://arxiv.org/abs/2109.03214v1
- Date: Tue, 7 Sep 2021 17:29:34 GMT
- Title: Robust Predictable Control
- Authors: Benjamin Eysenbach, Ruslan Salakhutdinov and Sergey Levine
- Abstract summary: We show that our method achieves much tighter compression than prior methods, achieving up to 5x higher reward than a standard information bottleneck.
We also demonstrate that our method learns policies that are more robust and generalize better to new tasks.
- Score: 149.71263296079388
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many of the challenges facing today's reinforcement learning (RL) algorithms,
such as robustness, generalization, transfer, and computational efficiency, are
closely related to compression. Prior work has convincingly argued why
minimizing information is useful in the supervised learning setting, but
standard RL algorithms lack an explicit mechanism for compression. The RL
setting is unique because (1) its sequential nature allows an agent to use past
information to avoid looking at future observations and (2) the agent can
optimize its behavior to prefer states where decision making requires few bits.
We take advantage of these properties to propose a method (RPC) for learning
simple policies. This method brings together ideas from information
bottlenecks, model-based RL, and bits-back coding into a simple and
theoretically-justified algorithm. Our method jointly optimizes a latent-space
model and policy to be self-consistent, such that the policy avoids states
where the model is inaccurate. We demonstrate that our method achieves much
tighter compression than prior methods, achieving up to 5x higher reward than a
standard information bottleneck. We also demonstrate that our method learns
policies that are more robust and generalize better to new tasks.
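The abstract describes jointly training a latent-space model and a policy under an information bottleneck, so the agent effectively pays a bit cost whenever it visits states its model cannot predict. The sketch below illustrates that kind of KL-penalized reward in PyTorch; the Gaussian heads, layer sizes, and the coefficient `INFO_COEF` are illustrative assumptions, not the paper's reference implementation.

```python
# Minimal sketch of a KL-penalized ("information bottleneck") RL reward in the
# spirit of RPC, assuming a Gaussian encoder and a Gaussian latent dynamics model.
# Names and network sizes are illustrative, not the paper's reference code.
import torch
import torch.nn as nn
from torch.distributions import Normal, kl_divergence

OBS_DIM, ACT_DIM, LATENT_DIM = 17, 6, 8
INFO_COEF = 0.1  # weight on the per-step information (bit) cost

def gaussian_head(in_dim: int, out_dim: int) -> nn.Module:
    """MLP that outputs the mean and log-std of a diagonal Gaussian."""
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, 2 * out_dim))

encoder = gaussian_head(OBS_DIM, LATENT_DIM)                      # e(z_t | s_t)
latent_model = gaussian_head(LATENT_DIM + ACT_DIM, LATENT_DIM)    # m(z_{t+1} | z_t, a_t)

def as_normal(params: torch.Tensor) -> Normal:
    mean, log_std = params.chunk(2, dim=-1)
    return Normal(mean, log_std.clamp(-5, 2).exp())

def information_cost(obs_next, z, action):
    """Bits paid per step: KL between the encoder's posterior over z_{t+1} and the
    latent model's prediction of z_{t+1}. A policy trained on the augmented reward
    is pushed toward states where the model is accurate, keeping this term small."""
    posterior = as_normal(encoder(obs_next))
    prior = as_normal(latent_model(torch.cat([z, action], dim=-1)))
    return kl_divergence(posterior, prior).sum(-1)

# Usage on a dummy batch: the environment reward is augmented with -INFO_COEF * KL,
# so compression pressure and reward maximization are optimized jointly.
obs_next = torch.randn(32, OBS_DIM)
z = torch.randn(32, LATENT_DIM)
action = torch.randn(32, ACT_DIM)
env_reward = torch.randn(32)
augmented_reward = env_reward - INFO_COEF * information_cost(obs_next, z, action)
print(augmented_reward.shape)  # torch.Size([32])
```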
Related papers
- Iteratively Refined Behavior Regularization for Offline Reinforcement Learning [57.10922880400715]
In this paper, we propose a new algorithm that substantially enhances behavior regularization based on conservative policy iteration.
By iteratively refining the reference policy used for behavior regularization, the conservative policy update guarantees gradual improvement.
Experimental results on the D4RL benchmark indicate that our method outperforms previous state-of-the-art baselines in most tasks.
arXiv Detail & Related papers (2023-06-09T07:46:24Z)
- Reinforcement Learning with Simple Sequence Priors [9.869634509510016]
We propose an RL algorithm that learns to solve tasks with sequences of actions that are compressible.
We show that the resulting RL algorithm leads to faster learning, and attains higher returns than state-of-the-art model-free approaches.
arXiv Detail & Related papers (2023-05-26T17:18:14Z)
- Direct Preference-based Policy Optimization without Reward Modeling [25.230992130108767]
Preference-based reinforcement learning (PbRL) is an approach that enables RL agents to learn from preferences.
We propose a PbRL algorithm that directly learns from preferences without requiring any reward modeling.
We show that our algorithm surpasses offline RL methods that learn with ground-truth reward information.
arXiv Detail & Related papers (2023-01-30T12:51:13Z)
- A Policy Efficient Reduction Approach to Convex Constrained Deep Reinforcement Learning [2.811714058940267]
We propose a new variant of the conditional gradient (CG) type algorithm, which generalizes the minimum norm point (MNP) method.
Our method reduces the memory costs by an order of magnitude, and achieves better performance, demonstrating both its effectiveness and efficiency.
arXiv Detail & Related papers (2021-08-29T20:51:32Z)
- An Information Theory-inspired Strategy for Automatic Network Pruning [88.51235160841377]
Deep convolutional neural networks typically need to be compressed for deployment on devices with resource constraints.
Most existing network pruning methods require laborious human effort and prohibitive computational resources.
We propose an information theory-inspired strategy for automatic model compression.
arXiv Detail & Related papers (2021-08-19T07:03:22Z)
- Online Sub-Sampling for Reinforcement Learning with General Function Approximation [111.01990889581243]
In this paper, we establish an efficient online sub-sampling framework that measures the information gain of data points collected by an RL algorithm.
For a value-based method with a complexity-bounded function class, we show that the policy only needs to be updated $\propto \operatorname{polylog}(K)$ times.
In contrast to existing approaches that update the policy at least $\Omega(K)$ times, our approach drastically reduces the number of optimization calls in solving for a policy.
arXiv Detail & Related papers (2021-06-14T07:36:25Z)
- Discovering Reinforcement Learning Algorithms [53.72358280495428]
Reinforcement learning algorithms update an agent's parameters according to one of several possible rules.
This paper introduces a new meta-learning approach that discovers an entire update rule.
The discovered rule includes both 'what to predict' (e.g., value functions) and 'how to learn from it', and is found by interacting with a set of environments.
arXiv Detail & Related papers (2020-07-17T07:38:39Z)
- MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify existing model-based RL methods by applying them to rewards artificially penalized by the uncertainty of the dynamics (see the sketch after this list).
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
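The MOPO entry above describes its mechanism concretely: rewards are artificially penalized by the uncertainty of the learned dynamics. Below is a minimal sketch of that idea, assuming an ensemble of dynamics models whose disagreement stands in for the uncertainty term; the function names and the max-std heuristic are assumptions for illustration, not the paper's exact estimator.

```python
# Minimal sketch of an uncertainty-penalized reward for offline model-based RL:
# the reward used for policy optimization is the environment (or model) reward
# minus a penalty proportional to how uncertain the learned dynamics are.
import numpy as np

PENALTY_COEF = 1.0  # lambda: how strongly dynamics uncertainty is penalized

def penalized_reward(reward: float, ensemble_next_states: np.ndarray) -> float:
    """ensemble_next_states: (n_models, state_dim) predictions of s' for one (s, a).
    Uses the largest per-dimension disagreement across the ensemble as a cheap
    stand-in for the uncertainty term u(s, a)."""
    uncertainty = float(ensemble_next_states.std(axis=0).max())
    return reward - PENALTY_COEF * uncertainty

# Usage: an (s, a) pair where the five dynamics models disagree gets its reward cut,
# steering the offline policy away from regions the data does not cover.
preds = np.random.default_rng(0).normal(size=(5, 11))
print(penalized_reward(1.0, preds))
```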
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.