A Connection between One-Step Regularization and Critic Regularization
in Reinforcement Learning
- URL: http://arxiv.org/abs/2307.12968v1
- Date: Mon, 24 Jul 2023 17:46:32 GMT
- Title: A Connection between One-Step Regularization and Critic Regularization
in Reinforcement Learning
- Authors: Benjamin Eysenbach, Matthieu Geist, Sergey Levine, Ruslan
Salakhutdinov
- Abstract summary: One-step methods perform regularization by doing just a single step of policy improvement.
Critic regularization methods do many steps of policy improvement with a regularized objective.
Applying a multi-step critic regularization method with a regularization coefficient of 1 yields the same policy as one-step RL.
- Score: 163.44116192806922
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As with any machine learning problem with limited data, effective offline RL
algorithms require careful regularization to avoid overfitting. One-step
methods perform regularization by doing just a single step of policy
improvement, while critic regularization methods do many steps of policy
improvement with a regularized objective. These methods appear distinct.
One-step methods, such as advantage-weighted regression and conditional
behavioral cloning, truncate policy iteration after just one step. This ``early
stopping'' makes one-step RL simple and stable, but can limit its asymptotic
performance. Critic regularization typically requires more compute but has
appealing lower-bound guarantees. In this paper, we draw a close connection
between these methods: applying a multi-step critic regularization method with
a regularization coefficient of 1 yields the same policy as one-step RL. While
practical implementations violate our assumptions and critic regularization is
typically applied with smaller regularization coefficients, our experiments
nevertheless show that our analysis makes accurate, testable predictions about
practical offline RL methods (CQL and one-step RL) with commonly-used
hyperparameters. Our results do not imply that every problem can be solved with a single
step of policy improvement, but rather that one-step RL might be competitive
with critic regularization on RL problems that demand strong regularization.
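The contrast drawn in the abstract can be made concrete with a small sketch. The snippet below (Python/NumPy) places the two objectives side by side: one-step RL extracts a policy from a fixed estimate of the behavior policy's Q-function in a single exponentiated-advantage step, while critic regularization repeatedly trains a critic whose loss adds a CQL-style penalty scaled by a coefficient alpha. The toy data, the temperature, the function names, and the exact penalty form are illustrative assumptions; the sketch shows the shape of the two objectives, not the paper's proof that alpha = 1 recovers the one-step policy under its assumptions.

```python
# Minimal, hedged sketch of the two regularization routes contrasted in the
# abstract. The tiny tabular "dataset", the temperature, and the CQL-style
# penalty below are illustrative assumptions, not the paper's exact setup.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 3

# Empirical behavior policy beta(a|s) estimated from dataset action counts.
counts = rng.integers(1, 10, size=(n_states, n_actions)).astype(float)
beta = counts / counts.sum(axis=1, keepdims=True)

# Stand-in for a SARSA-style estimate of Q^beta fit on the offline dataset.
q_beta = rng.normal(size=(n_states, n_actions))

# --- One-step RL: a single regularized improvement step against Q^beta ---
# Exponentiated-advantage (AWR-like) policy extraction; one step, then stop.
def one_step_policy(q, behavior, temperature=1.0):
    v = (behavior * q).sum(axis=1, keepdims=True)       # V^beta(s)
    pi = behavior * np.exp((q - v) / temperature)       # beta * exp(A / T)
    return pi / pi.sum(axis=1, keepdims=True)

# --- Critic regularization: many improvement steps, regularized critic ---
# CQL-flavoured critic loss: Bellman error plus alpha * (E_pi[Q] - E_beta[Q]).
# The paper's analysis concerns the policy such an iterated procedure settles
# on when alpha = 1 under its assumptions; this function only shows the shape
# of the objective, not the fixed point.
def regularized_critic_loss(q, pi, behavior, bellman_targets, alpha=1.0):
    bellman_error = ((q - bellman_targets) ** 2).mean()
    penalty = ((pi * q).sum(axis=1) - (behavior * q).sum(axis=1)).mean()
    return bellman_error + alpha * penalty

pi_one_step = one_step_policy(q_beta, beta)
loss = regularized_critic_loss(q_beta, pi_one_step, beta,
                               bellman_targets=q_beta, alpha=1.0)
print(pi_one_step.round(3))
print(float(loss))
```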
Related papers
- Hundreds Guide Millions: Adaptive Offline Reinforcement Learning with
Expert Guidance [74.31779732754697]
We propose a novel plug-in approach named Guided Offline RL (GORL).
GORL employs a guiding network, along with only a few expert demonstrations, to adaptively determine the relative importance of the policy improvement and the policy constraint for every sample (a generic sketch of such per-sample weighting appears after this list).
Experiments on various environments suggest that GORL can be easily installed on most offline RL algorithms with statistically significant performance improvements.
arXiv Detail & Related papers (2023-09-04T08:59:04Z) - Iteratively Refined Behavior Regularization for Offline Reinforcement
Learning [57.10922880400715]
In this paper, we propose a new algorithm that substantially enhances behavior-regularization based on conservative policy iteration.
By iteratively refining the reference policy used for behavior regularization, the conservative policy update guarantees gradual improvement.
Experimental results on the D4RL benchmark indicate that our method outperforms previous state-of-the-art baselines in most tasks.
arXiv Detail & Related papers (2023-06-09T07:46:24Z) - ReLOAD: Reinforcement Learning with Optimistic Ascent-Descent for
Last-Iterate Convergence in Constrained MDPs [31.663072540757643]
Reinforcement Learning (RL) has been applied to real-world problems with increasing success.
We introduce Reinforcement Learning with Optimistic Ascent-Descent (ReLOAD).
arXiv Detail & Related papers (2023-02-02T18:05:27Z) - Offline Policy Optimization in RL with Variance Regularization [142.87345258222942]
We propose variance regularization for offline RL algorithms, using stationary distribution corrections.
We show that by using Fenchel duality, we can avoid double sampling issues for computing the gradient of the variance regularizer (a standard dual identity of this kind is sketched after this list).
The proposed algorithm for offline variance regularization (OVAR) can be used to augment any existing offline policy optimization algorithms.
arXiv Detail & Related papers (2022-12-29T18:25:01Z) - Optimal Conservative Offline RL with General Function Approximation via
Augmented Lagrangian [18.2080757218886]
Offline reinforcement learning (RL) refers to decision-making from a previously collected dataset of interactions.
We present the first set of offline RL algorithms that are statistically optimal and practical under general function approximation and single-policy concentrability.
arXiv Detail & Related papers (2022-11-01T19:28:48Z) - Instance-Dependent Confidence and Early Stopping for Reinforcement
Learning [99.57168572237421]
Various algorithms for reinforcement learning (RL) exhibit dramatic variation in their convergence rates as a function of problem structure.
This research provides guarantees that explain ex post the performance differences observed.
A natural next step is to convert these theoretical guarantees into guidelines that are useful in practice.
arXiv Detail & Related papers (2022-01-21T04:25:35Z) - A Policy Efficient Reduction Approach to Convex Constrained Deep
Reinforcement Learning [2.811714058940267]
We propose a new variant of the conditional gradient (CG) type algorithm, which generalizes the minimum norm point (MNP) method.
Our method reduces the memory costs by an order of magnitude, and achieves better performance, demonstrating both its effectiveness and efficiency.
arXiv Detail & Related papers (2021-08-29T20:51:32Z) - Offline RL Without Off-Policy Evaluation [49.11859771578969]
We show that simply doing one step of constrained/regularized policy improvement using an on-policy Q estimate of the behavior policy performs surprisingly well.
This one-step algorithm beats the previously reported results of iterative algorithms on a large portion of the D4RL benchmark.
arXiv Detail & Related papers (2021-06-16T16:04:26Z)
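The GORL entry above describes adaptively weighting, per sample, a policy-improvement term against a behavior-constraint term. The snippet below is only a generic illustration of that trade-off: the placeholder_weights function, the loss shapes, and the batch data are assumptions, not GORL's actual guiding-network rule.

```python
# Hedged illustration of a per-sample trade-off between a policy-improvement
# term and a behavior-constraint (cloning) term, as described for GORL above.
# The weights come from a placeholder function, NOT GORL's guiding network.
import numpy as np

rng = np.random.default_rng(1)
batch = 5

improvement_term = -rng.normal(size=batch)   # e.g. -Q(s, pi(s)) per sample
constraint_term = rng.random(size=batch)     # e.g. a BC / divergence penalty

def placeholder_weights(batch_size):
    # Stand-in for a per-sample guiding score; uniform 0.5 here.
    return np.full(batch_size, 0.5)

w = placeholder_weights(batch)
per_sample_loss = w * improvement_term + (1.0 - w) * constraint_term
print(per_sample_loss.mean())
```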
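The OVAR entry above notes that Fenchel duality removes the double-sampling issue in the gradient of a variance regularizer. One standard identity of this flavour (an illustration only, not necessarily the construction used in that paper) rewrites the squared expectation through its Fenchel conjugate, turning the variance into a single expectation minimized over an auxiliary scalar:

```latex
% A standard dual form of the variance (illustrative; not necessarily the
% exact construction used in the OVAR paper).
\begin{align}
(\mathbb{E}[X])^2 &= \max_{\nu \in \mathbb{R}} \big( 2\nu\,\mathbb{E}[X] - \nu^2 \big), \\
\operatorname{Var}[X] &= \mathbb{E}[X^2] - (\mathbb{E}[X])^2
  = \min_{\nu \in \mathbb{R}} \mathbb{E}\big[(X - \nu)^2\big].
\end{align}
```

Because the inner objective is a single expectation of a squared term, an unbiased stochastic gradient needs only one sample of X per update, whereas differentiating the squared expectation directly would require two independent samples.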
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.