Offline Reinforcement Learning with Fisher Divergence Critic
Regularization
- URL: http://arxiv.org/abs/2103.08050v1
- Date: Sun, 14 Mar 2021 22:11:40 GMT
- Title: Offline Reinforcement Learning with Fisher Divergence Critic
Regularization
- Authors: Ilya Kostrikov, Jonathan Tompson, Rob Fergus, Ofir Nachum
- Abstract summary: We propose an alternative approach to encouraging the learned policy to stay close to the data, namely parameterizing the critic as the log-behavior-policy.
Behavior regularization then corresponds to an appropriate regularizer on the offset term.
Our algorithm Fisher-BRC achieves both improved performance and faster convergence over existing state-of-the-art methods.
- Score: 41.085156836450466
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many modern approaches to offline Reinforcement Learning (RL) utilize
behavior regularization, typically augmenting a model-free actor critic
algorithm with a penalty measuring divergence of the policy from the offline
data. In this work, we propose an alternative approach to encouraging the
learned policy to stay close to the data, namely parameterizing the critic as
the log-behavior-policy, which generated the offline data, plus a state-action
value offset term, which can be learned using a neural network. Behavior
regularization then corresponds to an appropriate regularizer on the offset
term. We propose using a gradient penalty regularizer for the offset term and
demonstrate its equivalence to Fisher divergence regularization, suggesting
connections to the score matching and generative energy-based model literature.
We thus term our resulting algorithm Fisher-BRC (Behavior Regularized Critic).
On standard offline RL benchmarks, Fisher-BRC achieves both improved
performance and faster convergence over existing state-of-the-art methods.
Related papers
- Iteratively Refined Behavior Regularization for Offline Reinforcement
Learning [57.10922880400715]
In this paper, we propose a new algorithm that substantially enhances behavior-regularization based on conservative policy iteration.
By iteratively refining the reference policy used for behavior regularization, conservative policy update guarantees gradually improvement.
Experimental results on the D4RL benchmark indicate that our method outperforms previous state-of-the-art baselines in most tasks.
arXiv Detail & Related papers (2023-06-09T07:46:24Z) - Offline Policy Optimization in RL with Variance Regularizaton [142.87345258222942]
We propose variance regularization for offline RL algorithms, using stationary distribution corrections.
We show that by using Fenchel duality, we can avoid double sampling issues for computing the gradient of the variance regularizer.
The proposed algorithm for offline variance regularization (OVAR) can be used to augment any existing offline policy optimization algorithms.
arXiv Detail & Related papers (2022-12-29T18:25:01Z) - Offline Reinforcement Learning with Adaptive Behavior Regularization [1.491109220586182]
offline reinforcement learning (RL) defines a sample-efficient learning paradigm, where a policy is learned from static and previously collected datasets.
We propose a novel approach, which we refer to as adaptive behavior regularization (ABR)
ABR enables the policy to adaptively adjust its optimization objective between cloning and improving over the policy used to generate the dataset.
arXiv Detail & Related papers (2022-11-15T15:59:11Z) - Boosting Offline Reinforcement Learning via Data Rebalancing [104.3767045977716]
offline reinforcement learning (RL) is challenged by the distributional shift between learning policies and datasets.
We propose a simple yet effective method to boost offline RL algorithms based on the observation that resampling a dataset keeps the distribution support unchanged.
We dub our method ReD (Return-based Data Rebalance), which can be implemented with less than 10 lines of code change and adds negligible running time.
arXiv Detail & Related papers (2022-10-17T16:34:01Z) - Offline Reinforcement Learning with Soft Behavior Regularization [0.8937096931077437]
In this work, we derive a new policy learning objective that can be used in the offline setting.
Unlike state-independent regularization used in prior approaches, this textitsoft regularization allows more freedom of policy deviation.
Our experimental results show that SBAC matches or outperforms the state-of-the-art on a set of continuous control locomotion and manipulation tasks.
arXiv Detail & Related papers (2021-10-14T14:29:44Z) - BRAC+: Improved Behavior Regularized Actor Critic for Offline
Reinforcement Learning [14.432131909590824]
Offline Reinforcement Learning aims to train effective policies using previously collected datasets.
Standard off-policy RL algorithms are prone to overestimations of the values of out-of-distribution (less explored) actions.
We improve the behavior regularized offline reinforcement learning and propose BRAC+.
arXiv Detail & Related papers (2021-10-02T23:55:49Z) - OptiDICE: Offline Policy Optimization via Stationary Distribution
Correction Estimation [59.469401906712555]
We present an offline reinforcement learning algorithm that prevents overestimation in a more principled way.
Our algorithm, OptiDICE, directly estimates the stationary distribution corrections of the optimal policy.
We show that OptiDICE performs competitively with the state-of-the-art methods.
arXiv Detail & Related papers (2021-06-21T00:43:30Z) - COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions.
We find that COMBO consistently performs as well or better as compared to prior offline model-free and model-based methods.
arXiv Detail & Related papers (2021-02-16T18:50:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.