Know Your Boundaries: The Necessity of Explicit Behavioral Cloning in
Offline RL
- URL: http://arxiv.org/abs/2206.00695v1
- Date: Wed, 1 Jun 2022 18:04:43 GMT
- Title: Know Your Boundaries: The Necessity of Explicit Behavioral Cloning in
Offline RL
- Authors: Wonjoon Goo, Scott Niekum
- Abstract summary: We introduce an offline reinforcement learning algorithm that explicitly clones a behavior policy to constrain value learning.
We show state-of-the-art performance on several datasets within the D4RL and Robomimic benchmarks.
- Score: 28.563015766188478
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce an offline reinforcement learning (RL) algorithm that explicitly
clones a behavior policy to constrain value learning. In offline RL, it is
often important to prevent a policy from selecting unobserved actions, since
the consequence of these actions cannot be presumed without additional
information about the environment. One straightforward way to implement such a
constraint is to explicitly model a given data distribution via behavior
cloning and directly force a policy not to select uncertain actions. However,
many offline RL methods instantiate the constraint indirectly -- for example,
pessimistic value estimation -- due to a concern about errors when modeling a
potentially complex behavior policy. In this work, we argue that it is not only
viable but beneficial to explicitly model the behavior policy for offline RL
because the constraint can be realized in a stable way with the trained model.
We first suggest a theoretical framework that allows us to incorporate
behavior-cloned models into value-based offline RL methods, enjoying the
strength of both explicit behavior cloning and value learning. Then, we propose
a practical method utilizing a score-based generative model for behavior
cloning. With the proposed method, we show state-of-the-art performance on
several datasets within the D4RL and Robomimic benchmarks and achieve
competitive performance across all datasets tested.
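To make the constraint concrete, here is a minimal sketch (not the authors' exact algorithm) of how an explicitly cloned behavior model can gate value learning: the Bellman backup maximizes only over candidate actions whose estimated behavior log-likelihood clears a threshold. The density model, Q-function, threshold, and dimensions below are placeholder assumptions for illustration.

```python
# Minimal sketch (illustrative, not the paper's exact method): use an
# explicitly learned behavior density to restrict the Bellman backup to
# in-support actions. All models below are toy placeholders.
import numpy as np

rng = np.random.default_rng(0)
ACTION_DIM = 2

def behavior_log_prob(state, actions):
    # Placeholder behavior-cloned density log p_beta(a | s).
    return -0.5 * np.sum((actions - 0.1 * state) ** 2, axis=-1)

def q_value(state, actions):
    # Placeholder Q-function Q(s, a).
    return -np.sum((actions - 0.3) ** 2, axis=-1)

def constrained_backup(state, reward, gamma=0.99, n_candidates=64, log_prob_min=-2.0):
    """Bellman target that maximizes only over actions the behavior model supports."""
    candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, ACTION_DIM))
    log_probs = behavior_log_prob(state, candidates)
    in_support = log_probs >= log_prob_min
    if not np.any(in_support):
        # Fall back to the single most likely action under the behavior model.
        in_support = log_probs == log_probs.max()
    return reward + gamma * q_value(state, candidates[in_support]).max()

print(constrained_backup(state=np.ones(ACTION_DIM), reward=1.0))
```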
Related papers
- Offline RL With Realistic Datasets: Heteroskedasticity and Support
Constraints [82.43359506154117]
We show that typical offline reinforcement learning methods fail to learn from data with non-uniform variability.
Our method is simple, theoretically motivated, and improves performance across a wide range of offline RL problems in Atari games, navigation, and pixel-based manipulation.
arXiv Detail & Related papers (2022-11-02T11:36:06Z)
- Boosting Offline Reinforcement Learning via Data Rebalancing [104.3767045977716]
Offline reinforcement learning (RL) is challenged by the distributional shift between learning policies and datasets.
We propose a simple yet effective method to boost offline RL algorithms based on the observation that resampling a dataset keeps the distribution support unchanged.
We dub our method ReD (Return-based Data Rebalance), which can be implemented with less than 10 lines of code change and adds negligible running time.
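As a rough illustration of return-based rebalancing (the exact weighting in ReD may differ; the temperature and normalization here are assumptions), trajectories can be resampled with probability increasing in their return while keeping the dataset's support unchanged:

```python
# Sketch of return-weighted resampling: higher-return trajectories are drawn
# more often, but no new state-action pairs are introduced, so the support of
# the data distribution is unchanged.
import numpy as np

rng = np.random.default_rng(0)

def resample_indices(returns, n_samples, temperature=1.0):
    """Sample trajectory indices with probability increasing in trajectory return."""
    r = np.asarray(returns, dtype=float)
    r = (r - r.min()) / (r.max() - r.min() + 1e-8)   # normalize returns to [0, 1]
    probs = np.exp(r / temperature)
    probs /= probs.sum()
    return rng.choice(len(r), size=n_samples, p=probs)

print(resample_indices([12.0, 3.5, 47.0, 0.2], n_samples=8))
```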
arXiv Detail & Related papers (2022-10-17T16:34:01Z)
- A Unified Framework for Alternating Offline Model Training and Policy Learning [62.19209005400561]
In offline model-based reinforcement learning (MBRL), we learn a dynamics model from historically collected data and use the learned model together with the fixed dataset for policy learning.
We develop an iterative offline MBRL framework, where we maximize a lower bound of the true expected return.
With the proposed unified model-policy learning framework, we achieve competitive performance on a wide range of continuous-control offline reinforcement learning datasets.
arXiv Detail & Related papers (2022-10-12T04:58:51Z)
- Offline Reinforcement Learning via High-Fidelity Generative Behavior Modeling [34.88897402357158]
We show that due to the limited distributional expressivity of policy models, previous methods might still select unseen actions during training.
We adopt a generative approach by decoupling the learned policy into two parts: an expressive generative behavior model and an action evaluation model.
Our proposed method achieves competitive or superior performance compared with state-of-the-art offline RL methods.
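A minimal sketch of the decoupling described above, with placeholder models standing in for the learned generative behavior model and the action-evaluation model:

```python
# Sketch of a decoupled policy: sample candidate actions from a generative
# behavior model, then let a separate action-evaluation model pick among them,
# so the final policy never strays outside the behavior model's samples.
import numpy as np

rng = np.random.default_rng(0)
ACTION_DIM = 2

def sample_behavior_actions(state, n=32):
    # Placeholder for an expressive generative behavior model p(a | s).
    return np.tanh(0.1 * state + rng.normal(scale=0.3, size=(n, ACTION_DIM)))

def action_value(state, actions):
    # Placeholder action-evaluation model Q(s, a).
    return -np.sum((actions - 0.5) ** 2, axis=-1)

def act(state):
    candidates = sample_behavior_actions(state)
    return candidates[np.argmax(action_value(state, candidates))]

print(act(np.zeros(ACTION_DIM)))
```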
arXiv Detail & Related papers (2022-09-29T04:36:23Z)
- Curriculum Offline Imitation Learning [72.1015201041391]
Offline reinforcement learning tasks require the agent to learn from a pre-collected dataset with no further interactions with the environment.
We propose Curriculum Offline Imitation Learning (COIL), which utilizes an experience picking strategy to imitate adaptive neighboring policies with higher returns.
On continuous control benchmarks, we compare COIL against both imitation-based and RL-based methods, showing that it not only avoids learning merely mediocre behavior on mixed datasets but is also competitive with state-of-the-art offline RL methods.
arXiv Detail & Related papers (2021-11-03T08:02:48Z)
- The Least Restriction for Offline Reinforcement Learning [0.0]
We propose a creative offline reinforcement learning framework, the Least Restriction (LR).
The LR regards selecting an action as taking a sample from the probability distribution.
It is able to learn robustly from different offline datasets, including random and suboptimal demonstrations.
arXiv Detail & Related papers (2021-07-05T01:50:40Z)
- COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions.
We find that COMBO consistently performs as well as or better than prior offline model-free and model-based methods.
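Schematically, this kind of regularizer pushes Q-values down on state-action pairs generated by model rollouts and up on pairs from the dataset; the sketch below uses toy numbers, and the coefficient and sampling scheme are assumptions rather than COMBO's exact formulation.

```python
# Schematic conservative regularizer: penalize Q on (potentially out-of-support)
# model-rollout samples relative to dataset samples, on top of the Bellman error.
import numpy as np

def conservative_q_loss(q_rollout, q_dataset, bellman_error, beta=1.0):
    """Bellman error plus a gap penalty between rollout and dataset Q-values."""
    penalty = np.mean(q_rollout) - np.mean(q_dataset)
    return bellman_error + beta * penalty

print(conservative_q_loss(q_rollout=[5.0, 6.0], q_dataset=[4.0, 4.5], bellman_error=0.3))
```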
arXiv Detail & Related papers (2021-02-16T18:50:32Z)
- MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify existing model-based RL methods by penalizing rewards in proportion to the uncertainty of the learned dynamics.
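Such a penalty can be written as r~(s, a) = r(s, a) - lambda * u(s, a), where u measures dynamics uncertainty; the sketch below uses ensemble disagreement as a stand-in uncertainty measure, which is an assumption for illustration.

```python
# Sketch of an uncertainty-penalized reward for model-based offline RL:
# subtract a penalty proportional to the dynamics model's uncertainty,
# here approximated by disagreement across an ensemble of next-state predictions.
import numpy as np

def penalized_reward(reward, ensemble_next_states, lam=1.0):
    """r_tilde = r - lam * u(s, a), with u given by ensemble spread."""
    preds = np.asarray(ensemble_next_states)          # shape: (n_models, state_dim)
    uncertainty = float(np.linalg.norm(preds.std(axis=0)))
    return reward - lam * uncertainty

print(penalized_reward(reward=2.0, ensemble_next_states=[[1.0, 0.2], [1.1, 0.1], [0.9, 0.3]]))
```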
arXiv Detail & Related papers (2020-05-27T08:46:41Z)