When Should We Prefer Offline Reinforcement Learning Over Behavioral
Cloning?
- URL: http://arxiv.org/abs/2204.05618v1
- Date: Tue, 12 Apr 2022 08:25:34 GMT
- Title: When Should We Prefer Offline Reinforcement Learning Over Behavioral
Cloning?
- Authors: Aviral Kumar, Joey Hong, Anikait Singh, Sergey Levine
- Abstract summary: Offline reinforcement learning (RL) algorithms can acquire effective policies by utilizing previously collected experience, without any online interaction.
Behavioral cloning (BC) algorithms mimic a subset of the dataset via supervised learning.
We show that policies trained with offline RL on sufficiently noisy suboptimal data can attain better performance than even BC algorithms trained on expert data.
- Score: 86.43517734716606
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Offline reinforcement learning (RL) algorithms can acquire effective policies
by utilizing previously collected experience, without any online interaction.
It is widely understood that offline RL is able to extract good policies even
from highly suboptimal data, a scenario where imitation learning finds
suboptimal solutions that do not improve over the demonstrator that generated
the dataset. However, another common use case for practitioners is to learn
from data that resembles demonstrations. In this case, one can choose to apply
offline RL, but can also use behavioral cloning (BC) algorithms, which mimic a
subset of the dataset via supervised learning. Therefore, it seems natural to
ask: when can an offline RL method outperform BC with an equal amount of expert
data, even when BC is a natural choice? To answer this question, we
characterize the properties of environments that allow offline RL methods to
perform better than BC methods, even when only provided with expert data.
Additionally, we show that policies trained on sufficiently noisy suboptimal
data can attain better performance than even BC algorithms with expert data,
especially on long-horizon problems. We validate our theoretical results via
extensive experiments on both diagnostic and high-dimensional domains including
robotic manipulation, maze navigation, and Atari games, with a variety of data
distributions. We observe that, under specific but common conditions such as
sparse rewards or noisy data sources, modern offline RL methods can
significantly outperform BC.
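
To make the comparison concrete, here is a minimal sketch (PyTorch, not the paper's implementation) contrasting the supervised BC objective with a conservative, CQL-style offline RL critic loss on the same offline batch; the network sizes and the penalty weight `alpha` are illustrative assumptions.

```python
# A minimal sketch (not the paper's exact setup) contrasting the two
# objectives on the same offline batch of (s, a, r, s') transitions.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, NUM_ACTIONS, GAMMA = 8, 4, 0.99   # illustrative sizes

policy = nn.Linear(STATE_DIM, NUM_ACTIONS)   # BC policy: logits over actions
q_net = nn.Linear(STATE_DIM, NUM_ACTIONS)    # offline-RL critic: Q(s, .)

def bc_loss(states, actions):
    """Behavioral cloning: supervised log-likelihood of the dataset actions."""
    return F.cross_entropy(policy(states), actions)

def conservative_q_loss(states, actions, rewards, next_states, alpha=1.0):
    """CQL-style critic loss: TD error plus a penalty that pushes down
    Q-values on all actions while pushing up the dataset actions."""
    q = q_net(states)
    q_taken = q.gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = rewards + GAMMA * q_net(next_states).max(dim=1).values
    td_error = F.mse_loss(q_taken, target)
    conservative_penalty = (torch.logsumexp(q, dim=1) - q_taken).mean()
    return td_error + alpha * conservative_penalty

# Dummy batch just to show the call signatures.
s = torch.randn(32, STATE_DIM)
a = torch.randint(0, NUM_ACTIONS, (32,))
r = torch.randn(32)
s2 = torch.randn(32, STATE_DIM)
print(bc_loss(s, a).item(), conservative_q_loss(s, a, r, s2).item())
```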
Related papers
- Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced
Datasets [53.8218145723718]
Offline policy learning aims to learn decision-making policies from existing datasets of trajectories, without collecting additional data.
We argue that when a dataset is dominated by suboptimal trajectories, state-of-the-art offline RL algorithms do not substantially improve over the average return of trajectories in the dataset.
We present a realization of the sampling strategy and an algorithm that can be used as a plug-and-play module in standard offline RL algorithms.
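
As one illustration of such a plug-and-play sampler, the sketch below biases trajectory sampling toward higher returns; the return-weighted rule and the `temperature` parameter are illustrative assumptions, not necessarily the realization proposed in the paper.

```python
# A hedged sketch of a non-uniform sampler that a standard offline RL loop
# could plug in; return-weighted sampling is an illustrative choice here.
import numpy as np

def make_trajectory_sampler(trajectory_returns, temperature=1.0):
    """Return a function that samples trajectory indices with probability
    proportional to exp(return / temperature) instead of uniformly."""
    returns = np.asarray(trajectory_returns, dtype=np.float64)
    logits = (returns - returns.max()) / temperature   # shift for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return lambda batch_size: np.random.choice(len(returns), size=batch_size, p=probs)

# Usage: a dataset dominated by poor trajectories still yields mostly
# high-return samples for the offline RL update.
sample = make_trajectory_sampler([0.1, 0.2, 0.1, 5.0, 4.5], temperature=0.5)
print(sample(8))  # indices, biased toward trajectories 3 and 4
```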
arXiv Detail & Related papers (2023-10-06T17:58:14Z)
- Bridging Imitation and Online Reinforcement Learning: An Optimistic Tale [27.02990488317357]
Given an offline demonstration dataset from an imperfect expert, what is the best way to leverage it to bootstrap online learning performance in MDPs?
We first propose an Informed Posterior Sampling-based RL (iPSRL) algorithm that uses both the offline dataset and information about the expert's behavioral policy that generated it.
Since this algorithm is computationally impractical, we then propose the iRLSVI algorithm that can be seen as a combination of the RLSVI algorithm for online RL, and imitation learning.
arXiv Detail & Related papers (2023-03-20T18:16:25Z)
- Efficient Online Reinforcement Learning with Offline Data [78.92501185886569]
We show that we can simply apply existing off-policy methods to leverage offline data when learning online.
We extensively ablate these design choices, demonstrating the key factors that most affect performance.
We see that correct application of these simple recommendations can provide a $\mathbf{2.5\times}$ improvement over existing approaches.
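
One simple design choice in this spirit is to compose each training batch from both data sources; the sketch below assumes a 50/50 split between the offline dataset and the online replay buffer, which is an illustrative choice rather than the paper's exact recipe.

```python
# A hedged sketch: build each training batch half from the offline dataset and
# half from the online replay buffer, so standard off-policy updates see both
# data sources. The 50/50 split is an illustrative assumption.
import random

class MixedReplayBuffer:
    def __init__(self, offline_transitions):
        self.offline = list(offline_transitions)  # fixed, pre-collected data
        self.online = []                          # grows during interaction

    def add(self, transition):
        self.online.append(transition)

    def sample(self, batch_size):
        half = batch_size // 2
        batch = random.sample(self.offline, min(half, len(self.offline)))
        if self.online:
            batch += random.choices(self.online, k=batch_size - len(batch))
        return batch

# Usage: any existing off-policy agent just calls buffer.sample() as before.
buffer = MixedReplayBuffer(offline_transitions=[("s", "a", 1.0, "s2")] * 100)
buffer.add(("s_new", "a_new", 0.0, "s2_new"))
print(len(buffer.sample(8)))
```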
arXiv Detail & Related papers (2023-02-06T17:30:22Z)
- Launchpad: Learning to Schedule Using Offline and Online RL Methods [9.488752723308954]
Existing RL schedulers overlook the importance of learning from historical data and improving upon custom policies.
Offline reinforcement learning presents the prospect of policy optimization from pre-recorded datasets without online environment interaction.
These methods address challenges around the cost of data collection and safety, which are particularly pertinent to real-world applications of RL.
arXiv Detail & Related papers (2022-12-01T16:40:11Z)
- Data-Efficient Pipeline for Offline Reinforcement Learning with Limited Data [28.846826115837825]
Offline reinforcement learning can be used to improve future performance by leveraging historical data.
We introduce a task- and method-agnostic pipeline for automatically training, comparing, selecting, and deploying the best policy.
We show it can have a substantial impact when the dataset is small.
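
The "compare and select" step can be illustrated with a small sketch that ranks hypothetical candidate policies by an off-policy value estimate on dataset states; the estimator, the critic `q_net`, and the candidate policies are assumptions for illustration, not the paper's pipeline.

```python
# A hedged sketch of offline policy selection: score each candidate policy by
# the critic's estimate of its greedy actions on dataset states and deploy the
# best one. Everything here (sizes, critic, candidates) is illustrative.
import torch
import torch.nn as nn

STATE_DIM, NUM_ACTIONS = 8, 4                                        # illustrative
candidates = [nn.Linear(STATE_DIM, NUM_ACTIONS) for _ in range(3)]   # hypothetical trained policies
q_net = nn.Linear(STATE_DIM, NUM_ACTIONS)                            # hypothetical evaluation critic

def estimate_value(policy, states):
    """Average critic value of the policy's greedy actions over dataset states."""
    with torch.no_grad():
        actions = policy(states).argmax(dim=1, keepdim=True)
        return q_net(states).gather(1, actions).mean().item()

dataset_states = torch.randn(128, STATE_DIM)
scores = [estimate_value(p, dataset_states) for p in candidates]
best = max(range(len(candidates)), key=scores.__getitem__)
print(f"deploy candidate {best} with estimated value {scores[best]:.3f}")
```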
arXiv Detail & Related papers (2022-10-16T21:24:53Z)
- Discriminator-Weighted Offline Imitation Learning from Suboptimal Demonstrations [5.760034336327491]
We study the problem of offline Imitation Learning (IL), where an agent aims to learn an optimal expert behavior policy without additional online environment interactions.
We introduce an additional discriminator to distinguish expert and non-expert data.
Our proposed algorithm achieves higher returns and faster training speed compared to baseline algorithms.
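
A simplified sketch of the general idea: a discriminator scores how expert-like each state-action pair is, and that score weights a behavioral-cloning loss over the full mixed dataset. The exact weighting in the paper differs; the form below and the network sizes are illustrative assumptions.

```python
# A hedged sketch of discriminator-weighted behavioral cloning on mixed data.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, NUM_ACTIONS = 8, 4                         # illustrative sizes
policy = nn.Linear(STATE_DIM, NUM_ACTIONS)
discriminator = nn.Linear(STATE_DIM + NUM_ACTIONS, 1)

def disc_input(states, actions):
    return torch.cat([states, F.one_hot(actions, NUM_ACTIONS).float()], dim=1)

def discriminator_loss(expert_s, expert_a, other_s, other_a):
    """Binary classification: expert pairs -> 1, unlabeled/suboptimal pairs -> 0."""
    logits = discriminator(torch.cat([disc_input(expert_s, expert_a),
                                      disc_input(other_s, other_a)]))
    labels = torch.cat([torch.ones(len(expert_s), 1), torch.zeros(len(other_s), 1)])
    return F.binary_cross_entropy_with_logits(logits, labels)

def weighted_bc_loss(states, actions):
    """Clone actions from the whole dataset, weighted by expert-likeness."""
    with torch.no_grad():
        weights = torch.sigmoid(discriminator(disc_input(states, actions))).squeeze(1)
    nll = F.cross_entropy(policy(states), actions, reduction="none")
    return (weights * nll).mean()
```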
arXiv Detail & Related papers (2022-07-20T17:29:04Z)
- Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning [147.61075994259807]
We propose Exploratory data for Offline RL (ExORL), a data-centric approach to offline RL.
ExORL first generates data with unsupervised reward-free exploration, then relabels this data with a downstream reward before training a policy with offline RL.
We find that exploratory data allows vanilla off-policy RL algorithms, without any offline-specific modifications, to outperform or match state-of-the-art offline RL algorithms on downstream tasks.
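
The relabeling step is simple to sketch; `downstream_reward` below is a stand-in for whatever task reward is available, not an API from the paper's code.

```python
# A minimal sketch of the relabeling step: exploratory transitions collected
# without rewards are stamped with the downstream task's reward function
# before being handed to an ordinary off-policy/offline RL learner.
def relabel(transitions, downstream_reward):
    """transitions: iterable of (state, action, next_state) collected reward-free."""
    return [(s, a, downstream_reward(s, a, s2), s2) for (s, a, s2) in transitions]

# Usage with a toy reward that pays 1.0 for reaching a goal state.
exploratory = [((0, 0), "right", (1, 0)), ((1, 0), "right", (2, 0))]
goal = (2, 0)
relabeled = relabel(exploratory, lambda s, a, s2: 1.0 if s2 == goal else 0.0)
print(relabeled)  # ready for a standard off-policy RL update
```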
arXiv Detail & Related papers (2022-01-31T18:39:27Z)
- Critic Regularized Regression [70.8487887738354]
We propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR).
We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces.
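
A hedged sketch of the critic-regularized regression idea: regress the policy onto dataset actions, but filter the regression by the critic's advantage estimate so that only actions the critic judges at least as good as the current policy contribute. The binary filter shown is one of the simpler variants; the network sizes are illustrative.

```python
# A hedged sketch of a binary-filter variant of critic-regularized regression.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, NUM_ACTIONS = 8, 4              # illustrative sizes
policy = nn.Linear(STATE_DIM, NUM_ACTIONS)
q_net = nn.Linear(STATE_DIM, NUM_ACTIONS)  # assumed trained by standard TD learning

def crr_policy_loss(states, actions):
    """Behavioral cloning restricted to actions with non-negative advantage."""
    with torch.no_grad():
        q = q_net(states)
        q_taken = q.gather(1, actions.unsqueeze(1)).squeeze(1)
        value = (F.softmax(policy(states), dim=1) * q).sum(dim=1)  # V under current policy
        keep = (q_taken >= value).float()                          # binary advantage filter
    nll = F.cross_entropy(policy(states), actions, reduction="none")
    return (keep * nll).mean()
```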
arXiv Detail & Related papers (2020-06-26T17:50:26Z)
- AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [84.94748183816547]
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience.
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
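
A hedged sketch of the advantage-weighted actor update underlying this family of methods: clone buffer actions with weights exp(A / lambda), using the same objective for offline pretraining on demonstrations and for online fine-tuning as new experience is appended to the buffer. The critic training loop is omitted, and the sizes, lambda, and weight clamp are illustrative assumptions rather than the exact AWAC implementation.

```python
# A hedged sketch of an exp-advantage-weighted actor update.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, NUM_ACTIONS, LAM = 8, 4, 1.0    # illustrative
policy = nn.Linear(STATE_DIM, NUM_ACTIONS)
q_net = nn.Linear(STATE_DIM, NUM_ACTIONS)  # assumed trained by standard TD learning

def advantage_weighted_loss(states, actions):
    """Same objective for offline pretraining and online fine-tuning batches."""
    with torch.no_grad():
        q = q_net(states)
        q_taken = q.gather(1, actions.unsqueeze(1)).squeeze(1)
        value = (F.softmax(policy(states), dim=1) * q).sum(dim=1)
        weights = torch.exp((q_taken - value) / LAM).clamp(max=20.0)  # exp-advantage weights
    nll = F.cross_entropy(policy(states), actions, reduction="none")
    return (weights * nll).mean()
```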
arXiv Detail & Related papers (2020-06-16T17:54:41Z)