Efficient Online Reinforcement Learning with Offline Data
- URL: http://arxiv.org/abs/2302.02948v4
- Date: Wed, 31 May 2023 10:52:56 GMT
- Title: Efficient Online Reinforcement Learning with Offline Data
- Authors: Philip J. Ball, Laura Smith, Ilya Kostrikov, Sergey Levine
- Abstract summary: We show that we can simply apply existing off-policy methods to leverage offline data when learning online.
We extensively ablate these design choices, demonstrating the key factors that most affect performance.
We see that correct application of these simple recommendations can provide a $\mathbf{2.5\times}$ improvement over existing approaches.
- Score: 78.92501185886569
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sample efficiency and exploration remain major challenges in online
reinforcement learning (RL). A powerful approach that can be applied to address
these issues is the inclusion of offline data, such as prior trajectories from
a human expert or a sub-optimal exploration policy. Previous methods have
relied on extensive modifications and additional complexity to ensure the
effective use of this data. Instead, we ask: can we simply apply existing
off-policy methods to leverage offline data when learning online? In this work,
we demonstrate that the answer is yes; however, a set of minimal but important
changes to existing off-policy RL algorithms are required to achieve reliable
performance. We extensively ablate these design choices, demonstrating the key
factors that most affect performance, and arrive at a set of recommendations
that practitioners can readily apply, whether their data comprise a small
number of expert demonstrations or large volumes of sub-optimal trajectories.
We see that correct application of these simple recommendations can provide a
$\mathbf{2.5\times}$ improvement over existing approaches across a diverse set
of competitive benchmarks, with no additional computational overhead. We have
released our code at https://github.com/ikostrikov/rlpd.
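The abstract does not spell out the specific design choices; as one concrete illustration of how offline data can be folded into a standard off-policy update, the hedged Python sketch below draws each training batch half from the offline dataset and half from the online replay buffer. The 50/50 split and the list-of-transitions buffer interface are illustrative assumptions, not the paper's stated recommendations.

```python
# Minimal sketch (not the authors' exact code) of one way to mix offline and
# online data in an off-policy update: each gradient step draws half of its
# batch from the offline dataset and half from the online replay buffer.
# The 50/50 split and the buffer interface are illustrative assumptions.
import numpy as np

def sample_mixed_batch(offline_buffer, online_buffer, batch_size, rng):
    """Draw a batch with an even offline/online split.

    Each buffer is assumed to be a list (or array) of transition dicts.
    """
    half = batch_size // 2
    off_idx = rng.integers(0, len(offline_buffer), size=half)
    on_idx = rng.integers(0, len(online_buffer), size=batch_size - half)
    batch = [offline_buffer[i] for i in off_idx] + [online_buffer[i] for i in on_idx]
    rng.shuffle(batch)  # mix offline and online transitions within the batch
    return batch

# Usage (illustrative):
# batch = sample_mixed_batch(offline_data, replay_buffer, 256, np.random.default_rng(0))
```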
Related papers
- Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration [54.8229698058649]
We study how unlabeled prior trajectory data can be leveraged to learn efficient exploration strategies.
Our method SUPE (Skills from Unlabeled Prior data for Exploration) demonstrates that a careful combination of these ideas compounds their benefits.
We empirically show that SUPE reliably outperforms prior strategies, successfully solving a suite of long-horizon, sparse-reward tasks.
arXiv Detail & Related papers (2024-10-23T17:58:45Z)
- Offline Reinforcement Learning for Learning to Dispatch for Job Shop Scheduling [0.9831489366502301]
We introduce Offline Reinforcement Learning for Learning to Dispatch (Offline-LD), a novel approach to the Job Shop Scheduling Problem (JSSP).
Offline-LD adapts two CQL-based Q-learning methods for maskable action spaces, introduces a new entropy bonus modification for discrete SAC, and exploits reward normalization through preprocessing.
Our experiments show that Offline-LD outperforms online RL on both generated and benchmark instances.
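As a hedged illustration of the reward-normalization preprocessing mentioned above, the sketch below standardizes rewards across an offline dataset. The specific standardization scheme and the transition-dict format are assumptions, not necessarily Offline-LD's exact procedure.

```python
# Minimal sketch of reward normalization as an offline preprocessing step.
# Standardizing with dataset mean/std is an illustrative assumption.
import numpy as np

def normalize_rewards(dataset):
    """Standardize rewards of a dataset of transition dicts, in place."""
    rewards = np.array([t["reward"] for t in dataset], dtype=np.float64)
    mean, std = rewards.mean(), rewards.std() + 1e-8  # avoid division by zero
    for t in dataset:
        t["reward"] = (t["reward"] - mean) / std
    return dataset
```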
arXiv Detail & Related papers (2024-09-16T15:18:10Z)
- Offline Reinforcement Learning from Datasets with Structured Non-Stationarity [50.35634234137108]
Current Reinforcement Learning (RL) is often limited by the large amount of data needed to learn a successful policy.
We address a novel Offline RL problem setting in which, while collecting the dataset, the transition and reward functions gradually change between episodes but stay constant within each episode.
We propose a method based on Contrastive Predictive Coding that identifies this non-stationarity in the offline dataset, accounts for it when training a policy, and predicts it during evaluation.
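For intuition about the contrastive component, the sketch below shows a standard InfoNCE loss of the kind used in Contrastive Predictive Coding. Pairing an episode-level context embedding with transition embeddings from the same episode is an illustrative assumption about how a non-stationarity signal could be learned, not the paper's exact formulation.

```python
# Minimal InfoNCE-style contrastive loss sketch (illustrative, PyTorch).
# Row i of context_emb and transition_emb are assumed to be a positive pair.
import torch
import torch.nn.functional as F

def info_nce_loss(context_emb, transition_emb, temperature=0.1):
    """context_emb, transition_emb: tensors of shape (batch, dim)."""
    context_emb = F.normalize(context_emb, dim=-1)
    transition_emb = F.normalize(transition_emb, dim=-1)
    logits = context_emb @ transition_emb.t() / temperature  # (batch, batch)
    labels = torch.arange(logits.size(0), device=logits.device)  # diagonal = positives
    return F.cross_entropy(logits, labels)
```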
arXiv Detail & Related papers (2024-05-23T02:41:36Z)
- Benchmarks and Algorithms for Offline Preference-Based Reward Learning [41.676208473752425]
We propose an approach that uses an offline dataset to craft preference queries via pool-based active learning.
Our proposed approach does not require actual physical rollouts or an accurate simulator for either the reward learning or policy optimization steps.
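As a hedged sketch of pool-based active query selection over an offline dataset, the code below picks the pair of trajectory segments on which an ensemble of learned reward models disagrees most about the preference. The disagreement criterion and the ensemble interface are assumptions, not the paper's acquisition function.

```python
# Minimal sketch of pool-based preference-query selection via ensemble
# disagreement (illustrative, not the paper's exact method).
import itertools
import numpy as np

def select_query(segment_returns):
    """segment_returns: (n_models, n_segments) predicted returns per segment.

    Returns the index pair (i, j) whose preference the ensemble is most
    uncertain about, measured by the spread of Bradley-Terry probabilities.
    """
    n_segments = segment_returns.shape[1]
    best_pair, best_score = None, -np.inf
    for i, j in itertools.combinations(range(n_segments), 2):
        # Per-model probability that segment i is preferred over segment j.
        probs = 1.0 / (1.0 + np.exp(segment_returns[:, j] - segment_returns[:, i]))
        score = probs.std()  # high spread = strong ensemble disagreement
        if score > best_score:
            best_pair, best_score = (i, j), score
    return best_pair
```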
arXiv Detail & Related papers (2023-01-03T23:52:16Z)
- Data-Efficient Pipeline for Offline Reinforcement Learning with Limited Data [28.846826115837825]
Offline reinforcement learning can be used to improve future performance by leveraging historical data.
We introduce a task- and method-agnostic pipeline for automatically training, comparing, selecting, and deploying the best policy.
We show it can have substantial impacts when the dataset is small.
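A minimal sketch of such an automatic train/compare/select loop is given below; the `train` and `evaluate_offline` callables are hypothetical stand-ins for the pipeline's training and offline evaluation components, not the paper's actual interfaces.

```python
# Minimal sketch of an automatic train/compare/select loop over candidate
# offline RL configurations (illustrative; components are hypothetical).
def select_best_policy(dataset, candidate_configs, train, evaluate_offline):
    """Train one policy per config and keep the one with the best offline score."""
    best_policy, best_score = None, float("-inf")
    for config in candidate_configs:
        policy = train(dataset, config)                # train candidate on offline data
        score = evaluate_offline(policy, dataset)      # score it without environment rollouts
        if score > best_score:
            best_policy, best_score = policy, score
    return best_policy
```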
arXiv Detail & Related papers (2022-10-16T21:24:53Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
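The summary notes that JSRL employs two policies; the sketch below illustrates the basic jump-start rollout, where a pre-existing guide policy controls the first h steps of an episode before the learning exploration policy takes over. The fixed switch point and the Gymnasium-style environment API are simplifying assumptions (JSRL adjusts the switch point as a curriculum).

```python
# Minimal sketch of a two-policy "jump-start" rollout (illustrative).
# Assumes a Gymnasium-style env API: reset() -> (obs, info),
# step(action) -> (obs, reward, terminated, truncated, info).
def jump_start_rollout(env, guide_policy, exploration_policy, h, max_steps=1000):
    """Collect one episode where the guide policy controls the first h steps."""
    obs, _ = env.reset()
    trajectory = []
    for t in range(max_steps):
        policy = guide_policy if t < h else exploration_policy
        action = policy(obs)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        trajectory.append((obs, action, reward, next_obs))
        obs = next_obs
        if terminated or truncated:
            break
    return trajectory
```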
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning [107.6943868812716]
In many practical applications, the situation is reversed: an agent may have access to large amounts of undirected offline experience data, while access to the online environment is severely limited.
Our main insight is that, when presented with offline data composed of a variety of behaviors, an effective way to leverage this data is to extract a continuous space of recurring and temporally extended primitive behaviors.
In addition to benefiting offline policy optimization, we show that performing offline primitive learning in this way can also be leveraged for improving few-shot imitation learning.
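As a hedged illustration of mapping temporally extended behavior to a continuous primitive space, the sketch below encodes fixed-length sub-trajectories into latent vectors with a small recurrent network; the architecture and segment format are assumptions, not OPAL's exact model.

```python
# Minimal sketch of a sub-trajectory encoder for a continuous latent
# "primitive" space (illustrative architecture, PyTorch).
import torch
import torch.nn as nn

class PrimitiveEncoder(nn.Module):
    """Map a (state, action) sub-trajectory to a latent primitive vector."""
    def __init__(self, obs_dim, act_dim, latent_dim=8, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(obs_dim + act_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, latent_dim)

    def forward(self, states, actions):
        # states: (batch, seg_len, obs_dim), actions: (batch, seg_len, act_dim)
        x = torch.cat([states, actions], dim=-1)
        _, h = self.rnn(x)               # h: (1, batch, hidden) final hidden state
        return self.head(h.squeeze(0))   # (batch, latent_dim)
```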
arXiv Detail & Related papers (2020-10-26T14:31:08Z)
- AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [84.94748183816547]
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience.
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
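For reference, the sketch below shows an advantage-weighted actor update of the kind AWAC builds on: the policy's log-likelihood of dataset actions is weighted by exp(advantage / temperature). The policy/critic interfaces and the way the advantage is formed here are illustrative assumptions.

```python
# Minimal sketch of an advantage-weighted actor loss (illustrative, PyTorch).
# Assumes policy(states) returns a torch.distributions.Distribution whose
# log_prob(actions) yields one log-likelihood per sample.
import torch

def advantage_weighted_actor_loss(policy, q_fn, value_fn, states, actions,
                                  temperature=1.0):
    with torch.no_grad():
        advantage = q_fn(states, actions) - value_fn(states)       # A(s, a) estimate
        weights = torch.exp(advantage / temperature).clamp(max=100.0)  # clipped for stability
    log_prob = policy(states).log_prob(actions)
    return -(weights * log_prob).mean()
```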
arXiv Detail & Related papers (2020-06-16T17:54:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.