OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2010.13611v3
- Date: Tue, 4 May 2021 19:20:46 GMT
- Title: OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning
- Authors: Anurag Ajay, Aviral Kumar, Pulkit Agrawal, Sergey Levine, Ofir Nachum
- Abstract summary: In many practical applications, the situation is reversed: an agent may have access to large amounts of undirected offline experience data, while access to the online environment is severely limited.
Our main insight is that, when presented with offline data composed of a variety of behaviors, an effective way to leverage this data is to extract a continuous space of recurring and temporally extended primitive behaviors.
In addition to benefiting offline policy optimization, we show that performing offline primitive learning in this way can also be leveraged for improving few-shot imitation learning.
- Score: 107.6943868812716
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) has achieved impressive performance in a variety
of online settings in which an agent's ability to query the environment for
transitions and rewards is effectively unlimited. However, in many practical
applications, the situation is reversed: an agent may have access to large
amounts of undirected offline experience data, while access to the online
environment is severely limited. In this work, we focus on this offline
setting. Our main insight is that, when presented with offline data composed of
a variety of behaviors, an effective way to leverage this data is to extract a
continuous space of recurring and temporally extended primitive behaviors
before using these primitives for downstream task learning. Primitives
extracted in this way serve two purposes: they delineate the behaviors that are
supported by the data from those that are not, making them useful for avoiding
distributional shift in offline RL; and they provide a degree of temporal
abstraction, which reduces the effective horizon, yielding better learning in
theory and improved offline RL in practice. In addition to benefiting offline
policy optimization, we show that performing offline primitive learning in this
way can also be leveraged for improving few-shot imitation learning as well as
exploration and transfer in online RL on a variety of benchmark domains.
Visualizations are available at https://sites.google.com/view/opal-iclr
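The abstract describes extracting a continuous space of recurring, temporally extended primitives from offline data, but does not spell out how that space is learned. Below is a minimal sketch of one standard instantiation: a variational auto-encoder over fixed-length sub-trajectories whose decoder is a latent-conditioned low-level policy. The module names, dimensions, window length, and standard-normal prior are illustrative assumptions, not OPAL's actual implementation.

```python
# Hedged sketch of offline primitive discovery: auto-encode fixed-length
# sub-trajectories (states, actions) into a continuous latent z, and decode z
# back into actions with a z-conditioned "primitive" policy. All names and
# hyperparameters below are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACTION_DIM, LATENT_DIM, WINDOW = 17, 6, 8, 10  # assumed sizes

class PrimitiveEncoder(nn.Module):
    """q(z | tau): bidirectional GRU over a sub-trajectory -> Gaussian over z."""
    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(STATE_DIM + ACTION_DIM, 128, batch_first=True,
                          bidirectional=True)
        self.mu = nn.Linear(2 * 128, LATENT_DIM)
        self.log_std = nn.Linear(2 * 128, LATENT_DIM)

    def forward(self, states, actions):               # (B, WINDOW, S), (B, WINDOW, A)
        h, _ = self.gru(torch.cat([states, actions], dim=-1))
        h = h.mean(dim=1)                              # pool over time
        return self.mu(h), self.log_std(h).clamp(-5, 2)

class PrimitivePolicy(nn.Module):
    """pi(a_t | s_t, z): decodes the latent primitive into low-level actions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + LATENT_DIM, 256), nn.ReLU(),
            nn.Linear(256, ACTION_DIM))

    def forward(self, states, z):                      # (B, WINDOW, S), (B, Z)
        z = z.unsqueeze(1).expand(-1, states.shape[1], -1)
        return self.net(torch.cat([states, z], dim=-1))

def opal_style_loss(encoder, policy, states, actions, beta=0.1):
    """Action reconstruction plus a beta-weighted KL to a standard-normal prior
    (an assumption; a state-conditioned prior is another common choice)."""
    mu, log_std = encoder(states, actions)
    z = mu + log_std.exp() * torch.randn_like(mu)      # reparameterisation trick
    recon = F.mse_loss(policy(states, z), actions)     # Gaussian log-lik up to a constant
    kl = 0.5 * (mu.pow(2) + (2 * log_std).exp() - 2 * log_std - 1).sum(-1).mean()
    return recon + beta * kl

# Usage: slice the offline dataset into length-WINDOW sub-trajectories and
# minimise opal_style_loss; a downstream offline RL agent then acts in the
# learned z-space while PrimitivePolicy executes each chosen z for WINDOW steps.
```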
Related papers
- Offline Reinforcement Learning from Datasets with Structured Non-Stationarity [50.35634234137108]
Current Reinforcement Learning (RL) is often limited by the large amount of data needed to learn a successful policy.
We address a novel Offline RL problem setting in which, while collecting the dataset, the transition and reward functions gradually change between episodes but stay constant within each episode.
We propose a method based on Contrastive Predictive Coding that identifies this non-stationarity in the offline dataset, accounts for it when training a policy, and predicts it during evaluation.
arXiv Detail & Related papers (2024-05-23T02:41:36Z)
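The entry above names Contrastive Predictive Coding for identifying per-episode non-stationarity; the exact formulation is not given, so the sketch below is one plausible contrastive setup: segments from the same episode share a hidden transition/reward regime and are pulled together with an InfoNCE loss, while segments from other episodes act as negatives. Names and sizes are assumptions.

```python
# Hedged sketch: learn a per-episode "regime" embedding with an InfoNCE loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

SEG_FEATURES, CONTEXT_DIM = 2 * 17 + 6 + 1, 16   # flattened (s, a, r, s'); assumed

class RegimeEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(SEG_FEATURES, 128), nn.ReLU(),
                                 nn.Linear(128, CONTEXT_DIM))

    def forward(self, segments):                     # (B, SEG_FEATURES)
        return F.normalize(self.net(segments), dim=-1)

def info_nce(encoder, anchors, positives, temperature=0.1):
    """anchors[i] and positives[i] come from the same episode; every other row
    in the batch serves as a negative (a different episode / regime)."""
    za, zp = encoder(anchors), encoder(positives)    # (B, D) each
    logits = za @ zp.t() / temperature               # pairwise similarity matrix
    labels = torch.arange(za.shape[0])               # match row i with column i
    return F.cross_entropy(logits, labels)

# The learned embedding can be concatenated to the state during offline policy
# training and inferred online from the most recent transitions.
```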
- ENOTO: Improving Offline-to-Online Reinforcement Learning with Q-Ensembles [52.34951901588738]
We propose a novel framework called ENsemble-based Offline-To-Online (ENOTO) RL.
By increasing the number of Q-networks, we seamlessly bridge offline pre-training and online fine-tuning without degrading performance.
Experimental results demonstrate that ENOTO can substantially improve the training stability, learning efficiency, and final performance of existing offline RL methods.
arXiv Detail & Related papers (2023-06-12T05:10:10Z)
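A minimal sketch of the Q-ensemble backup described in the ENOTO entry above: keep N critics and form the TD target from a pessimistic aggregate of their values. The snippet does not state ENOTO's exact aggregation rule; the min-over-a-random-subset rule below is a common choice and is an assumption.

```python
# Hedged sketch of an ensemble-of-critics TD target (assumed aggregation rule).
import random
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, N_CRITICS = 17, 6, 10     # assumed sizes

def make_critic():
    return nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 256), nn.ReLU(),
                         nn.Linear(256, 1))

critics = nn.ModuleList(make_critic() for _ in range(N_CRITICS))
target_critics = nn.ModuleList(make_critic() for _ in range(N_CRITICS))
target_critics.load_state_dict(critics.state_dict())

def td_target(reward, next_state, next_action, done, gamma=0.99, subset=2):
    """Pessimistic target: min over a random subset of the target critics.
    During online fine-tuning the aggregation can be loosened (e.g. a mean
    over the ensemble) to reduce pessimism."""
    sa = torch.cat([next_state, next_action], dim=-1)
    idx = random.sample(range(N_CRITICS), subset)
    q_next = torch.min(torch.stack([target_critics[i](sa) for i in idx]), dim=0).values
    return reward + gamma * (1.0 - done) * q_next
```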
- Efficient Online Reinforcement Learning with Offline Data [78.92501185886569]
We show that we can simply apply existing off-policy methods to leverage offline data when learning online.
We extensively ablate these design choices, demonstrating the key factors that most affect performance.
We see that correct application of these simple recommendations can provide a $\mathbf{2.5\times}$ improvement over existing approaches.
arXiv Detail & Related papers (2023-02-06T17:30:22Z)
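The entry above does not enumerate the ablated design choices, so the sketch below shows only one commonly used ingredient for running an unchanged off-policy learner on mixed data: drawing each update batch half from the offline dataset and half from the online replay buffer. This "symmetric sampling" is an illustrative assumption, not necessarily the paper's exact recipe.

```python
# Hedged sketch: mix offline and online transitions in every update batch.
import random

def sample_mixed_batch(offline_data, online_buffer, batch_size=256):
    """Return a batch with an equal share of offline and online transitions.
    Both arguments are lists (or other sequences) of transition tuples."""
    half = batch_size // 2
    batch = random.sample(offline_data, half) + random.sample(online_buffer, half)
    random.shuffle(batch)
    return batch

# Usage: feed sample_mixed_batch(...) to an off-the-shelf off-policy update
# (e.g. a SAC critic/actor step) instead of sampling from the online buffer alone.
```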
- Benchmarks and Algorithms for Offline Preference-Based Reward Learning [41.676208473752425]
We propose an approach that uses an offline dataset to craft preference queries via pool-based active learning.
Our proposed approach does not require actual physical rollouts or an accurate simulator for either the reward learning or policy optimization steps.
arXiv Detail & Related papers (2023-01-03T23:52:16Z)
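A minimal sketch of pool-based active preference querying as described in the entry above: score candidate pairs of offline trajectories by how much an ensemble of learned reward models disagrees about which one is preferred, and ask the labeller about the most contested pair. The disagreement acquisition rule is an assumption; the snippet only states that queries are crafted from the offline pool.

```python
# Hedged sketch: ensemble-disagreement query selection over an offline pool.
import numpy as np

def preference_probs(reward_models, traj_a, traj_b):
    """Per-model probability that traj_a is preferred (Bradley-Terry on returns)."""
    returns_a = np.array([m(traj_a) for m in reward_models])
    returns_b = np.array([m(traj_b) for m in reward_models])
    return 1.0 / (1.0 + np.exp(-(returns_a - returns_b)))

def select_query(reward_models, candidate_pairs):
    """Pick the pair whose preference probability varies most across the ensemble."""
    disagreement = [preference_probs(reward_models, a, b).std()
                    for a, b in candidate_pairs]
    return candidate_pairs[int(np.argmax(disagreement))]

# reward_models are callables mapping a trajectory to a scalar return estimate;
# after each query, all models are refit on the growing preference dataset.
```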
- Flow to Control: Offline Reinforcement Learning with Lossless Primitive Discovery [31.49638957903016]
Offline reinforcement learning (RL) enables the agent to effectively learn from logged data.
We show that our method has a good representation ability for policies and achieves superior performance in most tasks.
arXiv Detail & Related papers (2022-12-02T11:35:51Z)
- Adaptive Behavior Cloning Regularization for Stable Offline-to-Online Reinforcement Learning [80.25648265273155]
Offline reinforcement learning, by learning from a fixed dataset, makes it possible to learn agent behaviors without interacting with the environment.
During online fine-tuning, the performance of the pre-trained agent may collapse quickly due to the sudden distribution shift from offline to online data.
We propose to adaptively weigh the behavior cloning loss during online fine-tuning based on the agent's performance and training stability.
Experiments show that the proposed method yields state-of-the-art offline-to-online reinforcement learning performance on the popular D4RL benchmark.
arXiv Detail & Related papers (2022-10-25T09:08:26Z)
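A minimal sketch of the adaptive weighting described in the entry above: the actor minimises an RL term plus lambda times a behavior-cloning term, and lambda is nudged up when recent returns collapse (stay close to the data) and down when fine-tuning is stable. The specific update rule for lambda below is an assumption.

```python
# Hedged sketch: adapt the behavior-cloning weight from recent episode returns.
from collections import deque

class AdaptiveBCWeight:
    def __init__(self, init_weight=1.0, step=0.05, window=10):
        self.weight, self.step = init_weight, step
        self.returns = deque(maxlen=window)

    def update(self, episode_return):
        self.returns.append(episode_return)
        if len(self.returns) < self.returns.maxlen:
            return self.weight                        # not enough evidence yet
        best_seen = max(self.returns)
        if episode_return < 0.9 * best_seen:          # performance dropped: regularise more
            self.weight = min(1.0, self.weight + self.step)
        else:                                         # stable or improving: trust RL more
            self.weight = max(0.0, self.weight - self.step)
        return self.weight

# Per update: actor_loss = rl_actor_loss + bc.weight * bc_loss, where bc_loss
# is, e.g., the mean-squared error between the policy action and the dataset
# action on the sampled batch.
```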
- Offline RL With Resource Constrained Online Deployment [13.61540280864938]
Offline reinforcement learning is used to train policies in scenarios where real-time access to the environment is expensive or impossible.
This work introduces and formalizes a novel resource-constrained problem setting.
We highlight the performance gap between policies trained using the full offline dataset and policies trained using limited features.
arXiv Detail & Related papers (2021-10-07T03:43:09Z)
- Representation Matters: Offline Pretraining for Sequential Decision Making [27.74988221252854]
In this paper, we consider a slightly different approach to incorporating offline data into sequential decision-making.
We find that the use of pretraining with unsupervised learning objectives can dramatically improve the performance of policy learning algorithms.
arXiv Detail & Related papers (2021-02-11T02:38:12Z)
- Offline Reinforcement Learning Hands-On [60.36729294485601]
Offline RL aims to turn large datasets into powerful decision-making engines without any online interactions with the environment.
This work aims to reflect upon these efforts from a practitioner viewpoint.
We experimentally validate that diversity and high-return examples in the data are crucial to the success of offline RL.
arXiv Detail & Related papers (2020-11-29T14:45:02Z)