Representation Matters: Offline Pretraining for Sequential Decision Making
- URL: http://arxiv.org/abs/2102.05815v1
- Date: Thu, 11 Feb 2021 02:38:12 GMT
- Title: Representation Matters: Offline Pretraining for Sequential Decision Making
- Authors: Mengjiao Yang, Ofir Nachum
- Abstract summary: In this paper, we consider a slightly different approach to incorporating offline data into sequential decision-making.
We find that the use of pretraining with unsupervised learning objectives can dramatically improve the performance of policy learning algorithms.
- Score: 27.74988221252854
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The recent success of supervised learning methods on ever larger offline
datasets has spurred interest in the reinforcement learning (RL) field to
investigate whether the same paradigms can be translated to RL algorithms. This
research area, known as offline RL, has largely focused on offline policy
optimization, aiming to find a return-maximizing policy exclusively from
offline data. In this paper, we consider a slightly different approach to
incorporating offline data into sequential decision-making. We aim to answer
the question, what unsupervised objectives applied to offline datasets are able
to learn state representations which elevate performance on downstream tasks,
whether those downstream tasks be online RL, imitation learning from expert
demonstrations, or even offline policy optimization based on the same offline
dataset? Through a variety of experiments utilizing standard offline RL
datasets, we find that the use of pretraining with unsupervised learning
objectives can dramatically improve the performance of policy learning
algorithms that otherwise yield mediocre performance on their own. Extensive
ablations further provide insights into what components of these unsupervised
objectives -- e.g., reward prediction, continuous or discrete representations,
pretraining or finetuning -- are most important and in which settings.
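As a rough illustration of the pretraining-then-policy-learning recipe the abstract describes, the sketch below pretrains a state encoder on offline transitions with one possible unsupervised objective (forward-dynamics prediction in latent space); the architecture, the choice of objective, and all names are assumptions made for illustration, not the authors' implementation.
```python
# Minimal sketch, not the authors' code: pretrain a state encoder on offline
# transitions with an unsupervised objective, then reuse the representation for
# downstream policy learning. Sizes, the forward-dynamics objective, and all
# names are assumptions.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, obs_dim, rep_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                 nn.Linear(256, rep_dim))

    def forward(self, obs):
        return self.net(obs)

def pretrain_step(encoder, forward_model, batch, optimizer):
    """One unsupervised pretraining step: predict z_{t+1} from (z_t, a_t)."""
    z_t = encoder(batch["obs"])
    z_next = encoder(batch["next_obs"]).detach()  # stop-gradient target
    pred = forward_model(torch.cat([z_t, batch["action"]], dim=-1))
    loss = ((pred - z_next) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    obs_dim, act_dim, rep_dim = 17, 6, 64  # dummy dimensions
    enc = Encoder(obs_dim, rep_dim)
    fwd = nn.Sequential(nn.Linear(rep_dim + act_dim, 256), nn.ReLU(),
                        nn.Linear(256, rep_dim))
    opt = torch.optim.Adam(list(enc.parameters()) + list(fwd.parameters()), lr=3e-4)
    batch = {"obs": torch.randn(256, obs_dim),  # stand-in for an offline dataset batch
             "action": torch.randn(256, act_dim),
             "next_obs": torch.randn(256, obs_dim)}
    print(pretrain_step(enc, fwd, batch, opt))
```
The downstream policy (for online RL, imitation learning, or offline policy optimization) would then be trained on top of the pretrained encoder, kept frozen or finetuned, mirroring the pretraining-versus-finetuning ablation the abstract mentions.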
Related papers
- Is Value Learning Really the Main Bottleneck in Offline RL? [70.54708989409409]
We show that the choice of a policy extraction algorithm significantly affects the performance and scalability of offline RL.
We propose two simple test-time policy improvement methods and show that these methods lead to better performance.
arXiv Detail & Related papers (2024-06-13T17:07:49Z)
- Offline Reinforcement Learning from Datasets with Structured Non-Stationarity [50.35634234137108]
Current Reinforcement Learning (RL) is often limited by the large amount of data needed to learn a successful policy.
We address a novel Offline RL problem setting in which, while collecting the dataset, the transition and reward functions gradually change between episodes but stay constant within each episode.
We propose a method based on Contrastive Predictive Coding that identifies this non-stationarity in the offline dataset, accounts for it when training a policy, and predicts it during evaluation.
arXiv Detail & Related papers (2024-05-23T02:41:36Z)
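As a loose sketch of the contrastive idea in the entry above (not the paper's implementation), an InfoNCE-style loss can be trained so that transition embeddings from the same episode score higher than those from other episodes, exposing an episode-level signal that tracks the non-stationarity; all shapes and names here are assumptions.
```python
# Loose illustration, not the paper's implementation: a CPC-style InfoNCE loss in
# which two transition embeddings from the same episode form a positive pair and
# other episodes in the batch act as negatives. Shapes and names are assumptions.
import torch
import torch.nn.functional as F

def infonce_loss(anchor_emb, positive_emb, temperature=0.1):
    """anchor_emb, positive_emb: (B, D) embeddings where row i of both tensors
    comes from the same episode; off-diagonal rows serve as negatives."""
    logits = anchor_emb @ positive_emb.t() / temperature  # (B, B) similarity matrix
    labels = torch.arange(anchor_emb.size(0))             # matching episode on the diagonal
    return F.cross_entropy(logits, labels)
```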
- Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets [53.8218145723718]
Offline policy learning aims to learn decision-making policies from existing datasets of trajectories without collecting additional data.
We argue that when a dataset is dominated by suboptimal trajectories, state-of-the-art offline RL algorithms do not substantially improve over the average return of trajectories in the dataset.
We present a realization of the proposed sampling strategy and an algorithm that can be used as a plug-and-play module in standard offline RL algorithms.
arXiv Detail & Related papers (2023-10-06T17:58:14Z)
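The entry above does not spell out the sampling strategy, so the following is only a hypothetical illustration of a plug-and-play, non-uniform sampler: trajectories are drawn with probability increasing in their return through a temperature-controlled softmax, with the weighting scheme and temperature assumed.
```python
# Hypothetical plug-and-play sampler (the weighting scheme is an assumption, not
# taken from the paper): draw trajectory indices with probability increasing in
# their return, instead of uniformly.
import numpy as np

def sample_trajectory_indices(returns, batch_size, temperature=1.0, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    r = np.asarray(returns, dtype=np.float64)
    w = np.exp((r - r.max()) / temperature)  # numerically stable softmax weights
    p = w / w.sum()
    return rng.choice(len(r), size=batch_size, p=p)
```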
- Adaptive Policy Learning for Offline-to-Online Reinforcement Learning [27.80266207283246]
We consider an offline-to-online setting where the agent is first trained on the offline dataset and then further trained online.
We propose a framework called Adaptive Policy Learning for effectively taking advantage of offline and online data.
arXiv Detail & Related papers (2023-03-14T08:13:21Z)
- Benchmarks and Algorithms for Offline Preference-Based Reward Learning [41.676208473752425]
We propose an approach that uses an offline dataset to craft preference queries via pool-based active learning.
Our proposed approach does not require actual physical rollouts or an accurate simulator for either the reward learning or policy optimization steps.
arXiv Detail & Related papers (2023-01-03T23:52:16Z)
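A hedged sketch of what pool-based active preference querying over an offline dataset might look like: score candidate segment pairs by how evenly a reward-model ensemble is split on which segment is preferred, and query the most ambiguous pair. The disagreement criterion, array layout, and names are assumptions, not the paper's exact procedure.
```python
# Hedged sketch of pool-based active preference querying: among candidate segment
# pairs drawn from the offline dataset, query the pair on which a reward-model
# ensemble disagrees the most. The criterion and array layout are assumptions.
import numpy as np

def select_query(pair_returns):
    """pair_returns: array of shape (n_pairs, n_models, 2) holding each ensemble
    member's predicted return for the two segments of every candidate pair."""
    prefer_first = (pair_returns[..., 0] > pair_returns[..., 1]).mean(axis=1)  # vote share per pair
    ambiguity = -np.abs(prefer_first - 0.5)  # closest to a 50/50 split is most informative
    return int(np.argmax(ambiguity))
```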
- Launchpad: Learning to Schedule Using Offline and Online RL Methods [9.488752723308954]
Existing RL schedulers overlook the importance of learning from historical data and improving upon custom policies.
Offline reinforcement learning presents the prospect of policy optimization from pre-recorded datasets without online environment interaction.
These methods address the challenges concerning the cost of data collection and safety, particularly pertinent to real-world applications of RL.
arXiv Detail & Related papers (2022-12-01T16:40:11Z)
- Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning [147.61075994259807]
We propose Exploratory data for Offline RL (ExORL), a data-centric approach to offline RL.
ExORL first generates data with unsupervised reward-free exploration, then relabels this data with a downstream reward before training a policy with offline RL.
We find that exploratory data allows vanilla off-policy RL algorithms, without any offline-specific modifications, to outperform or match state-of-the-art offline RL algorithms on downstream tasks.
arXiv Detail & Related papers (2022-01-31T18:39:27Z)
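The relabeling step of the ExORL pipeline described above can be pictured as annotating each reward-free exploration transition with the downstream task reward before running a standard offline RL algorithm; the dataset format and function names below are assumptions.
```python
# Sketch of the relabeling step (dataset format and names are assumptions): each
# reward-free exploration transition gets the downstream task reward attached
# before a standard offline RL algorithm is trained on the result.
def relabel_dataset(transitions, reward_fn):
    """transitions: iterable of dicts with 'obs', 'action', 'next_obs' keys.
    reward_fn: downstream task reward r(obs, action, next_obs)."""
    return [dict(t, reward=reward_fn(t["obs"], t["action"], t["next_obs"]))
            for t in transitions]
```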
- Offline Reinforcement Learning with Value-based Episodic Memory [19.12430651038357]
Offline reinforcement learning (RL) shows promise for applying RL to real-world problems.
We propose Expectile V-Learning (EVL), which smoothly interpolates between optimal value learning and behavior cloning.
We present a new offline method called Value-based Episodic Memory (VEM).
arXiv Detail & Related papers (2021-10-19T08:20:11Z)
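To make the interpolation above concrete, a generic expectile regression loss is sketched below: with tau = 0.5 it reduces to ordinary mean-squared regression (behavior-cloning-like value estimation), while tau approaching 1 weights positive errors more, pushing the value estimate toward the best outcomes in the data (optimal value learning). This is a generic form and not necessarily the exact loss used inside VEM.
```python
# Generic expectile regression loss (not necessarily VEM's exact objective).
import torch

def expectile_loss(target, value, tau=0.7):
    diff = target - value
    weight = torch.abs(tau - (diff < 0).float())  # tau if diff >= 0, else (1 - tau)
    return (weight * diff.pow(2)).mean()
```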
- Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL [82.93243616342275]
We introduce Offline Model-based RL with Adaptive Behavioral Priors (MABE).
MABE is based on the finding that dynamics models, which support within-domain generalization, and behavioral priors, which support cross-domain generalization, are complementary.
In experiments that require cross-domain generalization, we find that MABE outperforms prior methods.
arXiv Detail & Related papers (2021-06-16T20:48:49Z)
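A rough, assumed sketch of how the two ingredients above might be combined: the policy is optimized for return on model-generated data while being regularized toward a behavioral prior, so the dynamics model supplies within-domain coverage and the prior supplies cross-domain structure. The objective form, weighting, and names are illustrative only, not MABE's exact objective.
```python
# Illustrative only (objective form, weighting, and names are assumptions): a
# policy objective that maximizes critic value on model-generated states while
# penalizing divergence from a behavioral prior.
import torch

def policy_loss(q_values, policy_logp, prior_logp, kl_weight=1.0):
    """q_values: critic values of policy actions on states sampled from a learned
    dynamics model; policy_logp / prior_logp: log-probabilities of those actions
    under the current policy and the behavioral prior."""
    kl_term = (policy_logp - prior_logp).mean()  # sample-based estimate of KL(policy || prior)
    return -q_values.mean() + kl_weight * kl_term
```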
- OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning [107.6943868812716]
In many practical applications, the situation is reversed: an agent may have access to large amounts of undirected offline experience data, while access to the online environment is severely limited.
Our main insight is that, when presented with offline data composed of a variety of behaviors, an effective way to leverage this data is to extract a continuous space of recurring and temporally extended primitive behaviors.
In addition to benefiting offline policy optimization, we show that performing offline primitive learning in this way can also be leveraged for improving few-shot imitation learning.
arXiv Detail & Related papers (2020-10-26T14:31:08Z)
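As an illustration of extracting a continuous space of temporally extended primitives from offline data, the sketch below encodes a fixed-length sub-trajectory into a latent z; a low-level decoder pi(a_t | s_t, z), omitted here, would reconstruct the actions, and downstream offline RL or few-shot imitation would then act in z-space. The architecture and names are assumptions rather than OPAL's exact model.
```python
# Assumed architecture for illustration, not OPAL's exact model: encode a
# fixed-length (obs, action) sub-trajectory into a continuous primitive latent z.
import torch
import torch.nn as nn

class PrimitiveEncoder(nn.Module):
    def __init__(self, obs_dim, act_dim, z_dim=8, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(obs_dim + act_dim, hidden, batch_first=True)
        self.to_z = nn.Linear(hidden, 2 * z_dim)  # mean and log-variance of q(z | sub-trajectory)

    def forward(self, obs_seq, act_seq):
        # obs_seq: (B, c, obs_dim), act_seq: (B, c, act_dim) for window length c
        h, _ = self.rnn(torch.cat([obs_seq, act_seq], dim=-1))
        mu, logvar = self.to_z(h[:, -1]).chunk(2, dim=-1)
        return mu, logvar
```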