Online and Offline Reinforcement Learning by Planning with a Learned
Model
- URL: http://arxiv.org/abs/2104.06294v1
- Date: Tue, 13 Apr 2021 15:36:06 GMT
- Title: Online and Offline Reinforcement Learning by Planning with a Learned
Model
- Authors: Julian Schrittwieser and Thomas Hubert and Amol Mandhane and
Mohammadamin Barekatain and Ioannis Antonoglou and David Silver
- Abstract summary: We describe the Reanalyse algorithm which uses model-based policy and value improvement operators to compute new improved training targets on existing data points.
We show that Reanalyse can also be used to learn entirely from demonstrations without any environment interactions.
We introduce MuZero Unplugged, a single unified algorithm for any data budget, including offline RL.
- Score: 15.8026041700727
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning efficiently from small amounts of data has long been the focus of
model-based reinforcement learning, both for the online case when interacting
with the environment and the offline case when learning from a fixed dataset.
However, to date no single unified algorithm could demonstrate state-of-the-art
results in both settings. In this work, we describe the Reanalyse algorithm
which uses model-based policy and value improvement operators to compute new
improved training targets on existing data points, allowing efficient learning
for data budgets varying by several orders of magnitude. We further show that
Reanalyse can also be used to learn entirely from demonstrations without any
environment interactions, as in the case of offline Reinforcement Learning
(offline RL). Combining Reanalyse with the MuZero algorithm, we introduce
MuZero Unplugged, a single unified algorithm for any data budget, including
offline RL. In contrast to previous work, our algorithm does not require any
special adaptations for the off-policy or offline RL settings. MuZero Unplugged
sets new state-of-the-art results in the RL Unplugged offline RL benchmark as
well as in the online RL benchmark of Atari in the standard 200 million frame
setting.
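The core idea of Reanalyse described in the abstract (recomputing training targets for data that has already been collected) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `reanalyse_targets`, the `search` callable (a stand-in for planning with the latest learned model, for example MCTS), the n-step return form, and the default `n_steps` and `discount` values are all assumptions made for the example.

```python
from typing import Callable, List, Sequence, Tuple


def reanalyse_targets(
    observations: Sequence[object],
    rewards: Sequence[float],
    search: Callable[[object], Tuple[List[float], float]],
    n_steps: int = 5,          # illustrative value, not taken from the paper
    discount: float = 0.997,   # illustrative value, not taken from the paper
) -> List[Tuple[List[float], float]]:
    """Recompute (policy_target, value_target) for every stored step.

    `search` is hypothetical: it runs planning with the current model on one
    stored observation and returns (improved_policy, value_estimate).
    `rewards[t]` is assumed to be the reward received after acting at
    `observations[t]`, and the trajectory is assumed to end in a terminal
    state, so no bootstrap value is added past its end.
    """
    # Re-run planning on old data with the latest model parameters.
    fresh = [search(obs) for obs in observations]
    length = len(observations)
    targets = []
    for t in range(length):
        horizon = min(t + n_steps, length)
        # Discounted sum of the stored rewards inside the n-step window.
        value = sum(discount ** (k - t) * rewards[k] for k in range(t, horizon))
        # Bootstrap from the freshly computed value if the window ends early.
        if horizon < length:
            value += discount ** (horizon - t) * fresh[horizon][1]
        targets.append((fresh[t][0], value))
    return targets
```

Because the targets depend only on stored observations and rewards plus the current model, the same routine applies whether the data came from recent environment interaction or from a fixed offline dataset.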
Related papers
- Finetuning Offline World Models in the Real World [13.46766121896684]
Reinforcement Learning (RL) is notoriously data-inefficient, which makes training on a real robot difficult.
Offline RL has been proposed as a framework for training RL policies on pre-existing datasets without any online interaction.
In this work, we consider the problem of pretraining a world model with offline data collected on a real robot, and then finetuning the model on online data collected by planning with the learned model.
arXiv Detail & Related papers (2023-10-24T17:46:12Z)
- Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced
Datasets [53.8218145723718]
Offline policy learning aims to learn decision-making policies from existing datasets of trajectories without collecting additional data.
We argue that when a dataset is dominated by suboptimal trajectories, state-of-the-art offline RL algorithms do not substantially improve over the average return of trajectories in the dataset.
We present a realization of the sampling strategy and an algorithm that can be used as a plug-and-play module in standard offline RL algorithms.
arXiv Detail & Related papers (2023-10-06T17:58:14Z)
- Efficient Offline Policy Optimization with a Learned Model [83.64779942889916]
MuZero Unplugged presents a promising approach for offline policy learning from logged data.
It conducts Monte-Carlo Tree Search (MCTS) with a learned model and leverages the Reanalyze algorithm to learn purely from offline data.
This paper investigates a few hypotheses where MuZero Unplugged may not work well under the offline settings.
arXiv Detail & Related papers (2022-10-12T07:41:04Z)
- A Unified Framework for Alternating Offline Model Training and Policy
Learning [62.19209005400561]
In offline model-based reinforcement learning, we learn a dynamic model from historically collected data, and utilize the learned model and fixed datasets for policy learning.
We develop an iterative offline MBRL framework, where we maximize a lower bound of the true expected return.
With the proposed unified model-policy learning framework, we achieve competitive performance on a wide range of continuous-control offline reinforcement learning datasets.
arXiv Detail & Related papers (2022-10-12T04:58:51Z)
- A Workflow for Offline Model-Free Robotic Reinforcement Learning [117.07743713715291]
Offline reinforcement learning (RL) enables learning control policies by utilizing only prior experience, without any online interaction.
We develop a practical workflow for using offline RL analogous to the relatively well-understood workflows for supervised learning problems.
We demonstrate the efficacy of this workflow in producing effective policies without any online tuning.
arXiv Detail & Related papers (2021-09-22T16:03:29Z)
- A Minimalist Approach to Offline Reinforcement Learning [10.904148149681932]
Offline reinforcement learning defines the task of learning from a fixed batch of data.
In this paper we aim to make a deep RL algorithm work while making minimal changes.
We find that we can match the performance of state-of-the-art offline RL algorithms by simply adding a behavior cloning term to the policy update of an online RL algorithm.
arXiv Detail & Related papers (2021-06-12T20:38:59Z)
- Representation Matters: Offline Pretraining for Sequential Decision
Making [27.74988221252854]
In this paper, we consider a slightly different approach to incorporating offline data into sequential decision-making.
We find that the use of pretraining with unsupervised learning objectives can dramatically improve the performance of policy learning algorithms.
arXiv Detail & Related papers (2021-02-11T02:38:12Z)
- RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning [108.9599280270704]
We propose a benchmark called RL Unplugged to evaluate and compare offline RL methods.
RL Unplugged includes data from a diverse range of domains including games and simulated motor control problems.
We will release data for all our tasks and open-source all algorithms presented in this paper.
arXiv Detail & Related papers (2020-06-24T17:14:51Z)
- D4RL: Datasets for Deep Data-Driven Reinforcement Learning [119.49182500071288]
We introduce benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL.
By moving beyond simple benchmark tasks and data collected by partially-trained RL agents, we reveal important and unappreciated deficiencies of existing algorithms.
arXiv Detail & Related papers (2020-04-15T17:18:19Z)