D4RL: Datasets for Deep Data-Driven Reinforcement Learning
- URL: http://arxiv.org/abs/2004.07219v4
- Date: Sat, 6 Feb 2021 01:57:28 GMT
- Title: D4RL: Datasets for Deep Data-Driven Reinforcement Learning
- Authors: Justin Fu, Aviral Kumar, Ofir Nachum, George Tucker, Sergey Levine
- Abstract summary: We introduce benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL.
By moving beyond simple benchmark tasks and data collected by partially-trained RL agents, we reveal important and unappreciated deficiencies of existing algorithms.
- Score: 119.49182500071288
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The offline reinforcement learning (RL) setting (also known as full batch
RL), where a policy is learned from a static dataset, is compelling as progress
enables RL methods to take advantage of large, previously-collected datasets,
much like how the rise of large datasets has fueled results in supervised
learning. However, existing online RL benchmarks are not tailored towards the
offline setting and existing offline RL benchmarks are restricted to data
generated by partially-trained agents, making progress in offline RL difficult
to measure. In this work, we introduce benchmarks specifically designed for the
offline setting, guided by key properties of datasets relevant to real-world
applications of offline RL. With a focus on dataset collection, examples of
such properties include: datasets generated via hand-designed controllers and
human demonstrators, multitask datasets where an agent performs different tasks
in the same environment, and datasets collected with mixtures of policies. By
moving beyond simple benchmark tasks and data collected by partially-trained RL
agents, we reveal important and unappreciated deficiencies of existing
algorithms. To facilitate research, we have released our benchmark tasks and
datasets with a comprehensive evaluation of existing algorithms, an evaluation
protocol, and open-source examples. This serves as a common starting point for
the community to identify shortcomings in existing offline RL methods and a
collaborative route for progress in this emerging area.
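The released datasets follow a simple flat layout: a dictionary of aligned NumPy arrays with one row per environment step. The snippet below is a rough, hedged sketch of that layout using synthetic stand-in data; the array shapes and the `make_transitions` helper are illustrative assumptions, not the actual D4RL API.

```python
import numpy as np

# Illustrative stand-in for a D4RL-style dataset: a dict of aligned
# arrays with one row per environment step (shapes are made up here).
N, obs_dim, act_dim = 5, 3, 2
rng = np.random.default_rng(0)
dataset = {
    "observations": rng.standard_normal((N, obs_dim)).astype(np.float32),
    "actions":      rng.standard_normal((N, act_dim)).astype(np.float32),
    "rewards":      rng.standard_normal(N).astype(np.float32),
    "terminals":    np.array([0, 0, 0, 0, 1], dtype=bool),
}

def make_transitions(d):
    """Pair each step with its successor to get (s, a, r, s', done) tuples,
    the form most offline RL algorithms consume."""
    return [
        (d["observations"][i], d["actions"][i], d["rewards"][i],
         d["observations"][i + 1], d["terminals"][i])
        for i in range(len(d["rewards"]) - 1)
    ]

transitions = make_transitions(dataset)
print(len(transitions))  # N - 1 usable transitions
```

In the real benchmark the arrays come from the released environments rather than a random generator, but the one-row-per-step dictionary shape is the key idea.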
Related papers
- D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments.
Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
arXiv Detail & Related papers (2024-08-15T22:27:00Z)
- Pessimistic Value Iteration for Multi-Task Data Sharing in Offline Reinforcement Learning [116.87367592920171]
Offline Reinforcement Learning (RL) has shown promising results in learning a task-specific policy from a fixed dataset.
In scenarios where the dataset for a specific task is limited, a natural approach is to improve offline RL with datasets from other tasks.
We propose an uncertainty-based Multi-Task Data Sharing (MTDS) approach that shares the entire dataset without data selection.
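The uncertainty-based pessimism described above can be sketched as a Bellman target penalized by ensemble disagreement; the disagreement proxy and the `beta` coefficient below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Sketch of an uncertainty-penalized target for shared multi-task data:
# next-state values from an ensemble of K heads; their disagreement (std)
# serves as the uncertainty proxy that down-weights transitions the
# current task's data does not cover.
def pessimistic_target(rewards, q_ensemble_next, gamma=0.99, beta=1.0):
    """rewards: (N,); q_ensemble_next: (K, N) next-state values from K heads."""
    q_mean = q_ensemble_next.mean(axis=0)
    uncertainty = q_ensemble_next.std(axis=0)  # ensemble disagreement
    return rewards + gamma * (q_mean - beta * uncertainty)

r = np.array([1.0, 0.0])
q_next = np.array([[1.0, 2.0],
                   [1.0, 4.0]])  # the two heads disagree on the second state
targets = pessimistic_target(r, q_next, gamma=1.0, beta=1.0)
# Without the penalty the second target would be 3.0; disagreement pulls it down.
```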
arXiv Detail & Related papers (2024-04-30T08:16:52Z)
- Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets [53.8218145723718]
Offline policy learning aims to learn decision-making policies from existing datasets of trajectories without collecting additional data.
We argue that when a dataset is dominated by suboptimal trajectories, state-of-the-art offline RL algorithms do not substantially improve over the average return of trajectories in the dataset.
We present a realization of the sampling strategy and an algorithm that can be used as a plug-and-play module in standard offline RL algorithms.
arXiv Detail & Related papers (2023-10-06T17:58:14Z)
- Look Beneath the Surface: Exploiting Fundamental Symmetry for Sample-Efficient Offline RL [29.885978495034703]
Offline reinforcement learning (RL) offers an appealing approach to real-world tasks by learning policies from pre-collected datasets.
However, the performance of existing offline RL algorithms heavily depends on the scale and state-action space coverage of datasets.
We provide a new insight that leveraging the fundamental symmetry of system dynamics can substantially enhance offline RL performance under small datasets.
arXiv Detail & Related papers (2023-06-07T07:51:05Z)
- Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning [147.61075994259807]
We propose Exploratory data for Offline RL (ExORL), a data-centric approach to offline RL.
ExORL first generates data with unsupervised reward-free exploration, then relabels this data with a downstream reward before training a policy with offline RL.
We find that exploratory data allows vanilla off-policy RL algorithms, without any offline-specific modifications, to outperform or match state-of-the-art offline RL algorithms on downstream tasks.
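The relabeling step above can be sketched as recomputing rewards for stored transitions with a downstream reward function; the toy `goal_reward` function below is a hypothetical example for illustration, not ExORL's actual interface.

```python
import numpy as np

# ExORL-style relabeling sketch: exploration data is collected reward-free,
# then a downstream reward function is evaluated on the stored transitions.
def relabel(next_observations, reward_fn):
    """Assign rewards by evaluating reward_fn on each stored next state."""
    return np.array([reward_fn(s) for s in next_observations])

# Hypothetical downstream task: reward for being near a goal position.
goal = np.array([1.0, 0.0])
def goal_reward(state):
    return 1.0 - float(np.linalg.norm(state - goal))

next_obs = np.array([[0.0, 0.0], [1.0, 0.0]])
rewards = relabel(next_obs, goal_reward)
print(rewards)  # the state at the goal receives the highest reward
```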
arXiv Detail & Related papers (2022-01-31T18:39:27Z)
- The Challenges of Exploration for Offline Reinforcement Learning [8.484491887821473]
We study the two processes of reinforcement learning: collecting informative experience and inferring optimal behaviour.
The task-agnostic setting for data collection, where the task is not known a priori, is of particular interest.
We use this decoupled framework to strengthen intuitions about exploration and the data prerequisites for effective offline RL.
arXiv Detail & Related papers (2022-01-27T23:59:56Z)
- Representation Matters: Offline Pretraining for Sequential Decision Making [27.74988221252854]
In this paper, we consider a slightly different approach to incorporating offline data into sequential decision-making.
We find that the use of pretraining with unsupervised learning objectives can dramatically improve the performance of policy learning algorithms.
arXiv Detail & Related papers (2021-02-11T02:38:12Z)
- Critic Regularized Regression [70.8487887738354]
We propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR).
We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces.
arXiv Detail & Related papers (2020-06-26T17:50:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.