Improving and Benchmarking Offline Reinforcement Learning Algorithms
- URL: http://arxiv.org/abs/2306.00972v1
- Date: Thu, 1 Jun 2023 17:58:46 GMT
- Title: Improving and Benchmarking Offline Reinforcement Learning Algorithms
- Authors: Bingyi Kang, Xiao Ma, Yirui Wang, Yang Yue, Shuicheng Yan
- Abstract summary: This work aims to bridge the gaps caused by low-level choices and datasets.
We empirically investigate 20 implementation choices using three representative algorithms.
We find two variants, CRR+ and CQL+, that achieve new state-of-the-art performance on D4RL.
- Score: 87.67996706673674
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, Offline Reinforcement Learning (RL) has achieved remarkable
progress with the emergence of various algorithms and datasets. However, these
methods usually focus on algorithmic advancements, ignoring that many low-level
implementation choices considerably influence or even drive the final
performance. As a result, it becomes hard to attribute progress in Offline
RL, as these choices are not sufficiently discussed and aligned in the
literature. In addition, papers focusing on a dataset (e.g., D4RL) often ignore
algorithms proposed on another dataset (e.g., RL Unplugged), causing isolation
among the algorithms, which might slow down the overall progress. Therefore,
this work aims to bridge the gaps caused by low-level choices and datasets. To
this end, we empirically investigate 20 implementation choices using three
representative algorithms (i.e., CQL, CRR, and IQL) and present a guidebook for
choosing implementations. Following the guidebook, we find two variants, CRR+
and CQL+, that achieve new state-of-the-art performance on D4RL. Moreover, we benchmark eight
popular offline RL algorithms across datasets under a unified training and
evaluation framework. The findings are inspiring: the success of a learning
paradigm depends heavily on the data distribution, and some previous
conclusions are biased by the dataset used. Our code is available at
https://github.com/sail-sg/offbench.
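The three representative algorithms above share a value-based core whose behavior is sensitive to exactly the kind of low-level choices the paper catalogs (network sizes, number of sampled actions, penalty weights, target computation). As a rough illustration only, the sketch below implements a generic CQL-style conservative Q-loss in PyTorch; the architecture, hyperparameters, `policy_action_fn`, and the uniform action sampler are illustrative assumptions and do not reproduce the paper's CQL+ configuration.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Minimal MLP Q-network Q(s, a); layer sizes are illustrative."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

def cql_style_loss(q_net, q_target, batch, policy_action_fn,
                   num_random_actions=10, alpha=5.0, gamma=0.99):
    """TD loss plus a conservative penalty that pushes Q-values down on
    out-of-distribution actions and up on dataset actions.
    `policy_action_fn` (an assumption here) maps next states to actions;
    the uniform action sampler is a simplification of the full CQL estimator."""
    s, a, r, s_next, done = batch  # float tensors from the offline dataset

    # Standard TD target; how the next action is chosen is one of the
    # low-level implementation choices such studies vary.
    with torch.no_grad():
        a_next = policy_action_fn(s_next)
        td_target = r + gamma * (1.0 - done) * q_target(s_next, a_next)
    td_loss = ((q_net(s, a) - td_target) ** 2).mean()

    # Conservative penalty: logsumexp of Q over random actions minus Q on data actions.
    batch_size, act_dim = a.shape
    rand_a = 2.0 * torch.rand(batch_size, num_random_actions, act_dim, device=a.device) - 1.0
    s_rep = s.unsqueeze(1).expand(-1, num_random_actions, -1).reshape(-1, s.shape[-1])
    q_rand = q_net(s_rep, rand_a.reshape(-1, act_dim)).reshape(batch_size, num_random_actions)
    penalty = (torch.logsumexp(q_rand, dim=1) - q_net(s, a)).mean()

    return td_loss + alpha * penalty
```

Even in this stripped-down form, choices such as alpha, the number of sampled actions, and the target computation can shift results noticeably, which is the kind of sensitivity the abstract attributes to low-level implementation details.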
Related papers
- OGBench: Benchmarking Offline Goal-Conditioned RL [72.00291801676684]
Offline goal-conditioned reinforcement learning (GCRL) is a major problem in reinforcement learning.
We propose OGBench, a new, high-quality benchmark for algorithms research in offline goal-conditioned RL.
arXiv Detail & Related papers (2024-10-26T06:06:08Z)
- Simple Ingredients for Offline Reinforcement Learning [86.1988266277766]
Offline reinforcement learning algorithms have proven effective on datasets highly connected to the target downstream task.
We show that existing methods struggle with diverse data: their performance considerably deteriorates as data collected for related but different tasks is simply added to the offline buffer.
We show that scale, more than algorithmic considerations, is the key factor influencing performance.
arXiv Detail & Related papers (2024-03-19T18:57:53Z)
- Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets [53.8218145723718]
Offline policy learning is aimed at learning decision-making policies using existing datasets of trajectories without collecting additional data.
We argue that when a dataset is dominated by suboptimal trajectories, state-of-the-art offline RL algorithms do not substantially improve over the average return of trajectories in the dataset.
We present a realization of the sampling strategy and an algorithm that can be used as a plug-and-play module in standard offline RL algorithms.
arXiv Detail & Related papers (2023-10-06T17:58:14Z)
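The plug-and-play sampler referenced in the entry above replaces uniform minibatch sampling over the dataset. The snippet below is a generic return-weighted trajectory sampler, a hedged sketch of one common non-uniform strategy; it is not claimed to be the exact realization proposed in that paper, and the softmax temperature and data layout are illustrative assumptions.

```python
import numpy as np

def return_weighted_sampler(trajectories, temperature=1.0, rng=None):
    """Generic non-uniform sampler: trajectories with higher returns are drawn
    more often than under uniform sampling. `trajectories` is assumed to be a
    list of dicts holding a 'rewards' array; `temperature` controls how strongly
    the sampler favors high-return data."""
    rng = rng or np.random.default_rng()
    returns = np.array([np.sum(t["rewards"]) for t in trajectories], dtype=np.float64)

    # Softmax over (max-shifted) returns gives the sampling distribution.
    z = (returns - returns.max()) / max(temperature, 1e-8)
    probs = np.exp(z)
    probs /= probs.sum()

    def sample(batch_size):
        idx = rng.choice(len(trajectories), size=batch_size, p=probs)
        return [trajectories[i] for i in idx]

    return sample
```

A sampler of this form can be dropped in front of any standard offline RL training loop, which matches the "plug-and-play module" framing in the abstract.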
- Semi-Supervised Offline Reinforcement Learning with Action-Free Trajectories [37.14064734165109]
Natural agents can learn from multiple data sources that differ in size, quality, and types of measurements.
We study this in the context of offline reinforcement learning (RL) by introducing a new, practically motivated semi-supervised setting.
arXiv Detail & Related papers (2022-10-12T18:22:23Z)
- Offline Equilibrium Finding [40.08360411502593]
We aim to generalize Offline RL to a multi-agent or multiplayer-game setting.
Very little research has been done in this area, as the progress is hindered by the lack of standardized datasets and meaningful benchmarks.
Our two model-based algorithms -- OEF-PSRO and OEF-CFR -- are adaptations of the widely used equilibrium-finding algorithms PSRO and Deep CFR to the context of offline learning.
arXiv Detail & Related papers (2022-07-12T03:41:06Z)
- Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning [147.61075994259807]
We propose Exploratory data for Offline RL (ExORL), a data-centric approach to offline RL.
ExORL first generates data with unsupervised reward-free exploration, then relabels this data with a downstream reward before training a policy with offline RL.
We find that exploratory data allows vanilla off-policy RL algorithms, without any offline-specific modifications, to outperform or match state-of-the-art offline RL algorithms on downstream tasks.
arXiv Detail & Related papers (2022-01-31T18:39:27Z)
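To make the two-stage ExORL recipe above concrete, here is a minimal sketch of the relabeling step, assuming transitions stored as dicts and a downstream task reward function `reward_fn`; both names are illustrative assumptions, not ExORL's actual API.

```python
def relabel_with_task_reward(transitions, reward_fn):
    """ExORL-style relabeling (illustrative): exploration data collected without
    task rewards is stamped with the downstream reward before standard offline
    RL training. Each transition is assumed to hold 'obs', 'action', 'next_obs'."""
    relabeled = []
    for t in transitions:
        reward = reward_fn(t["obs"], t["action"], t["next_obs"])
        relabeled.append({**t, "reward": reward})
    return relabeled
```

The relabeled buffer can then be handed to an unmodified off-policy algorithm, which is the paper's central observation about data-centric offline RL.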
- Interpretable performance analysis towards offline reinforcement learning: A dataset perspective [6.526790418943535]
We propose a two-fold taxonomy for existing offline RL algorithms.
We explore the correlation between the performance of different types of algorithms and the distribution of actions under states.
We create a benchmark platform on the Atari domain, entitled easy go (RLEG), at an estimated cost of more than 0.3 million dollars.
arXiv Detail & Related papers (2021-05-12T07:17:06Z)
- RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning [108.9599280270704]
We propose a benchmark called RL Unplugged to evaluate and compare offline RL methods.
RL Unplugged includes data from a diverse range of domains including games and simulated motor control problems.
We will release data for all our tasks and open-source all algorithms presented in this paper.
arXiv Detail & Related papers (2020-06-24T17:14:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.