Datasets and Benchmarks for Offline Safe Reinforcement Learning
- URL: http://arxiv.org/abs/2306.09303v2
- Date: Fri, 16 Jun 2023 17:54:06 GMT
- Title: Datasets and Benchmarks for Offline Safe Reinforcement Learning
- Authors: Zuxin Liu, Zijian Guo, Haohong Lin, Yihang Yao, Jiacheng Zhu, Zhepeng
Cen, Hanjiang Hu, Wenhao Yu, Tingnan Zhang, Jie Tan, Ding Zhao
- Abstract summary: This paper presents a comprehensive benchmarking suite tailored to offline safe reinforcement learning (RL) challenges.
Our benchmark suite contains three packages: 1) expertly crafted safe policies, 2) D4RL-styled datasets along with environment wrappers, and 3) high-quality offline safe RL baseline implementations.
- Score: 22.912420819434516
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a comprehensive benchmarking suite tailored to offline
safe reinforcement learning (RL) challenges, aiming to foster progress in the
development and evaluation of safe learning algorithms in both the training and
deployment phases. Our benchmark suite contains three packages: 1) expertly
crafted safe policies, 2) D4RL-styled datasets along with environment wrappers,
and 3) high-quality offline safe RL baseline implementations. We feature a
methodical data collection pipeline powered by advanced safe RL algorithms,
which facilitates the generation of diverse datasets across 38 popular safe RL
tasks, from robot control to autonomous driving. We further introduce an array
of data post-processing filters, capable of modifying each dataset's diversity,
thereby simulating various data collection conditions. Additionally, we provide
elegant and extensible implementations of prevalent offline safe RL algorithms
to accelerate research in this area. Through extensive experiments with over
50,000 CPU hours and 800 GPU hours of computation, we evaluate and compare the
performance of these baseline algorithms on the collected datasets, offering
insights into their strengths, limitations, and potential areas of improvement.
Our benchmarking framework serves as a valuable resource for researchers and
practitioners, facilitating the development of more robust and reliable offline
safe RL solutions in safety-critical applications. The benchmark website is
available at www.offline-saferl.org.
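The abstract describes D4RL-styled datasets augmented with per-step costs, environment wrappers, and post-processing filters that reshape each dataset's diversity. Below is a minimal sketch of what such a dataset layout and a cost-based filter could look like; every field name, the toy data, and the filtering rule are illustrative assumptions rather than the benchmark's released API (see www.offline-saferl.org for the actual packages).

```python
import numpy as np

def make_toy_dataset(n_steps=1000, obs_dim=4, act_dim=2, seed=0):
    """A random dataset dict in the D4RL convention, extended with a
    per-step 'costs' array (hypothetical layout, for illustration only)."""
    rng = np.random.default_rng(seed)
    return {
        "observations": rng.normal(size=(n_steps, obs_dim)).astype(np.float32),
        "actions": rng.uniform(-1, 1, size=(n_steps, act_dim)).astype(np.float32),
        "rewards": rng.normal(size=n_steps).astype(np.float32),
        "costs": (rng.random(n_steps) < 0.1).astype(np.float32),  # sparse constraint violations
        "terminals": np.zeros(n_steps, dtype=bool),
        "timeouts": np.arange(1, n_steps + 1) % 200 == 0,  # fixed-length episodes
    }

def split_episodes(dataset):
    """Split the flat step arrays into per-episode slices using terminals/timeouts."""
    done = np.logical_or(dataset["terminals"], dataset["timeouts"])
    ends = np.flatnonzero(done) + 1
    starts = np.concatenate(([0], ends[:-1]))
    return [slice(int(s), int(e)) for s, e in zip(starts, ends)]

def filter_by_cost(dataset, max_episode_cost):
    """One possible 'diversity' filter: keep only episodes whose cumulative
    cost stays below a threshold, simulating a safer data-collection policy."""
    kept = [ep for ep in split_episodes(dataset)
            if dataset["costs"][ep].sum() <= max_episode_cost]
    if not kept:
        return {k: v[:0] for k, v in dataset.items()}
    idx = np.concatenate([np.arange(ep.start, ep.stop) for ep in kept])
    return {k: v[idx] for k, v in dataset.items()}

dataset = make_toy_dataset()
safe_subset = filter_by_cost(dataset, max_episode_cost=20.0)
print(len(dataset["rewards"]), "->", len(safe_subset["rewards"]), "transitions kept")
```

Filters of this kind can emulate different data-collection conditions, e.g. a conservative behavior policy (low cost) versus an aggressive one (high reward but high cost); an analogous reward-based filter would instead vary the optimality of the retained trajectories.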
Related papers
- D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments.
Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
arXiv Detail & Related papers (2024-08-15T22:27:00Z)
- AD4RL: Autonomous Driving Benchmarks for Offline Reinforcement Learning with Value-based Dataset [2.66269503676104]
This paper provides autonomous driving datasets and benchmarks for offline reinforcement learning research.
We provide 19 datasets, including real-world human driver's datasets, and seven popular offline reinforcement learning algorithms in three realistic driving scenarios.
arXiv Detail & Related papers (2024-04-03T03:36:35Z)
- Offline Goal-Conditioned Reinforcement Learning for Safety-Critical Tasks with Recovery Policy [4.854443247023496]
Offline goal-conditioned reinforcement learning (GCRL) aims to solve goal-reaching tasks with sparse rewards from an offline dataset.
We propose a new method called Recovery-based Supervised Learning (RbSL) to accomplish safety-critical tasks with various goals.
arXiv Detail & Related papers (2024-03-04T05:20:57Z)
- Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets [53.8218145723718]
Offline policy learning aims to learn decision-making policies from existing datasets of trajectories without collecting additional data.
We argue that when a dataset is dominated by suboptimal trajectories, state-of-the-art offline RL algorithms do not substantially improve over the average return of trajectories in the dataset.
We present a realization of the sampling strategy and an algorithm that can be used as a plug-and-play module in standard offline RL algorithms.
arXiv Detail & Related papers (2023-10-06T17:58:14Z)
- Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
- Critic Regularized Regression [70.8487887738354]
We propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR).
We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces.
arXiv Detail & Related papers (2020-06-26T17:50:26Z)
- RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning [108.9599280270704]
We propose a benchmark called RL Unplugged to evaluate and compare offline RL methods.
RL Unplugged includes data from a diverse range of domains including games and simulated motor control problems.
We will release data for all our tasks and open-source all algorithms presented in this paper.
arXiv Detail & Related papers (2020-06-24T17:14:51Z)
- D4RL: Datasets for Deep Data-Driven Reinforcement Learning [119.49182500071288]
We introduce benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL.
By moving beyond simple benchmark tasks and data collected by partially-trained RL agents, we reveal important and unappreciated deficiencies of existing algorithms.
arXiv Detail & Related papers (2020-04-15T17:18:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information (including all content above) and is not responsible for any consequences of its use.