A Review of Safe Reinforcement Learning: Methods, Theory and Applications
- URL: http://arxiv.org/abs/2205.10330v5
- Date: Fri, 24 May 2024 22:33:04 GMT
- Title: A Review of Safe Reinforcement Learning: Methods, Theory and Applications
- Authors: Shangding Gu, Long Yang, Yali Du, Guang Chen, Florian Walter, Jun Wang, Alois Knoll,
- Abstract summary: We provide a review of safe RL from the perspectives of methods, theories, and applications.
We come up with five crucial problems for safe RL being deployed in real-world applications, coined as "2H3W"
- Score: 15.450066275233008
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement Learning (RL) has achieved tremendous success in many complex decision-making tasks. However, safety concerns are raised during deploying RL in real-world applications, leading to a growing demand for safe RL algorithms, such as in autonomous driving and robotics scenarios. While safe control has a long history, the study of safe RL algorithms is still in the early stages. To establish a good foundation for future safe RL research, in this paper, we provide a review of safe RL from the perspectives of methods, theories, and applications. Firstly, we review the progress of safe RL from five dimensions and come up with five crucial problems for safe RL being deployed in real-world applications, coined as "2H3W". Secondly, we analyze the algorithm and theory progress from the perspectives of answering the "2H3W" problems. Particularly, the sample complexity of safe RL algorithms is reviewed and discussed, followed by an introduction to the applications and benchmarks of safe RL algorithms. Finally, we open the discussion of the challenging problems in safe RL, hoping to inspire future research on this thread. To advance the study of safe RL algorithms, we release an open-sourced repository containing the implementations of major safe RL algorithms at the link: https://github.com/chauncygu/Safe-Reinforcement-Learning-Baselines.git.
Related papers
- Long-term Safe Reinforcement Learning with Binary Feedback [5.684409853507594]
Long-term Binary Safe RL (LoBiSaRL) is a safe RL algorithm for constrained Markov decision processes.
LoBiSaRL guarantees the long-term safety constraint, with high probability.
Our theoretical results show that LoBiSaRL guarantees the long-term safety constraint, with high probability.
arXiv Detail & Related papers (2024-01-08T10:07:31Z) - Guided Online Distillation: Promoting Safe Reinforcement Learning by
Offline Demonstration [75.51109230296568]
We argue that extracting expert policy from offline data to guide online exploration is a promising solution to mitigate the conserveness issue.
We propose Guided Online Distillation (GOLD), an offline-to-online safe RL framework.
GOLD distills an offline DT policy into a lightweight policy network through guided online safe RL training, which outperforms both the offline DT policy and online safe RL algorithms.
arXiv Detail & Related papers (2023-09-18T00:22:59Z) - GUARD: A Safe Reinforcement Learning Benchmark [11.887626936994883]
Generalized Unified SAfe Reinforcement Learning Development Benchmark.
GUARD is a generalized benchmark with a wide variety of RL agents, tasks, and safety constraint specifications.
We present a comparison of state-of-the-art safe RL algorithms in various task settings using GUARD and establish baselines that future work can build on.
arXiv Detail & Related papers (2023-05-23T04:40:29Z) - OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning
Research [3.0536277689386453]
We introduce a foundational framework designed to expedite SafeRL research endeavors.
Our framework encompasses an array of algorithms spanning different RL domains and places heavy emphasis on safety elements.
arXiv Detail & Related papers (2023-05-16T09:22:14Z) - Safe and Sample-efficient Reinforcement Learning for Clustered Dynamic
Environments [4.111899441919165]
This study proposes a safe and sample-efficient reinforcement learning (RL) framework to address two major challenges.
We use the safe set algorithm (SSA) to monitor and modify the nominal controls, and evaluate SSA+RL in a clustered dynamic environment.
Our framework can achieve better safety performance compare to other safe RL methods during training and solve the task with substantially fewer episodes.
arXiv Detail & Related papers (2023-03-24T20:29:17Z) - A Multiplicative Value Function for Safe and Efficient Reinforcement
Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
arXiv Detail & Related papers (2023-03-07T18:29:15Z) - Provable Safe Reinforcement Learning with Binary Feedback [62.257383728544006]
We consider the problem of provable safe RL when given access to an offline oracle providing binary feedback on the safety of state, action pairs.
We provide a novel meta algorithm, SABRE, which can be applied to any MDP setting given access to a blackbox PAC RL algorithm for that setting.
arXiv Detail & Related papers (2022-10-26T05:37:51Z) - SafeRL-Kit: Evaluating Efficient Reinforcement Learning Methods for Safe
Autonomous Driving [12.925039760573092]
We release SafeRL-Kit to benchmark safe RL methods for autonomous driving tasks.
SafeRL-Kit contains several latest algorithms specific to zero-constraint-violation tasks, including Safety Layer, Recovery RL, off-policy Lagrangian method, and Feasible Actor-Critic.
We conduct a comparative evaluation of the above algorithms in SafeRL-Kit and shed light on their efficacy for safe autonomous driving.
arXiv Detail & Related papers (2022-06-17T03:23:51Z) - SAUTE RL: Almost Surely Safe Reinforcement Learning Using State
Augmentation [63.25418599322092]
Satisfying safety constraints almost surely (or with probability one) can be critical for deployment of Reinforcement Learning (RL) in real-life applications.
We address the problem by introducing Safety Augmented Markov Decision Processes (MDPs)
We show that Saute MDP allows to view Safe augmentation problem from a different perspective enabling new features.
arXiv Detail & Related papers (2022-02-14T08:57:01Z) - Recovery RL: Safe Reinforcement Learning with Learned Recovery Zones [81.49106778460238]
Recovery RL uses offline data to learn about constraint violating zones before policy learning.
We evaluate Recovery RL on 6 simulation domains, including two contact-rich manipulation tasks and an image-based navigation task.
Results suggest that Recovery RL trades off constraint violations and task successes 2 - 20 times more efficiently in simulation domains and 3 times more efficiently in physical experiments.
arXiv Detail & Related papers (2020-10-29T20:10:02Z) - Learning to be Safe: Deep RL with a Safety Critic [72.00568333130391]
A natural first approach toward safe RL is to manually specify constraints on the policy's behavior.
We propose to learn how to be safe in one set of tasks and environments, and then use that learned intuition to constrain future behaviors.
arXiv Detail & Related papers (2020-10-27T20:53:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.