GenSafe: A Generalizable Safety Enhancer for Safe Reinforcement Learning Algorithms Based on Reduced Order Markov Decision Process Model
- URL: http://arxiv.org/abs/2406.03912v1
- Date: Thu, 6 Jun 2024 09:51:30 GMT
- Title: GenSafe: A Generalizable Safety Enhancer for Safe Reinforcement Learning Algorithms Based on Reduced Order Markov Decision Process Model
- Authors: Zhehua Zhou, Xuan Xie, Jiayang Song, Zhan Shu, Lei Ma
- Abstract summary: We introduce a Generalizable Safety enhancer (GenSafe) for Safe Reinforcement Learning (SRL) algorithms.
By solving ROMDP-based constraints that are reformulated from the original cost constraints, GenSafe refines the actions taken by the agent to enhance the possibility of constraint satisfaction.
The results show that it not only improves safety performance, especially in the early learning phases, but also maintains task performance at a satisfactory level.
- Score: 8.915288771953545
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Although deep reinforcement learning has demonstrated impressive achievements in controlling various autonomous systems, e.g., autonomous vehicles or humanoid robots, its inherent reliance on random exploration raises safety concerns in real-world applications. To improve system safety during the learning process, a variety of Safe Reinforcement Learning (SRL) algorithms have been proposed, which usually incorporate safety constraints within the Constrained Markov Decision Process (CMDP) framework. However, the efficacy of these SRL algorithms often relies on accurate function approximations, a task that is notably challenging to accomplish in the early learning stages due to data insufficiency. To address this problem, we introduce a Generalizable Safety enhancer (GenSafe) in this work. Leveraging model order reduction techniques, we first construct a Reduced Order Markov Decision Process (ROMDP) as a low-dimensional proxy for the original cost function in CMDP. Then, by solving ROMDP-based constraints that are reformulated from the original cost constraints, the proposed GenSafe refines the actions taken by the agent to enhance the possibility of constraint satisfaction. Essentially, GenSafe acts as an additional safety layer for SRL algorithms, offering broad compatibility across diverse SRL approaches. The performance of GenSafe is examined on multiple SRL benchmark problems. The results show that it not only improves safety performance, especially in the early learning phases, but also maintains task performance at a satisfactory level.
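As a rough illustration of the action-refinement idea, the sketch below projects the agent's proposed action onto an approximately safe set. It is a minimal, hypothetical rendering, not the authors' implementation: `rom_cost` is a toy stand-in for the learned ROMDP cost proxy, and the constraint budget is an assumed parameter.

```python
import numpy as np
from scipy.optimize import minimize

def rom_cost(z: np.ndarray, a: np.ndarray) -> float:
    """Toy stand-in for the ROMDP cost proxy: predicts the safety cost of
    taking action `a` given a reduced-order state `z` (illustrative only)."""
    return float(np.dot(z, a) ** 2)

def refine_action(z: np.ndarray, a_rl: np.ndarray, budget: float) -> np.ndarray:
    """Refine the RL agent's action by solving
    min ||a - a_rl||^2  s.t.  rom_cost(z, a) <= budget."""
    res = minimize(
        lambda a: float(np.sum((a - a_rl) ** 2)),  # stay close to the original action
        x0=a_rl,
        constraints=[{"type": "ineq",              # scipy convention: fun(a) >= 0
                      "fun": lambda a: budget - rom_cost(z, a)}],
    )
    return res.x if res.success else a_rl          # fall back to the raw action

z = np.array([1.0, -0.5])                          # reduced-order state (toy)
print(refine_action(z, a_rl=np.array([0.8, 0.6]), budget=0.01))
```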
Related papers
- Safety through Permissibility: Shield Construction for Fast and Safe Reinforcement Learning [57.84059344739159]
"Shielding" is a popular technique to enforce safety inReinforcement Learning (RL)
We propose a new permissibility-based framework to deal with safety and shield construction.
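For context, a shield sits between the policy and the environment and overrides actions it deems impermissible. A minimal sketch of that interface follows; the permissibility test and the fallback rule are hypothetical placeholders, not the construction proposed in the paper.

```python
from typing import Callable, Sequence, TypeVar

S = TypeVar("S")  # state type
A = TypeVar("A")  # action type

def shielded_action(
    state: S,
    proposed: A,
    actions: Sequence[A],
    is_permissible: Callable[[S, A], bool],
) -> A:
    """Pass the policy's action through when permissible; otherwise
    substitute the first permissible alternative."""
    if is_permissible(state, proposed):
        return proposed
    for a in actions:
        if is_permissible(state, a):
            return a
    raise RuntimeError("no permissible action available in this state")
```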
arXiv Detail & Related papers (2024-05-29T18:00:21Z)
- Safe Exploration in Reinforcement Learning: A Generalized Formulation and Algorithms [8.789204441461678]
We present a solution to the generalized safe exploration (GSE) problem in the form of a meta-algorithm for safe exploration, MASE.
Our proposed algorithm achieves better performance than state-of-the-art algorithms on grid-world and Safety Gym benchmarks.
arXiv Detail & Related papers (2023-10-05T00:47:09Z)
- Approximate Model-Based Shielding for Safe Reinforcement Learning [83.55437924143615]
We propose a principled look-ahead shielding algorithm for verifying the performance of learned RL policies.
Our algorithm differs from other shielding approaches in that it does not require prior knowledge of the safety-relevant dynamics of the system.
We demonstrate superior performance to other safety-aware approaches on a set of Atari games with state-dependent safety labels.
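Look-ahead shielding can be pictured as rolling a learned approximate model forward before committing to an action. The sketch below assumes a `model.step` interface returning a next state and a safety label, and a fixed horizon; both are illustrative assumptions rather than details from the paper.

```python
def lookahead_safe(model, policy, state, action, horizon: int = 5) -> bool:
    """Simulate `horizon` steps with a learned approximate model, starting
    with the candidate action, and reject if any step is predicted unsafe."""
    s, a = state, action
    for _ in range(horizon):
        s, unsafe = model.step(s, a)  # assumed interface: (next state, safety label)
        if unsafe:
            return False
        a = policy(s)                 # continue under the current policy
    return True
```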
arXiv Detail & Related papers (2023-07-27T15:19:45Z)
- Safe and Sample-efficient Reinforcement Learning for Clustered Dynamic Environments [4.111899441919165]
This study proposes a safe and sample-efficient reinforcement learning (RL) framework to address two major challenges.
We use the safe set algorithm (SSA) to monitor and modify the nominal controls, and evaluate SSA+RL in a clustered dynamic environment.
Our framework achieves better safety performance compared to other safe RL methods during training and solves the task with substantially fewer episodes.
arXiv Detail & Related papers (2023-03-24T20:29:17Z)
- A Multiplicative Value Function for Safe and Efficient Reinforcement Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
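In equation form, the multiplicative composition described above amounts to discounting the estimated return by the estimated probability of staying within the constraints (notation ours, not necessarily the paper's):

```latex
V(s) \;=\; \underbrace{\bigl(1 - p_{\mathrm{viol}}(s)\bigr)}_{\text{safety critic}} \cdot \underbrace{V_{r}(s)}_{\text{reward critic}}
```

where $p_{\mathrm{viol}}(s)$ is the predicted probability of constraint violation and $V_r(s)$ is the constraint-free return estimate.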
arXiv Detail & Related papers (2023-03-07T18:29:15Z)
- Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
- Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing constraint violations in policy optimization tasks in safe reinforcement learning.
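The log-barrier construction referenced here is standard: inequality constraints $g_i(x) \le 0$ are folded into the objective so that iterates are repelled from the constraint boundary (notation ours, not necessarily the paper's):

```latex
\min_{x}\; f(x) \;-\; \eta \sum_{i} \log\bigl(-g_i(x)\bigr), \qquad g_i(x) < 0
```

with the step size chosen small enough that the iterates never cross the barrier.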
arXiv Detail & Related papers (2022-07-21T11:14:47Z)
- Safe reinforcement learning for multi-energy management systems with known constraint functions [0.0]
Reinforcement learning (RL) is a promising optimal control technique for multi-energy management systems.
We present two novel safe RL methods, namely SafeFallback and GiveSafe.
In a simulated multi-energy systems case study, we show that both methods start with a significantly higher utility.
arXiv Detail & Related papers (2022-07-08T11:33:53Z)
- SafeRL-Kit: Evaluating Efficient Reinforcement Learning Methods for Safe Autonomous Driving [12.925039760573092]
We release SafeRL-Kit to benchmark safe RL methods for autonomous driving tasks.
SafeRL-Kit contains several of the latest algorithms specific to zero-constraint-violation tasks, including Safety Layer, Recovery RL, the off-policy Lagrangian method, and Feasible Actor-Critic.
We conduct a comparative evaluation of the above algorithms in SafeRL-Kit and shed light on their efficacy for safe autonomous driving.
arXiv Detail & Related papers (2022-06-17T03:23:51Z)
- Provably Safe Reinforcement Learning: Conceptual Analysis, Survey, and Benchmarking [12.719948223824483]
Safety guarantees for reinforcement learning (RL) algorithms are crucial to unlock their potential for many real-world tasks.
However, vanilla RL and most safe RL approaches do not guarantee safety.
We introduce a categorization of existing provably safe RL methods, present the conceptual foundations for both continuous and discrete action spaces, and empirically benchmark existing methods.
We provide practical guidance on selecting provably safe RL approaches depending on the safety specification, RL algorithm, and type of action space.
arXiv Detail & Related papers (2022-05-13T16:34:36Z)
- Learning to be Safe: Deep RL with a Safety Critic [72.00568333130391]
A natural first approach toward safe RL is to manually specify constraints on the policy's behavior.
We propose to learn how to be safe in one set of tasks and environments, and then use that learned intuition to constrain future behaviors.
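One plausible way to use such a pre-trained safety critic at transfer time is to filter candidate actions by estimated failure risk. The threshold `eps` and the critic interface below are illustrative assumptions, not the paper's exact procedure.

```python
from typing import Callable, Sequence, TypeVar

S = TypeVar("S")  # state type
A = TypeVar("A")  # action type

def constrain_actions(
    q_safe: Callable[[S, A], float],  # pre-trained critic: estimated failure risk
    state: S,
    candidates: Sequence[A],
    eps: float = 0.1,                 # assumed risk tolerance
) -> Sequence[A]:
    """Keep only candidate actions whose estimated risk stays below eps;
    if none qualify, fall back to the single least risky candidate."""
    allowed = [a for a in candidates if q_safe(state, a) < eps]
    return allowed or [min(candidates, key=lambda a: q_safe(state, a))]
```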
arXiv Detail & Related papers (2020-10-27T20:53:20Z)