Evaluating Model-free Reinforcement Learning toward Safety-critical
Tasks
- URL: http://arxiv.org/abs/2212.05727v1
- Date: Mon, 12 Dec 2022 06:30:17 GMT
- Title: Evaluating Model-free Reinforcement Learning toward Safety-critical
Tasks
- Authors: Linrui Zhang and Qin Zhang and Li Shen and Bo Yuan and Xueqian Wang
and Dacheng Tao
- Abstract summary: This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
- Score: 70.76757529955577
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Safety comes first in many real-world applications involving autonomous
agents. Despite a large number of reinforcement learning (RL) methods focusing
on safety-critical tasks, there is still a lack of high-quality evaluation of
those algorithms that adheres to safety constraints at each decision step under
complex and unknown dynamics. In this paper, we revisit prior work in this
scope from the perspective of state-wise safe RL and categorize them as
projection-based, recovery-based, and optimization-based approaches,
respectively. Furthermore, we propose Unrolling Safety Layer (USL), a joint
method that combines safety optimization and safety projection. This novel
technique explicitly enforces hard constraints via the deep unrolling
architecture and enjoys structural advantages in navigating the trade-off
between reward improvement and constraint satisfaction. To facilitate further
research in this area, we reproduce related algorithms in a unified pipeline
and incorporate them into SafeRL-Kit, a toolkit that provides off-the-shelf
interfaces and evaluation utilities for safety-critical tasks. We then perform
a comparative study of the involved algorithms on six benchmarks ranging from
robotic control to autonomous driving. The empirical results provide an insight
into their applicability and robustness in learning zero-cost-return policies
without task-dependent handcrafting. The project page is available at
https://sites.google.com/view/saferlkit.
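To make the projection idea in the abstract concrete, below is a minimal, hypothetical sketch of a projection-style safety layer with unrolled gradient corrections: a proposed action is iteratively adjusted against a learned state-action cost model until the predicted instantaneous cost drops below a threshold. The helper name `q_cost`, the number of unrolled steps, and all hyperparameters are assumptions for illustration, not the authors' implementation of USL.

```python
import torch

def unrolled_safety_layer(q_cost, state, action, threshold=0.0,
                          n_steps=5, step_size=0.05,
                          act_low=-1.0, act_high=1.0):
    """Iteratively correct `action` so that the predicted instantaneous cost
    q_cost(state, action) falls below `threshold`.

    q_cost is assumed to map a single (state, action) pair to a scalar tensor.
    """
    a = action.clone().detach().requires_grad_(True)
    for _ in range(n_steps):
        cost = q_cost(state, a)               # predicted instantaneous cost
        if cost.item() <= threshold:          # already predicted to be safe
            break
        grad, = torch.autograd.grad(cost, a)  # gradient of cost w.r.t. action
        with torch.no_grad():
            a -= step_size * grad             # descend the predicted cost
            a.clamp_(act_low, act_high)       # respect action bounds
    return a.detach()
```

In this sketch the unrolled correction plays the role of a differentiable projection at execution time; the abstract pairs it with a safety-optimization stage (e.g. a cost-penalized policy objective) during training, which is not shown here.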
Related papers
- GenSafe: A Generalizable Safety Enhancer for Safe Reinforcement Learning Algorithms Based on Reduced Order Markov Decision Process Model [8.915288771953545]
We introduce a Generalizable Safety Enhancer (GenSafe) for Safe Reinforcement Learning (SRL) algorithms.
By solving ROMDP-based constraints that are reformulated from the original cost constraints, GenSafe refines the actions taken by the agent to enhance the possibility of constraint satisfaction.
The results show that GenSafe not only improves safety performance, especially in the early learning phases, but also maintains task performance at a satisfactory level.
arXiv Detail & Related papers (2024-06-06T09:51:30Z) - Safeguarded Progress in Reinforcement Learning: Safe Bayesian
Exploration for Control Policy Synthesis [63.532413807686524]
This paper addresses the problem of maintaining safety during training in Reinforcement Learning (RL).
We propose a new architecture that handles the trade-off between efficient progress and safety during exploration.
arXiv Detail & Related papers (2023-12-18T16:09:43Z) - SCPO: Safe Reinforcement Learning with Safety Critic Policy Optimization [1.3597551064547502]
This study introduces a novel safe reinforcement learning algorithm, Safety Critic Policy Optimization.
In this study, we define the safety critic, a mechanism that nullifies rewards obtained through violating safety constraints.
Our theoretical analysis indicates that the proposed algorithm can automatically balance the trade-off between adhering to safety constraints and maximizing rewards.
arXiv Detail & Related papers (2023-11-01T22:12:50Z) - Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark [12.660770759420286]
We present an environment suite called Safety-Gymnasium, which encompasses safety-critical tasks in both single and multi-agent scenarios.
We offer a library of algorithms named Safe Policy Optimization (SafePO), comprising 16 state-of-the-art SafeRL algorithms.
arXiv Detail & Related papers (2023-10-19T08:19:28Z) - Searching for Optimal Runtime Assurance via Reachability and
Reinforcement Learning [2.422636931175853]
A runtime assurance system (RTA) for a given plant enables the exercise of an untrusted or experimental controller while assuring safety with a backup controller.
Existing RTA design strategies are well-known to be overly conservative and, in principle, can lead to safety violations.
In this paper, we formulate the optimal RTA design problem and present a new approach for solving it.
arXiv Detail & Related papers (2023-10-06T14:45:57Z) - Approximate Model-Based Shielding for Safe Reinforcement Learning [83.55437924143615]
We propose a principled look-ahead shielding algorithm for verifying the performance of learned RL policies.
Our algorithm differs from other shielding approaches in that it does not require prior knowledge of the safety-relevant dynamics of the system.
We demonstrate superior performance to other safety-aware approaches on a set of Atari games with state-dependent safety-labels.
arXiv Detail & Related papers (2023-07-27T15:19:45Z) - A Multiplicative Value Function for Safe and Efficient Reinforcement
Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
arXiv Detail & Related papers (2023-03-07T18:29:15Z) - Safety Correction from Baseline: Towards the Risk-aware Policy in
Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z) - Log Barriers for Safe Black-box Optimization with Application to Safe
Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing constraint violations in safe reinforcement learning policy tasks (a rough sketch of such a barrier step is given after this list of related papers).
arXiv Detail & Related papers (2022-07-21T11:14:47Z) - Improving Safety in Deep Reinforcement Learning using Unsupervised
Action Planning [4.2955354157580325]
One of the key challenges to deep reinforcement learning (deep RL) is to ensure safety at both training and testing phases.
We propose a novel technique of unsupervised action planning to improve the safety of on-policy reinforcement learning algorithms.
Our results show that the proposed safety RL algorithm can achieve higher rewards compared with multiple baselines in both discrete and continuous control problems.
arXiv Detail & Related papers (2021-09-29T10:26:29Z)
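As referenced in the LBSGD entry above, the following is a rough, hypothetical sketch of a single log-barrier gradient step for "maximize reward subject to cost <= 0". The function names, the barrier weight `eta`, and the step-size heuristic are assumptions for illustration, not the paper's actual algorithm.

```python
import torch

def log_barrier_step(reward_fn, cost_fn, x, eta=0.1, lr=0.01):
    """One ascent step on reward_fn(x) + eta * log(-cost_fn(x)).

    Assumes the current iterate x is strictly feasible, i.e. cost_fn(x) < 0,
    and that reward_fn / cost_fn return scalar tensors.
    """
    x = x.clone().detach().requires_grad_(True)
    objective = reward_fn(x) + eta * torch.log(-cost_fn(x))  # barrier surrogate
    objective.backward()
    with torch.no_grad():
        slack = float(-cost_fn(x))                # distance to the boundary
        grad_norm = float(x.grad.norm()) + 1e-8
        # Heuristically shrink the step so a single update is unlikely to
        # cross the boundary; a stand-in for a carefully chosen step size.
        step = min(lr, 0.5 * slack / grad_norm)
        x_new = x + step * x.grad                 # gradient ascent on surrogate
    return x_new.detach()
```

The barrier term blows up as the iterate approaches the constraint boundary, so the central difficulty is step-size control; the simple heuristic above only illustrates that concern.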