State-wise Safe Reinforcement Learning: A Survey
- URL: http://arxiv.org/abs/2302.03122v3
- Date: Fri, 30 Jun 2023 19:12:31 GMT
- Title: State-wise Safe Reinforcement Learning: A Survey
- Authors: Weiye Zhao, Tairan He, Rui Chen, Tianhao Wei, Changliu Liu
- Abstract summary: State-wise constraints are one of the most common constraints in real-world applications.
This paper provides a review of existing approaches that address state-wise constraints in RL.
- Score: 5.826308050755618
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the tremendous success of Reinforcement Learning (RL) algorithms in
simulation environments, applying RL to real-world applications still faces
many challenges. A major concern is safety, in other words, constraint
satisfaction. State-wise constraints are among the most common constraints in
real-world applications and among the most challenging constraints in Safe RL.
Enforcing state-wise constraints is essential to many challenging tasks such as
autonomous driving and robot manipulation. This paper provides a
comprehensive review of existing approaches that address state-wise constraints
in RL. Under the framework of State-wise Constrained Markov Decision Process
(SCMDP), we will discuss the connections, differences, and trade-offs of
existing approaches in terms of (i) safety guarantee and scalability, (ii)
safety and reward performance, and (iii) safety after convergence and during
training. We also summarize limitations of current methods and discuss
potential future directions.
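To make the setting concrete, the optimization problem studied under the SCMDP framework can be sketched as follows. The notation below is a simplified paraphrase rather than the survey's verbatim definition: the policy maximizes expected return while every per-step cost must stay below its threshold at every step of every trajectory, whereas a conventional CMDP only bounds the expected cumulative cost.

\max_{\pi} \ \mathbb{E}_{\tau \sim \pi}\Big[\textstyle\sum_{t} \gamma^{t}\, r(s_t, a_t, s_{t+1})\Big]
\quad \text{s.t.} \quad c_i(s_t, a_t, s_{t+1}) \le w_i \quad \forall t,\ \forall i,

compared with the CMDP requirement \ \mathbb{E}_{\tau \sim \pi}\big[\textstyle\sum_{t} c_i(s_t, a_t, s_{t+1})\big] \le d_i \ for each cost i.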
Related papers
- Absolute State-wise Constrained Policy Optimization: High-Probability State-wise Constraints Satisfaction [20.00178731842195]
Existing safe reinforcement learning (RL) methods only enforce state-wise constraints in expectation or enforce hard state-wise constraints with strong assumptions.
We propose Absolute State-wise Constrained Policy Optimization (ASCPO), a novel general-purpose policy search algorithm that guarantees high-probability state-wise constraint satisfaction.
Our results show that ASCPO significantly outperforms existing methods in handling state-wise constraints across challenging continuous control tasks.
arXiv Detail & Related papers (2024-10-02T03:43:33Z)
- Uniformly Safe RL with Objective Suppression for Multi-Constraint Safety-Critical Applications [73.58451824894568]
The widely adopted CMDP model constrains the risks in expectation, which makes room for dangerous behaviors in long-tail states.
In safety-critical domains, such behaviors could lead to disastrous outcomes.
We propose Objective Suppression, a novel method that adaptively suppresses the task reward maximizing objectives according to a safety critic.
arXiv Detail & Related papers (2024-02-23T23:22:06Z)
- A Survey of Constraint Formulations in Safe Reinforcement Learning [15.593999581562203]
Safety is critical when applying reinforcement learning to real-world problems.
A prevalent safe RL approach is based on a constrained criterion, which seeks to maximize the expected cumulative reward subject to given constraints.
Despite recent efforts to enhance safety in RL, a systematic understanding of the field remains difficult.
arXiv Detail & Related papers (2024-02-03T04:40:31Z)
- Gradient Shaping for Multi-Constraint Safe Reinforcement Learning [31.297400160104853]
Online safe reinforcement learning (RL) involves training a policy that maximizes task efficiency while satisfying constraints by interacting with the environment.
We propose a unified framework designed for multi-constraint (MC) safe RL algorithms.
We introduce the Gradient Shaping (GradS) method for general Lagrangian-based safe RL algorithms to improve the training efficiency in terms of both reward and constraint satisfaction.
arXiv Detail & Related papers (2023-12-23T00:55:09Z)
- State-wise Constrained Policy Optimization [10.815583111876892]
State-wise Constrained Policy Optimization (SCPO) is the first general-purpose policy search algorithm for state-wise constrained reinforcement learning.
We show that SCPO significantly outperforms existing methods and can handle state-wise constraints in high-dimensional robotics tasks.
arXiv Detail & Related papers (2023-06-21T22:28:17Z)
- Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
- Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments [84.3830478851369]
We propose a safe reinforcement learning approach that can jointly learn the environment and optimize the control policy.
Our approach can effectively enforce hard safety constraints and significantly outperform CMDP-based baseline methods in terms of system safety rate, as measured in simulations.
arXiv Detail & Related papers (2022-09-29T20:49:25Z)
- Safe Reinforcement Learning via Confidence-Based Filters [78.39359694273575]
We develop a control-theoretic approach for certifying state safety constraints for nominal policies learned via standard reinforcement learning techniques.
We provide formal safety guarantees, and empirically demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-07-04T11:43:23Z)
- Enhancing Safe Exploration Using Safety State Augmentation [71.00929878212382]
We tackle the problem of safe exploration in model-free reinforcement learning.
We derive policies for scheduling the safety budget during training.
We show that Simmer can stabilize training and improve the performance of safe RL with average constraints.
arXiv Detail & Related papers (2022-06-06T15:23:07Z)
- Constrained Markov Decision Processes via Backward Value Functions [43.649330976089004]
We model the problem of learning with constraints as a Constrained Markov Decision Process.
A key contribution of our approach is to translate cumulative cost constraints into state-based constraints.
We provide theoretical guarantees under which the agent converges while ensuring safety over the course of training.
arXiv Detail & Related papers (2020-08-26T20:56:16Z)
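To illustrate the last item above, the core idea of translating a cumulative cost constraint into state-based constraints can be sketched in code. This is a minimal sketch under our own assumptions, not the authors' algorithm: the names state_wise_safety_check, filter_safe_actions, and the tabular estimates B (cost already accrued on the way to a state), Vc and Qc (expected future cost), and the budget d are all illustrative inventions.

import numpy as np

# Sketch: a cumulative constraint E[sum_t c(s_t, a_t)] <= d becomes a per-state
# check by combining an estimate of cost accrued so far ("backward" value B)
# with the expected cost still to come (forward cost value Vc or Qc).

def state_wise_safety_check(B, Vc, d):
    # Boolean mask of states whose total (past + future) expected cost
    # stays within the budget d.
    return B + Vc <= d

def filter_safe_actions(s, candidate_actions, B, Qc, d):
    # Keep only actions whose expected future cost Qc[s, a], added to the
    # cost already accrued at state s, respects the budget.
    return [a for a in candidate_actions if B[s] + Qc[s, a] <= d]

# Tiny usage example with made-up tabular estimates.
B = np.array([0.0, 0.4, 0.9])    # cost accrued so far, per state
Vc = np.array([0.5, 0.5, 0.5])   # expected future cost, per state
Qc = np.array([[0.2, 0.8],
               [0.3, 0.7],
               [0.1, 0.6]])      # expected future cost, per (state, action)
d = 1.0                          # cost budget

print(state_wise_safety_check(B, Vc, d))         # [ True  True False]
print(filter_safe_actions(1, [0, 1], B, Qc, d))  # [0]

The same pattern underlies several of the surveyed methods: once a per-state quantity upper-bounds the total cost, safety can be enforced step by step rather than only in expectation over whole trajectories.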