Related papers: State-wise Safe Reinforcement Learning: A Survey

URL: http://arxiv.org/abs/2302.03122v3
Date: Fri, 30 Jun 2023 19:12:31 GMT
Title: State-wise Safe Reinforcement Learning: A Survey
Authors: Weiye Zhao, Tairan He, Rui Chen, Tianhao Wei, Changliu Liu
Abstract summary: State-wise constraints are one of the most common constraints in real-world applications. This paper provides a review of existing approaches that address state-wise constraints in RL.
Score: 5.826308050755618
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Despite the tremendous success of Reinforcement Learning (RL) algorithms in simulation environments, applying RL to real-world applications still faces many challenges. A major concern is safety, in other words, constraint satisfaction. State-wise constraints are one of the most common constraints in real-world applications and one of the most challenging constraints in Safe RL. Enforcing state-wise constraints is essential to many challenging tasks such as autonomous driving and robot manipulation. This paper provides a comprehensive review of existing approaches that address state-wise constraints in RL. Under the framework of State-wise Constrained Markov Decision Process (SCMDP), we discuss the connections, differences, and trade-offs of existing approaches in terms of (i) safety guarantee and scalability, (ii) safety and reward performance, and (iii) safety after convergence and during training. We also summarize limitations of current methods and discuss potential future directions.

Related papers

Absolute State-wise Constrained Policy Optimization: High-Probability State-wise Constraints Satisfaction [20.00178731842195]
Existing safe reinforcement learning (RL) methods only enforce state-wise constraints in expectation or enforce hard state-wise constraints with strong assumptions. We propose a novel general-purpose policy search algorithm that guarantees high-probability state-wise satisfaction for constraint systems. Our results show that ASCPO significantly outperforms existing methods in handling state-wise constraints across challenging continuous control tasks.
arXiv Detail & Related papers (2024-10-02T03:43:33Z)
Uniformly Safe RL with Objective Suppression for Multi-Constraint Safety-Critical Applications [73.58451824894568]
The widely adopted CMDP model constrains the risks in expectation, which makes room for dangerous behaviors in long-tail states. In safety-critical domains, such behaviors could lead to disastrous outcomes. We propose Objective Suppression, a novel method that adaptively suppresses the task reward maximizing objectives according to a safety critic.
arXiv Detail & Related papers (2024-02-23T23:22:06Z)
A Survey of Constraint Formulations in Safe Reinforcement Learning [15.593999581562203]
Safety is critical when applying reinforcement learning to real-world problems. A prevalent safe RL approach is based on a constrained criterion, which seeks to maximize the expected cumulative reward. Despite recent effort to enhance safety in RL, a systematic understanding of the field remains difficult.
arXiv Detail & Related papers (2024-02-03T04:40:31Z)
Gradient Shaping for Multi-Constraint Safe Reinforcement Learning [31.297400160104853]
Online safe reinforcement learning (RL) involves training a policy that maximizes task efficiency while satisfying constraints via interacting with the environments. We propose a unified framework designed for multi-constraint (MC) safe RL algorithms. We introduce the Gradient Shaping (GradS) method for general Lagrangian-based safe RL algorithms to improve the training efficiency in terms of both reward and constraint satisfaction.
arXiv Detail & Related papers (2023-12-23T00:55:09Z)
State-wise Constrained Policy Optimization [10.815583111876892]
State-wise Constrained Policy Optimization is the first general-purpose policy search algorithm for state-wise constrained reinforcement learning. We show that SCPO significantly outperforms existing methods and can handle state-wise constraints in high-dimensional robotics tasks.
arXiv Detail & Related papers (2023-06-21T22:28:17Z)
Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL. We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection. To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments [84.3830478851369]
We propose a safe reinforcement learning approach that can jointly learn the environment and optimize the control policy. Our approach can effectively enforce hard safety constraints and significantly outperform CMDP-based baseline methods in system safe rate measured via simulations.
arXiv Detail & Related papers (2022-09-29T20:49:25Z)
Safe Reinforcement Learning via Confidence-Based Filters [78.39359694273575]
We develop a control-theoretic approach for certifying state safety constraints for nominal policies learned via standard reinforcement learning techniques. We provide formal safety guarantees, and empirically demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-07-04T11:43:23Z)
Enhancing Safe Exploration Using Safety State Augmentation [71.00929878212382]
We tackle the problem of safe exploration in model-free reinforcement learning. We derive policies for scheduling the safety budget during training. We show that Simmer can stabilize training and improve the performance of safe RL with average constraints.
arXiv Detail & Related papers (2022-06-06T15:23:07Z)
Constrained Markov Decision Processes via Backward Value Functions [43.649330976089004]
We model the problem of learning with constraints as a Constrained Markov Decision Process. A key contribution of our approach is to translate cumulative cost constraints into state-based constraints. We provide theoretical guarantees under which the agent converges while ensuring safety over the course of training.
arXiv Detail & Related papers (2020-08-26T20:56:16Z)

This list is automatically generated from the titles and abstracts of the papers on this site.