Absolute State-wise Constrained Policy Optimization: High-Probability State-wise Constraints Satisfaction
- URL: http://arxiv.org/abs/2410.01212v1
- Date: Wed, 2 Oct 2024 03:43:33 GMT
- Title: Absolute State-wise Constrained Policy Optimization: High-Probability State-wise Constraints Satisfaction
- Authors: Weiye Zhao, Feihan Li, Yifan Sun, Yujie Wang, Rui Chen, Tianhao Wei, Changliu Liu
- Abstract summary: Existing safe reinforcement learning (RL) methods only enforce state-wise constraints in expectation or enforce hard state-wise constraints with strong assumptions.
We propose Absolute State-wise Constrained Policy Optimization (ASCPO), a novel general-purpose policy search algorithm that guarantees high-probability state-wise constraint satisfaction for stochastic systems.
Our results show that ASCPO significantly outperforms existing methods in handling state-wise constraints across challenging continuous control tasks.
- Score: 20.00178731842195
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Enforcing state-wise safety constraints is critical for the application of reinforcement learning (RL) in real-world problems, such as autonomous driving and robot manipulation. However, existing safe RL methods only enforce state-wise constraints in expectation or enforce hard state-wise constraints with strong assumptions. The former does not exclude the probability of safety violations, while the latter is impractical. Our insight is that although it is intractable to guarantee hard state-wise constraints in a model-free setting, we can enforce state-wise safety with high probability while excluding strong assumptions. To accomplish the goal, we propose Absolute State-wise Constrained Policy Optimization (ASCPO), a novel general-purpose policy search algorithm that guarantees high-probability state-wise constraint satisfaction for stochastic systems. We demonstrate the effectiveness of our approach by training neural network policies for extensive robot locomotion tasks, where the agent must adhere to various state-wise safety constraints. Our results show that ASCPO significantly outperforms existing methods in handling state-wise constraints across challenging continuous control tasks, highlighting its potential for real-world applications.
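To make the distinction in the abstract concrete, here is a minimal Python sketch of the gap between an expectation-level constraint and a high-probability state-wise constraint. This is not the authors' implementation: the function names, the sampling setup, and the Chebyshev-style k factor are illustrative assumptions.
```python
import numpy as np

def expected_cost(rollout_costs):
    # Expectation-level constraint: bound only the mean cumulative cost,
    # which leaves room for rare but severe state-wise violations.
    return np.mean([np.sum(c) for c in rollout_costs])

def high_prob_statewise_bound(rollout_costs, k=2.0):
    # High-probability state-wise constraint: bound the per-trajectory
    # *maximum* step cost by mean + k * std (a Chebyshev-style tail
    # bound); larger k corresponds to a higher-confidence guarantee.
    max_costs = np.array([np.max(c) for c in rollout_costs])
    return max_costs.mean() + k * max_costs.std()

# A policy update would be accepted only if the bound stays under the
# state-wise threshold, e.g. high_prob_statewise_bound(costs) <= w.
```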
Related papers
- Uniformly Safe RL with Objective Suppression for Multi-Constraint Safety-Critical Applications [73.58451824894568]
The widely adopted CMDP model constrains the risks in expectation, which makes room for dangerous behaviors in long-tail states.
In safety-critical domains, such behaviors could lead to disastrous outcomes.
We propose Objective Suppression, a novel method that adaptively suppresses the task-reward-maximizing objectives according to a safety critic (a minimal sketch follows this entry).
arXiv Detail & Related papers (2024-02-23T23:22:06Z)
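A minimal sketch of the suppression idea, assuming a sigmoid gate driven by a safety critic; the names, the gate shape, and the sharpness parameter alpha are illustrative assumptions, not the paper's implementation.
```python
import torch

def suppressed_objective(task_adv, predicted_cost, budget, alpha=10.0):
    # Gate in [0, 1]: close to 1 when the safety critic predicts cost
    # well under the budget, close to 0 near or beyond it, so the
    # task-reward objective is suppressed exactly in risky states.
    gate = torch.sigmoid(alpha * (budget - predicted_cost))
    return (gate * task_adv).mean()
```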
- Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states.
The algorithm is able to provide more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z)
- State-wise Constrained Policy Optimization [10.815583111876892]
State-wise Constrained Policy Optimization (SCPO) is the first general-purpose policy search algorithm for state-wise constrained reinforcement learning.
We show that SCPO significantly outperforms existing methods and can handle state-wise constraints in high-dimensional robotics tasks.
arXiv Detail & Related papers (2023-06-21T22:28:17Z)
- State-wise Safe Reinforcement Learning: A Survey [5.826308050755618]
State-wise constraints are one of the most common constraints in real-world applications.
This paper provides a review of existing approaches that address state-wise constraints in RL.
arXiv Detail & Related papers (2023-02-06T21:11:29Z)
- Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments [84.3830478851369]
We propose a safe reinforcement learning approach that can jointly learn the environment and optimize the control policy.
Our approach can effectively enforce hard safety constraints and significantly outperforms CMDP-based baseline methods in system safety rate measured via simulations.
arXiv Detail & Related papers (2022-09-29T20:49:25Z)
- Safe Reinforcement Learning via Confidence-Based Filters [78.39359694273575]
We develop a control-theoretic approach for certifying state safety constraints for nominal policies learned via standard reinforcement learning techniques.
We provide formal safety guarantees, and empirically demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-07-04T11:43:23Z)
- Enhancing Safe Exploration Using Safety State Augmentation [71.00929878212382]
We tackle the problem of safe exploration in model-free reinforcement learning.
We derive policies for scheduling the safety budget during training (a schedule sketch follows this entry).
We show that the resulting approach, Simmer, can stabilize training and improve the performance of safe RL with average constraints.
arXiv Detail & Related papers (2022-06-06T15:23:07Z)
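A minimal sketch of scheduling a safety budget during training, in the spirit of the Simmer entry above; the linear schedule and its endpoints are illustrative assumptions.
```python
def simmered_budget(epoch, total_epochs, d_start, d_target):
    # Anneal the budget from a deliberately tight d_start toward the
    # true threshold d_target, keeping early exploration conservative.
    frac = min(epoch / max(total_epochs, 1), 1.0)
    return d_start + frac * (d_target - d_start)
```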
- Constrained Policy Optimization via Bayesian World Models [79.0077602277004]
LAMBDA is a model-based approach for policy optimization in safety-critical tasks modeled via constrained Markov decision processes.
We demonstrate LAMBDA's state-of-the-art performance on the Safety-Gym benchmark suite in terms of sample efficiency and constraint violation.
arXiv Detail & Related papers (2022-01-24T17:02:22Z)
- Feasible Actor-Critic: Constrained Reinforcement Learning for Ensuring Statewise Safety [1.9573380763700712]
We introduce the feasible actor-critic (FAC) algorithm, the first model-free constrained RL method that addresses statewise safety.
We observe that some states are inherently unsafe no matter what policy is chosen, while for other states there exist policies that ensure safety; we call such states and policies feasible (see the sketch after this entry).
We provide theoretical guarantees that FAC outperforms previous expectation-based constrained RL methods in terms of both constraint satisfaction and reward optimization.
arXiv Detail & Related papers (2021-05-22T10:40:58Z)
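One way to realize statewise (rather than expectation-level) constraint enforcement, as in the FAC entry above, is a network that maps each state to its own Lagrange multiplier. The architecture and sizes below are illustrative assumptions, not the paper's exact design.
```python
import torch
import torch.nn as nn

class StatewiseMultiplier(nn.Module):
    # Maps a state to a nonnegative Lagrange multiplier, so the
    # Lagrangian can penalize cost violations state by state instead
    # of through a single scalar multiplier in expectation.
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus(),  # enforce lambda >= 0
        )

    def forward(self, state):
        return self.net(state)
```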
- Constrained Markov Decision Processes via Backward Value Functions [43.649330976089004]
We model the problem of learning with constraints as a Constrained Markov Decision Process.
A key contribution of our approach is to translate cumulative cost constraints into state-based constraints (a minimal sketch follows this entry).
We provide theoretical guarantees under which the agent converges while ensuring safety over the course of training.
arXiv Detail & Related papers (2020-08-26T20:56:16Z)
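A minimal sketch of the translation described above, assuming a learned backward value function for cost already accrued and a forward cost value function for cost to come; the names are illustrative assumptions.
```python
def statewise_cost_condition(backward_value, forward_value, budget):
    # A cumulative constraint E[sum of costs] <= budget is decomposed,
    # at each visited state, into cost accrued so far (backward value)
    # plus expected cost remaining (forward value), giving a per-state
    # safety test the agent can check during training.
    return backward_value + forward_value <= budget
```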
- Safe reinforcement learning for probabilistic reachability and safety specifications: A Lyapunov-based approach [2.741266294612776]
We propose a model-free safety specification method that learns the maximal probability of safe operation.
Our approach constructs a Lyapunov function with respect to a safe policy to restrain each policy improvement stage (a minimal sketch follows this entry).
It yields a sequence of safe policies that determine the range of safe operation, called the safe set.
arXiv Detail & Related papers (2020-02-24T09:20:03Z)
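A minimal sketch of Lyapunov-restrained policy improvement in the spirit of the entry above; the selection rule and names are illustrative assumptions.
```python
def lyapunov_safe_candidates(current_policy, candidates, lyapunov, state):
    # Keep only candidate updates whose Lyapunov value at this state
    # does not exceed the current policy's, so each improvement stage
    # stays inside the estimated safe set.
    v_now = lyapunov(current_policy, state)
    return [pi for pi in candidates if lyapunov(pi, state) <= v_now]
```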