Feasible Actor-Critic: Constrained Reinforcement Learning for Ensuring
Statewise Safety
- URL: http://arxiv.org/abs/2105.10682v1
- Date: Sat, 22 May 2021 10:40:58 GMT
- Title: Feasible Actor-Critic: Constrained Reinforcement Learning for Ensuring
Statewise Safety
- Authors: Haitong Ma, Yang Guan, Shengbo Eben Li, Xiangteng Zhang, Sifa Zheng,
Jianyu Chen
- Abstract summary: We introduce the feasible actor-critic (FAC) algorithm, the first model-free constrained RL method that considers statewise safety.
We claim that some states are inherently unsafe no matter what policy we choose, while for other states there exist policies ensuring safety; we call such states and policies feasible.
We provide theoretical guarantees that FAC outperforms previous expectation-based constrained RL methods in terms of both constraint satisfaction and reward optimization.
- Score: 1.9573380763700712
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The safety constraints commonly used by existing safe reinforcement learning
(RL) methods are defined only in expectation over initial states, but allow
individual states to be unsafe, which is unsatisfying for real-world
safety-critical tasks. In this paper, we introduce the feasible actor-critic
(FAC) algorithm, which is the first model-free constrained RL method that
considers statewise safety, i.e., safety for each initial state. We claim that
some states are inherently unsafe no matter what policy we choose, while for
other states there exist policies ensuring safety; we call such states and
policies feasible. By constructing a statewise Lagrange function that can be
evaluated from RL samples and adopting an additional neural network to
approximate the statewise Lagrange multiplier, we obtain the optimal feasible policy
which ensures safety for each feasible state and the safest possible policy for
infeasible states. Furthermore, the trained multiplier net can indicate whether
a given state is feasible or not through the statewise complementary slackness
condition. We provide theoretical guarantees that FAC outperforms previous
expectation-based constrained RL methods in terms of both constraint
satisfaction and reward optimization. Experimental results on both robot
locomotion tasks and safe exploration tasks verify the safety enhancement and
feasibility interpretation of the proposed method.
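As a rough illustration of the statewise-Lagrangian construction described above, the sketch below trains a multiplier network lambda(s) alongside the actor: the actor descends the statewise Lagrangian while the multiplier ascends it, and lambda(s) shrinking toward zero marks a feasible state via complementary slackness. Network shapes, loss forms, and hyperparameters here are assumptions for illustration, not the authors' reference implementation.

    # Minimal sketch of a statewise-Lagrangian actor/multiplier update (assumed details).
    import torch
    import torch.nn as nn

    obs_dim, act_dim, cost_limit = 8, 2, 0.0   # assumed sizes; per-state cost threshold

    actor      = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim), nn.Tanh())
    q_reward   = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.Tanh(), nn.Linear(64, 1))
    q_cost     = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.Tanh(), nn.Linear(64, 1))
    multiplier = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 1), nn.Softplus())

    pi_opt  = torch.optim.Adam(actor.parameters(), lr=3e-4)
    lam_opt = torch.optim.Adam(multiplier.parameters(), lr=3e-4)

    def fac_style_update(states):
        """One actor/multiplier step on a batch of sampled states (critic training omitted)."""
        sa  = torch.cat([states, actor(states)], dim=-1)
        qr  = q_reward(sa).squeeze(-1)          # estimated return of the actor's action
        qc  = q_cost(sa).squeeze(-1)            # estimated cumulative constraint cost
        lam = multiplier(states).squeeze(-1)    # statewise multiplier lambda(s) >= 0

        # Actor descends the statewise Lagrangian: -return + lambda(s) * constraint violation.
        actor_loss = (-qr + lam.detach() * (qc - cost_limit)).mean()
        pi_opt.zero_grad(); actor_loss.backward(); pi_opt.step()

        # Multiplier ascends: lambda(s) grows where the constraint is violated and is pushed
        # toward zero where it holds, so a near-zero lambda(s) flags a feasible state via
        # the statewise complementary slackness condition.
        lam_loss = -(multiplier(states).squeeze(-1) * (qc - cost_limit).detach()).mean()
        lam_opt.zero_grad(); lam_loss.backward(); lam_opt.step()

    fac_style_update(torch.randn(32, obs_dim))  # dummy batch, for illustration only

Reading off multiplier(state) and comparing it with a small threshold is then one plausible way to realize the feasibility indication mentioned in the abstract.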
Related papers
- Absolute State-wise Constrained Policy Optimization: High-Probability State-wise Constraints Satisfaction [20.00178731842195]
Existing safe reinforcement learning (RL) methods only enforce state-wise constraints in expectation or enforce hard state-wise constraints with strong assumptions.
We propose a novel general-purpose policy search algorithm that guarantees high-probability state-wise constraint satisfaction.
Our results show that ASCPO significantly outperforms existing methods in handling state-wise constraints across challenging continuous control tasks.
arXiv Detail & Related papers (2024-10-02T03:43:33Z)
- Safe Reinforcement Learning with Learned Non-Markovian Safety Constraints [15.904640266226023]
We design a safety model that performs credit assignment to assess the contribution of partial state-action trajectories to safety.
We derive an effective algorithm for optimizing a safe policy using the learned safety model.
We devise a method to dynamically adapt the tradeoff coefficient between safety reward and safety compliance.
arXiv Detail & Related papers (2024-05-05T17:27:22Z)
- Safe Reinforcement Learning with Dead-Ends Avoidance and Recovery [13.333197887318168]
Safety is one of the main challenges in applying reinforcement learning to tasks in realistic environments.
We propose a method to construct a boundary that discriminates safe and unsafe states.
Our approach achieves better task performance with fewer safety violations than state-of-the-art algorithms.
arXiv Detail & Related papers (2023-06-24T12:02:50Z)
- Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments [84.3830478851369]
We propose a safe reinforcement learning approach that can jointly learn the environment and optimize the control policy.
Our approach can effectively enforce hard safety constraints and significantly outperform CMDP-based baseline methods in system safety rate measured via simulations.
arXiv Detail & Related papers (2022-09-29T20:49:25Z)
- Safe Reinforcement Learning via Confidence-Based Filters [78.39359694273575]
We develop a control-theoretic approach for certifying state safety constraints for nominal policies learned via standard reinforcement learning techniques.
We provide formal safety guarantees, and empirically demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-07-04T11:43:23Z)
- Enhancing Safe Exploration Using Safety State Augmentation [71.00929878212382]
We tackle the problem of safe exploration in model-free reinforcement learning.
We derive policies for scheduling the safety budget during training.
We show that Simmer can stabilize training and improve the performance of safe RL with average constraints.
arXiv Detail & Related papers (2022-06-06T15:23:07Z)
- SAUTE RL: Almost Surely Safe Reinforcement Learning Using State Augmentation [63.25418599322092]
Satisfying safety constraints almost surely (or with probability one) can be critical for deployment of Reinforcement Learning (RL) in real-life applications.
We address the problem by introducing Safety Augmented Markov Decision Processes (MDPs).
We show that the Saute MDP allows the safety-augmentation problem to be viewed from a different perspective, enabling new features (a generic sketch of the state-augmentation idea appears after this list).
arXiv Detail & Related papers (2022-02-14T08:57:01Z)
- Conservative Safety Critics for Exploration [120.73241848565449]
We study the problem of safe exploration in reinforcement learning (RL).
We learn a conservative safety estimate of environment states through a critic.
We show that the proposed approach can achieve competitive task performance while incurring significantly lower catastrophic failure rates.
arXiv Detail & Related papers (2020-10-27T17:54:25Z)
- Cautious Reinforcement Learning with Logical Constraints [78.96597639789279]
An adaptive safe padding forces Reinforcement Learning (RL) to synthesise optimal control policies while ensuring safety during the learning process.
Theoretical guarantees are available on the optimality of the synthesised policies and on the convergence of the learning algorithm.
arXiv Detail & Related papers (2020-02-26T00:01:08Z)
- Safe reinforcement learning for probabilistic reachability and safety specifications: A Lyapunov-based approach [2.741266294612776]
We propose a model-free safety specification method that learns the maximal probability of safe operation.
Our approach constructs a Lyapunov function with respect to a safe policy to restrain each policy improvement stage.
It yields a sequence of safe policies that determine the range of safe operation, called the safe set.
arXiv Detail & Related papers (2020-02-24T09:20:03Z)
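The SAUTE RL and Simmer entries above both rest on augmenting the state with a remaining safety budget. The gym-style wrapper below is a generic sketch of that idea under assumed interfaces (a per-step constraint cost reported in info["cost"], reward masked once the budget is exhausted); the exact reward reshaping and budget scheduling in those papers differ.

    # Generic safety-budget state augmentation (illustrative; not the exact SAUTE/Simmer method).
    import numpy as np

    class SafetyBudgetWrapper:
        """Appends the normalized remaining safety budget to every observation."""
        def __init__(self, env, budget):
            self.env, self.budget, self.remaining = env, budget, budget

        def reset(self):
            self.remaining = self.budget
            return np.append(self.env.reset(), 1.0)            # full budget at episode start

        def step(self, action):
            obs, reward, done, info = self.env.step(action)
            self.remaining -= info.get("cost", 0.0)            # assumed per-step constraint cost
            if self.remaining <= 0.0:                          # budget exhausted:
                reward, self.remaining = 0.0, 0.0              # mask reward for the rest of the episode
            return np.append(obs, self.remaining / self.budget), reward, done, info

Conditioning the policy on the remaining budget turns constraint bookkeeping into part of the state, which is the shared idea behind the two augmentation-based entries above.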