Probabilistic Constraint for Safety-Critical Reinforcement Learning
- URL: http://arxiv.org/abs/2306.17279v2
- Date: Wed, 13 Mar 2024 03:58:56 GMT
- Title: Probabilistic Constraint for Safety-Critical Reinforcement Learning
- Authors: Weiqin Chen, Dharmashankar Subramanian and Santiago Paternain
- Abstract summary: We consider the problem of learning safe policies for probabilistic-constrained reinforcement learning (RL).
We provide an improved gradient estimator, SPG-Actor-Critic, that attains lower variance than SPG-REINFORCE.
We propose a Safe Primal-Dual algorithm that can leverage both SPGs to learn safe policies.
- Score: 13.502008069967552
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we consider the problem of learning safe policies for
probabilistic-constrained reinforcement learning (RL). Specifically, a safe
policy or controller is one that, with high probability, maintains the
trajectory of the agent in a given safe set. We establish a connection between
this probabilistic-constrained setting and the cumulative-constrained
formulation that is frequently explored in the existing literature. We provide
theoretical bounds elucidating that the probabilistic-constrained setting
offers a better trade-off in terms of optimality and safety (constraint
satisfaction). The challenge encountered when dealing with the probabilistic
constraints, as explored in this work, arises from the absence of explicit
expressions for their gradients. Our prior work provides such an explicit
gradient expression for probabilistic constraints, which we term Safe Policy
Gradient-REINFORCE (SPG-REINFORCE). In this work, we provide an improved
gradient estimator, SPG-Actor-Critic, that attains lower variance than
SPG-REINFORCE, as substantiated by our theoretical results. A noteworthy aspect of both
SPGs is their inherent algorithm independence, rendering them versatile for
application across a range of policy-based algorithms. Furthermore, we propose
a Safe Primal-Dual algorithm that can leverage both SPGs to learn safe
policies. This is followed by theoretical analyses that establish the
convergence of the algorithm as well as its near-optimality and feasibility on
average. In addition, we evaluate the proposed approaches through a series of
empirical experiments that examine the inherent trade-offs between optimality
and safety and substantiate the efficacy of the two SPGs as well as our
theoretical contributions.
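As a concrete illustration of the setting summarized above, the minimal sketch below couples a Monte Carlo estimate of the probabilistic constraint (the fraction of sampled trajectories that never leave the safe set) with a primal-dual loop over a Lagrange multiplier. The toy environment, the finite-difference primal step, and all hyperparameters are illustrative assumptions; they stand in for, and are not, the paper's SPG-REINFORCE/SPG-Actor-Critic estimators or the Safe Primal-Dual algorithm itself.

```python
# Illustrative sketch (assumptions, not the authors' code): a primal-dual loop
# enforcing P(trajectory stays in the safe set) >= 1 - delta, with the
# probability estimated by the fraction of safe rollouts.
import numpy as np

rng = np.random.default_rng(0)
delta = 0.1          # allowed violation probability: require P(safe) >= 1 - delta
eta_dual = 0.5       # dual-ascent step size for the Lagrange multiplier
lam = 0.0            # Lagrange multiplier
theta = np.zeros(2)  # toy policy parameters

def rollout(theta, horizon=20):
    """Toy 1-D rollout; returns (return, 1.0 if the whole trajectory stayed safe)."""
    x, total_reward, safe = 0.0, 0.0, True
    for _ in range(horizon):
        action = theta[0] - theta[1] * x + 0.1 * rng.standard_normal()
        x += action
        total_reward += -x ** 2        # reward for staying near the origin
        safe = safe and abs(x) <= 1.0  # safe set: |x| <= 1
    return total_reward, float(safe)

def lagrangian(theta, lam, n=16):
    """Monte Carlo estimate of E[return] + lam * P(trajectory safe)."""
    samples = [rollout(theta) for _ in range(n)]
    return np.mean([r + lam * s for r, s in samples])

for it in range(200):
    batch = [rollout(theta) for _ in range(64)]
    p_safe = np.mean([s for _, s in batch])  # Monte Carlo estimate of P(safe)

    # Primal step (placeholder): finite-difference ascent on the Lagrangian.
    # In the paper's algorithm, an SPG-REINFORCE or SPG-Actor-Critic gradient
    # of the probabilistic constraint would be used here instead.
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = 0.05
        grad[i] = (lagrangian(theta + e, lam) - lagrangian(theta - e, lam)) / 0.1
    theta = theta + 1e-3 * grad

    # Dual step: raise lam when the constraint P(safe) >= 1 - delta is violated.
    lam = max(0.0, lam + eta_dual * ((1.0 - delta) - p_safe))

print(f"lambda = {lam:.3f}, estimated P(safe) = {p_safe:.2f}")
```

The trajectory-safety indicator inside the constraint is not differentiable in the policy parameters, which is exactly the difficulty the abstract points to when it notes the absence of explicit gradient expressions for probabilistic constraints.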
Related papers
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z) - Policy Bifurcation in Safe Reinforcement Learning [35.75059015441807]
In some scenarios, the feasible policy should be discontinuous or multi-valued; interpolating between discontinuous local optima can then inevitably lead to constraint violations.
We are the first to identify the generating mechanism of such a phenomenon, and employ topological analysis to rigorously prove the existence of bifurcation in safe RL.
We propose a safe RL algorithm called multimodal policy optimization (MUPO), which utilizes a Gaussian mixture distribution as the policy output (a minimal illustrative sketch of such a mixture policy head appears after this list).
arXiv Detail & Related papers (2024-03-19T15:54:38Z) - Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z) - Risk-sensitive Markov Decision Process and Learning under General Utility Functions [3.6260136172126667]
Reinforcement Learning (RL) has gained substantial attention across diverse application domains and theoretical investigations.
We propose a modified value iteration algorithm that employs an epsilon-covering over the space of cumulative reward.
In the absence of a simulator, our algorithm, designed with an upper-confidence-bound exploration approach, identifies a near-optimal policy.
arXiv Detail & Related papers (2023-11-22T18:50:06Z) - SCPO: Safe Reinforcement Learning with Safety Critic Policy Optimization [1.3597551064547502]
This study introduces a novel safe reinforcement learning algorithm, Safety Critic Policy Optimization (SCPO).
In this study, we define the safety critic, a mechanism that nullifies rewards obtained through violating safety constraints.
Our theoretical analysis indicates that the proposed algorithm can automatically balance the trade-off between adhering to safety constraints and maximizing rewards.
arXiv Detail & Related papers (2023-11-01T22:12:50Z) - Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states.
The algorithm is able to provide more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z) - Provable Offline Preference-Based Reinforcement Learning [95.00042541409901]
We investigate the problem of offline Preference-based Reinforcement Learning (PbRL) with human feedback.
We consider the general reward setting where the reward can be defined over the whole trajectory.
We introduce a new single-policy concentrability coefficient, which can be upper bounded by the per-trajectory concentrability.
arXiv Detail & Related papers (2023-05-24T07:11:26Z) - Policy Gradients for Probabilistic Constrained Reinforcement Learning [13.441235221641717]
This paper considers the problem of learning safe policies in the context of reinforcement learning (RL).
We aim to design policies that maintain the state of the system in a safe set with high probability.
arXiv Detail & Related papers (2022-10-02T18:16:33Z) - Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average reward setting with the variance risk criteria.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
arXiv Detail & Related papers (2020-12-28T05:02:26Z) - Chance Constrained Policy Optimization for Process Control and Optimization [1.4908563154226955]
Chemical process optimization and control are affected by 1) plant-model mismatch, 2) process disturbances, and 3) constraints for safe operation.
We propose a chance constrained policy optimization algorithm which guarantees the satisfaction of joint chance constraints with a high probability.
arXiv Detail & Related papers (2020-07-30T14:20:35Z) - Cautious Reinforcement Learning with Logical Constraints [78.96597639789279]
An adaptive safe padding forces Reinforcement Learning (RL) to synthesise optimal control policies while ensuring safety during the learning process.
Theoretical guarantees are available on the optimality of the synthesised policies and on the convergence of the learning algorithm.
arXiv Detail & Related papers (2020-02-26T00:01:08Z)
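The MUPO entry above motivates multimodal policies that output a Gaussian mixture distribution. The sketch below is an illustrative, assumed construction (a linear mixture head with two components over 1-D actions), not the MUPO architecture; it shows how a mixture policy can place probability mass on disconnected action modes that a single Gaussian policy would have to interpolate between.

```python
# Illustrative sketch of a Gaussian-mixture policy head (an assumption for
# exposition, not the MUPO implementation).
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

class GaussianMixturePolicy:
    """Maps a state to K mixture weights, means, and log-stds via linear heads."""
    def __init__(self, state_dim, n_components=2):
        self.K = n_components
        # Random initialization stands in for learned weights.
        self.W_logits = rng.normal(scale=0.1, size=(n_components, state_dim))
        self.W_mean = rng.normal(scale=0.1, size=(n_components, state_dim))
        self.log_std = np.full(n_components, -0.5)

    def sample(self, state):
        weights = softmax(self.W_logits @ state)  # component probabilities
        means = self.W_mean @ state               # per-component action means
        k = rng.choice(self.K, p=weights)         # pick one mode
        return means[k] + np.exp(self.log_std[k]) * rng.standard_normal()

policy = GaussianMixturePolicy(state_dim=3)
state = np.array([0.2, -1.0, 0.5])
print([policy.sample(state) for _ in range(5)])  # actions drawn from the mixture
```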
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including the automatically generated summaries) and is not responsible for any consequences of its use.