Policy Gradients for Probabilistic Constrained Reinforcement Learning
- URL: http://arxiv.org/abs/2210.00596v2
- Date: Tue, 18 Apr 2023 20:54:12 GMT
- Title: Policy Gradients for Probabilistic Constrained Reinforcement Learning
- Authors: Weiqin Chen, Dharmashankar Subramanian and Santiago Paternain
- Abstract summary: This paper considers the problem of learning safe policies in the context of reinforcement learning (RL).
We aim to design policies that maintain the state of the system in a safe set with high probability.
- Score: 13.441235221641717
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper considers the problem of learning safe policies in the context of
reinforcement learning (RL). In particular, we consider the notion of
probabilistic safety. That is, we aim to design policies that maintain the
state of the system in a safe set with high probability. This notion differs
from the cumulative constraints often considered in the literature. The
challenge of working with probabilistic safety is the lack of expressions for
the gradients of such constraints. Indeed, policy optimization algorithms rely
on gradients of the objective function and the constraints. To the best of our
knowledge, this work is the first to provide such explicit gradient
expressions for probabilistic constraints. Notably, the gradient of this
family of constraints can be used in various policy-based algorithms. We
demonstrate empirically
that it is possible to handle probabilistic constraints in a continuous
navigation problem.
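To make the idea concrete, here is a minimal sketch of how a probabilistic safety constraint and its gradient can be estimated from rollouts. It is an illustration under stated assumptions, not the paper's implementation: the probability that a trajectory stays in the safe set is the expectation of an indicator over trajectories, so a REINFORCE-style score-function estimator applies. The names `policy`, `in_safe_set`, and the Gymnasium-style environment interface below are assumptions for the sketch.

```python
import numpy as np
import torch


def estimate_safety_and_gradient(policy, env, in_safe_set,
                                 num_rollouts=64, horizon=100):
    """Monte Carlo estimate of P(s_t in safe set for all t) and a
    REINFORCE-style (score-function) estimate of its gradient.

    Assumptions (illustrative, not the paper's implementation):
      - `policy` is a torch.nn.Module mapping a state tensor to a
        torch.distributions.Distribution over actions,
      - `env` follows the Gymnasium reset()/step() API,
      - `in_safe_set(state)` returns True iff the state is safe.
    """
    indicators, traj_log_probs = [], []
    for _ in range(num_rollouts):
        state, _ = env.reset()
        log_prob = torch.zeros(())
        safe = bool(in_safe_set(state))
        for _ in range(horizon):
            dist = policy(torch.as_tensor(state, dtype=torch.float32))
            action = dist.sample()
            log_prob = log_prob + dist.log_prob(action).sum()
            state, _, terminated, truncated, _ = env.step(action.numpy())
            safe = safe and bool(in_safe_set(state))
            if terminated or truncated:
                break
        indicators.append(float(safe))
        traj_log_probs.append(log_prob)

    prob_safe = float(np.mean(indicators))
    # grad P(safe) = E[ 1{trajectory stays safe} * sum_t grad log pi(a_t|s_t) ]
    surrogate = torch.stack(
        [ind * lp for ind, lp in zip(indicators, traj_log_probs)]).mean()
    grads = torch.autograd.grad(surrogate, list(policy.parameters()),
                                allow_unused=True)
    return prob_safe, grads
```

An algorithm can then treat this estimate like any other constraint gradient; a generic primal-dual combination with the reward objective, in the spirit of the Safe Primal-Dual entry in the list below, is sketched after the related-papers list.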
Related papers
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z) - Policy Bifurcation in Safe Reinforcement Learning [35.75059015441807]
In some scenarios, the feasible policy should be discontinuous or multi-valued; interpolating between discontinuous local optima can inevitably lead to constraint violations.
We are the first to identify the generating mechanism of such a phenomenon, and employ topological analysis to rigorously prove the existence of bifurcation in safe RL.
We propose a safe RL algorithm called multimodal policy optimization (MUPO), which utilizes a Gaussian mixture distribution as the policy output.
arXiv Detail & Related papers (2024-03-19T15:54:38Z) - Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states.
The algorithm is able to provide more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z) - Probabilistic Constraint for Safety-Critical Reinforcement Learning [13.502008069967552]
We consider the problem of learning safe policies for probabilistic-constrained reinforcement learning (RL).
We provide an improved gradient estimator, SPG-Actor-Critic, that leads to lower variance than SPG-REINFORCE.
We propose a Safe Primal-Dual algorithm that can leverage both SPGs to learn safe policies.
arXiv Detail & Related papers (2023-06-29T19:41:56Z) - Policy Gradient for Rectangular Robust Markov Decision Processes [62.397882389472564]
We introduce robust policy gradient (RPG), a policy-based method that efficiently solves rectangular robust Markov decision processes (MDPs).
Our resulting RPG can be estimated from data with the same time complexity as its non-robust equivalent.
arXiv Detail & Related papers (2023-01-31T12:40:50Z) - Constrained Policy Gradient Method for Safe and Fast Reinforcement
Learning: a Neural Tangent Kernel Based Approach [6.316693022958221]
This paper presents a constrained policy gradient algorithm for safe learning.
We introduce constraints for safe learning with the following steps.
The efficiency of the constrained learning was demonstrated with a shallow and wide ReLU network in the Cartpole and Lunar Lander OpenAI gym environments.
arXiv Detail & Related papers (2021-07-19T20:25:15Z) - Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds
Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average reward setting with the variance risk criteria.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
arXiv Detail & Related papers (2020-12-28T05:02:26Z) - Policy Gradient for Continuing Tasks in Non-stationary Markov Decision
Processes [112.38662246621969]
Reinforcement learning considers the problem of finding policies that maximize an expected cumulative reward in a Markov decision process with unknown transition probabilities.
A major drawback of policy gradient-type algorithms is that they are limited to episodic tasks unless stationarity assumptions are imposed.
We compute unbiased navigation gradients of the value function, which we use as ascent directions to update the policy.
arXiv Detail & Related papers (2020-10-16T15:15:42Z) - Chance Constrained Policy Optimization for Process Control and
Optimization [1.4908563154226955]
Chemical process optimization and control are affected by 1) plant-model mismatch, 2) process disturbances, and 3) constraints for safe operation.
We propose a chance constrained policy optimization algorithm which guarantees the satisfaction of joint chance constraints with a high probability.
arXiv Detail & Related papers (2020-07-30T14:20:35Z) - Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic
Policies [80.42316902296832]
We study the estimation of policy value and gradient of a deterministic policy from off-policy data when actions are continuous.
In this setting, standard importance sampling and doubly robust estimators for policy value and gradient fail because the density ratio does not exist.
We propose several new doubly robust estimators based on different kernelization approaches.
arXiv Detail & Related papers (2020-06-06T15:52:05Z)