Safe Reinforcement Learning for Constrained Markov Decision Processes with Stochastic Stopping Time
- URL: http://arxiv.org/abs/2403.15928v1
- Date: Sat, 23 Mar 2024 20:22:30 GMT
- Title: Safe Reinforcement Learning for Constrained Markov Decision Processes with Stochastic Stopping Time
- Authors: Abhijit Mazumdar, Rafal Wisniewski, Manuela L. Bujorianu
- Abstract summary: We present an online reinforcement learning algorithm for constrained Markov decision processes with a safety constraint.
We show that the learned policy is safe with high confidence.
We also demonstrate that efficient exploration can be achieved by defining a subset of the state space called the proxy set.
- Score: 0.6554326244334868
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this paper, we present an online reinforcement learning algorithm for constrained Markov decision processes with a safety constraint. Although the topic has received attention from the scientific community, the problem of learning an optimal policy without violating safety constraints during the learning phase, under a stochastic stopping time, has yet to be addressed. To this end, we propose an algorithm based on linear programming that does not require a process model. We show that the learned policy is safe with high confidence. We also propose a method to compute a safe baseline policy, which is central to developing algorithms that do not violate the safety constraints. Finally, we provide simulation results to show the efficacy of the proposed algorithm. Further, we demonstrate that efficient exploration can be achieved by defining a subset of the state space called the proxy set.
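To make the linear-programming viewpoint concrete, here is a minimal sketch of the occupation-measure LP for a small tabular constrained MDP, solved with scipy.optimize.linprog. It is illustrative only: the transition kernel, rewards, safety costs, budget, and discounting below are made-up placeholders, and the sketch does not capture the paper's model-free learning, stochastic stopping time, or safe-exploration machinery.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative occupation-measure LP for a small tabular constrained MDP.
# All quantities (P, r, c, gamma, mu0, budget) are hypothetical placeholders.
n_states, n_actions = 4, 2
gamma = 0.95                                    # discount factor
rng = np.random.default_rng(0)

P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))   # P[s, a, s']
r = rng.uniform(size=(n_states, n_actions))                        # reward
c = rng.uniform(size=(n_states, n_actions))                        # safety cost
mu0 = np.full(n_states, 1.0 / n_states)                            # initial distribution
budget = c.min(axis=1).max() / (1 - gamma)      # loose, guaranteed-feasible cost budget

# Variable: occupation measure x[s, a] >= 0 (flattened).
# Flow constraints: sum_a x[s', a] - gamma * sum_{s, a} P[s, a, s'] x[s, a] = mu0[s'].
A_eq = np.zeros((n_states, n_states * n_actions))
for s_next in range(n_states):
    for s in range(n_states):
        for a in range(n_actions):
            A_eq[s_next, s * n_actions + a] = (s == s_next) - gamma * P[s, a, s_next]

# Safety constraint: expected discounted cost <= budget.
res = linprog(-r.reshape(-1),                   # maximize reward = minimize -r.x
              A_ub=c.reshape(1, -1), b_ub=[budget],
              A_eq=A_eq, b_eq=mu0, bounds=(0, None))
assert res.success, res.message

x = res.x.reshape(n_states, n_actions)
policy = x / np.maximum(x.sum(axis=1, keepdims=True), 1e-12)        # stationary policy
print("safe optimal policy:\n", policy)
```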
Related papers
- SCPO: Safe Reinforcement Learning with Safety Critic Policy Optimization [1.3597551064547502]
This study introduces a novel safe reinforcement learning algorithm, Safety Critic Policy Optimization.
In this study, we define the safety critic, a mechanism that nullifies rewards obtained through violating safety constraints.
Our theoretical analysis indicates that the proposed algorithm can automatically balance the trade-off between adhering to safety constraints and maximizing rewards.
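A minimal sketch of the reward-nullification idea described above, under the assumption that "nullifying" means discarding rewards collected once a cumulative safety-cost budget is exceeded; the threshold and discount are hypothetical, and this is not SCPO's exact formulation.

```python
import numpy as np

def nullified_return(rewards, costs, cost_threshold, gamma=0.99):
    """Toy 'safety critic' style nullification: rewards collected after the
    cumulative safety cost exceeds the threshold contribute nothing to the
    return. Threshold and discount are hypothetical placeholders."""
    g, cum_cost, discount = 0.0, 0.0, 1.0
    for r, c in zip(rewards, costs):
        cum_cost += c
        if cum_cost <= cost_threshold:    # only "safe" steps keep their reward
            g += discount * r
        discount *= gamma
    return g

# Example: the last two steps exceed the budget, so their rewards are dropped.
print(nullified_return([1, 1, 1, 1], [0.0, 0.4, 0.5, 0.3], cost_threshold=0.8))
```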
arXiv Detail & Related papers (2023-11-01T22:12:50Z) - A computationally lightweight safe learning algorithm [1.9295598343317182]
We propose a safe learning algorithm that provides probabilistic safety guarantees but leverages the Nadaraya-Watson estimator.
We provide theoretical guarantees for the estimates, embed them into a safe learning algorithm, and show numerical experiments on a simulated seven-degrees-of-freedom robot manipulator.
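For reference, a minimal Nadaraya-Watson (kernel-regression) estimator looks like the sketch below; the Gaussian kernel and bandwidth are illustrative choices, and the cited paper's contribution lies in the probabilistic guarantees built around such estimates.

```python
import numpy as np

def nadaraya_watson(x_query, X, y, bandwidth=0.5):
    """Kernel-weighted average of targets y at query point x_query.
    Gaussian kernel and bandwidth are illustrative choices."""
    d2 = np.sum((X - x_query) ** 2, axis=1)           # squared distances
    w = np.exp(-0.5 * d2 / bandwidth ** 2)            # Gaussian kernel weights
    return np.dot(w, y) / np.maximum(w.sum(), 1e-12)  # weighted average

# Toy usage on noisy 1-D data.
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
print(nadaraya_watson(np.array([0.5]), X, y))         # roughly sin(0.5)
```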
arXiv Detail & Related papers (2023-09-07T12:21:22Z) - Safe Exploration Method for Reinforcement Learning under Existence of
Disturbance [1.1470070927586016]
We deal with a safe exploration problem in reinforcement learning under the existence of disturbance.
We propose a safe exploration method that uses partial prior knowledge of a controlled object and disturbance.
We illustrate the validity and effectiveness of the proposed method through numerical simulations of an inverted pendulum and a four-bar parallel link robot manipulator.
arXiv Detail & Related papers (2022-09-30T13:00:33Z) - Log Barriers for Safe Black-box Optimization with Application to Safe
Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing constraint violation in policy optimization tasks in safe reinforcement learning.
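A rough sketch of a log-barrier gradient step on a toy constrained problem, assuming constraints of the form g_i(x) < 0; the fixed barrier weight and step size below are placeholders, whereas LB-SGD's key ingredient is a carefully chosen, adaptive step size that keeps iterates strictly feasible.

```python
import numpy as np

def log_barrier_gradient_step(x, grad_f, g, grad_g, eta=0.1, step=0.05):
    """One gradient step on the log-barrier surrogate
        B(x) = f(x) - eta * sum_i log(-g_i(x)),  constraints g_i(x) < 0.
    eta and step are illustrative placeholders."""
    gx = g(x)                                         # constraint values, all < 0
    assert np.all(gx < 0), "iterate must be strictly feasible"
    barrier_grad = grad_f(x) + eta * np.sum(grad_g(x) / (-gx)[:, None], axis=0)
    return x - step * barrier_grad

# Toy problem: minimize ||x||^2 subject to x[0] >= 0.2, i.e. g(x) = 0.2 - x[0] < 0.
f_grad = lambda x: 2 * x
g = lambda x: np.array([0.2 - x[0]])
g_grad = lambda x: np.array([[-1.0, 0.0]])
x = np.array([1.0, 1.0])
for _ in range(100):
    x = log_barrier_gradient_step(x, f_grad, g, g_grad)
print(x)   # stays strictly feasible; shrinking eta drives x toward (0.2, 0.0)
```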
arXiv Detail & Related papers (2022-07-21T11:14:47Z) - Safe Online Bid Optimization with Return-On-Investment and Budget
Constraints subject to Uncertainty [87.81197574939355]
We study the nature of both the optimization and learning problems.
We provide an algorithm, namely GCB, guaranteeing sublinear regret at the cost of a potentially linear number of constraints violations.
More interestingly, we provide an algorithm, namely GCB_safe(psi,phi), guaranteeing both sublinear pseudo-regret and safety w.h.p. at the cost of accepting tolerances psi and phi.
arXiv Detail & Related papers (2022-01-18T17:24:20Z) - Safety and Liveness Guarantees through Reach-Avoid Reinforcement
Learning [24.56889192688925]
Reach-avoid optimal control problems are central to safety and liveness assurance for autonomous robotic systems.
Recent successes in reinforcement learning methods to approximately solve optimal control problems with performance objectives make their application to certification problems attractive.
Recent work has shown promise in extending the reinforcement learning machinery to handle safety-type problems, whose objective is not a sum, but a minimum (or maximum) over time.
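As a hedged illustration of a minimum-over-time objective, the sketch below runs a tabular value iteration with a discounted safety (worst-margin) backup. It is a simplified stand-in, not the paper's full reach-avoid operator (which also encodes a target set), and the chain MDP and margins are invented.

```python
import numpy as np

def safety_value_iteration(P, margin, gamma=0.99, iters=500):
    """Tabular value iteration for a minimum-over-time (safety) objective:
        V(s) = (1 - gamma) * margin(s) + gamma * min(margin(s), max_a E[V(s')]),
    a discounted form of the safety Bellman backup. Simplified illustration only."""
    V = margin.copy()
    for _ in range(iters):
        Q = np.einsum('sat,t->sa', P, V)              # expected next-state value
        V = (1 - gamma) * margin + gamma * np.minimum(margin, Q.max(axis=1))
    return V

# Toy chain: negative margin marks unsafe states; actions move left/right noisily.
n = 5
margin = np.array([-1.0, 0.2, 0.5, 0.2, -1.0])        # ends of the chain are unsafe
P = np.zeros((n, 2, n))
for s in range(n):
    P[s, 0, max(s - 1, 0)] = 0.9; P[s, 0, min(s + 1, n - 1)] = 0.1   # action 0: left
    P[s, 1, min(s + 1, n - 1)] = 0.9; P[s, 1, max(s - 1, 0)] = 0.1   # action 1: right
print(safety_value_iteration(P, margin))   # higher values = larger maintainable safety margin
```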
arXiv Detail & Related papers (2021-12-23T00:44:38Z) - A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in decision processes whose complexity is independent of the number of states.
We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z) - Policy Gradient for Continuing Tasks in Non-stationary Markov Decision
Processes [112.38662246621969]
Reinforcement learning considers the problem of finding policies that maximize an expected cumulative reward in a Markov decision process with unknown transition probabilities.
A major drawback of policy gradient-type algorithms is that they are limited to episodic tasks unless stationarity assumptions are imposed.
We compute unbiased gradient estimates of the value function, which we use as ascent directions to update the policy.
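For context, a generic REINFORCE-style ascent step on a tabular softmax policy is sketched below; it illustrates return-weighted score-function gradients used as ascent directions in the episodic setting, not the cited paper's estimator for continuing, non-stationary tasks.

```python
import numpy as np

def reinforce_update(theta, episode, alpha=0.05, gamma=0.99):
    """One REINFORCE-style ascent step for a tabular softmax policy.
    The episode is a list of (state, action, reward) tuples; alpha and gamma
    are hypothetical placeholders."""
    grad = np.zeros_like(theta)
    ret = 0.0
    for s, a, r in reversed(episode):     # accumulate return-to-go backwards
        ret = r + gamma * ret
        probs = np.exp(theta[s]) / np.exp(theta[s]).sum()
        grad_log = -probs
        grad_log[a] += 1.0                # gradient of log softmax w.r.t. theta[s]
        grad[s] += grad_log * ret
    return theta + alpha * grad           # ascent on expected return

# Toy usage with a hypothetical 3-state, 2-action episode.
theta = np.zeros((3, 2))
episode = [(0, 1, 0.0), (1, 0, 1.0), (2, 1, 0.0)]
print(reinforce_update(theta, episode))
```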
arXiv Detail & Related papers (2020-10-16T15:15:42Z) - Chance-Constrained Trajectory Optimization for Safe Exploration and
Learning of Nonlinear Systems [81.7983463275447]
Learning-based control algorithms require data collection with abundant supervision for training.
We present a new approach for optimal motion planning with safe exploration that integrates chance-constrained optimal control with dynamics learning and feedback control.
arXiv Detail & Related papers (2020-05-09T05:57:43Z) - Cautious Reinforcement Learning with Logical Constraints [78.96597639789279]
An adaptive safe padding forces Reinforcement Learning (RL) to synthesise optimal control policies while ensuring safety during the learning process.
Theoretical guarantees are available on the optimality of the synthesised policies and on the convergence of the learning algorithm.
arXiv Detail & Related papers (2020-02-26T00:01:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.