Joint Learning of Policy with Unknown Temporal Constraints for Safe
Reinforcement Learning
- URL: http://arxiv.org/abs/2305.00576v1
- Date: Sun, 30 Apr 2023 21:15:07 GMT
- Title: Joint Learning of Policy with Unknown Temporal Constraints for Safe
Reinforcement Learning
- Authors: Lunet Yifru and Ali Baheri
- Abstract summary: We propose a framework that concurrently learns safety constraints and optimal RL policies.
The framework is underpinned by theorems that establish the convergence of our joint learning process.
We showcased our framework in grid-world environments, successfully identifying both acceptable safety constraints and RL policies.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In many real-world applications, safety constraints for reinforcement
learning (RL) algorithms are either unknown or not explicitly defined. We
propose a framework that concurrently learns safety constraints and optimal RL
policies in such environments, supported by theoretical guarantees. Our
approach merges a logically-constrained RL algorithm with an evolutionary
algorithm to synthesize signal temporal logic (STL) specifications. The
framework is underpinned by theorems that establish the convergence of our
joint learning process and provide error bounds between the discovered policy
and the true optimal policy. We showcased our framework in grid-world
environments, successfully identifying both acceptable safety constraints and
RL policies while demonstrating the effectiveness of our theorems in practice.
Related papers
- Embedding Safety into RL: A New Take on Trust Region Methods [1.5733417396701983]
Reinforcement Learning (RL) agents are able to solve a wide variety of tasks but are prone to unsafe behaviors.
We propose Constrained Trust Region Policy Optimization (C-TRPO), a novel approach that modifies the geometry of the policy space based on the safety constraints.
arXiv Detail & Related papers (2024-11-05T09:55:50Z) - Policy Bifurcation in Safe Reinforcement Learning [35.75059015441807]
In some scenarios, the feasible policy should be discontinuous or multi-valued, interpolating between discontinuous local optima can inevitably lead to constraint violations.
We are the first to identify the generating mechanism of such a phenomenon, and employ topological analysis to rigorously prove the existence of bifurcation in safe RL.
We propose a safe RL algorithm called multimodal policy optimization (MUPO), which utilizes a Gaussian mixture distribution as the policy output.
arXiv Detail & Related papers (2024-03-19T15:54:38Z) - Concurrent Learning of Policy and Unknown Safety Constraints in Reinforcement Learning [4.14360329494344]
Reinforcement learning (RL) has revolutionized decision-making across a wide range of domains over the past few decades.
Yet, deploying RL policies in real-world scenarios presents the crucial challenge of ensuring safety.
Traditional safe RL approaches have predominantly focused on incorporating predefined safety constraints into the policy learning process.
We propose a novel approach that concurrently learns a safe RL control policy and identifies the unknown safety constraint parameters of a given environment.
arXiv Detail & Related papers (2024-02-24T20:01:15Z) - A Survey of Constraint Formulations in Safe Reinforcement Learning [15.593999581562203]
Safety is critical when applying reinforcement learning to real-world problems.
A prevalent safe RL approach is based on a constrained criterion, which seeks to maximize the expected cumulative reward.
Despite recent effort to enhance safety in RL, a systematic understanding of the field remains difficult.
arXiv Detail & Related papers (2024-02-03T04:40:31Z) - Provably Efficient Iterated CVaR Reinforcement Learning with Function
Approximation and Human Feedback [57.6775169085215]
Risk-sensitive reinforcement learning aims to optimize policies that balance the expected reward and risk.
We present a novel framework that employs an Iterated Conditional Value-at-Risk (CVaR) objective under both linear and general function approximations.
We propose provably sample-efficient algorithms for this Iterated CVaR RL and provide rigorous theoretical analysis.
arXiv Detail & Related papers (2023-07-06T08:14:54Z) - Reinforcement Learning with Stepwise Fairness Constraints [50.538878453547966]
We introduce the study of reinforcement learning with stepwise fairness constraints.
We provide learning algorithms with strong theoretical guarantees in regard to policy optimality and fairness violation.
arXiv Detail & Related papers (2022-11-08T04:06:23Z) - A Regularized Implicit Policy for Offline Reinforcement Learning [54.7427227775581]
offline reinforcement learning enables learning from a fixed dataset, without further interactions with the environment.
We propose a framework that supports learning a flexible yet well-regularized fully-implicit policy.
Experiments and ablation study on the D4RL dataset validate our framework and the effectiveness of our algorithmic designs.
arXiv Detail & Related papers (2022-02-19T20:22:04Z) - Policy Mirror Descent for Regularized Reinforcement Learning: A
Generalized Framework with Linear Convergence [60.20076757208645]
This paper proposes a general policy mirror descent (GPMD) algorithm for solving regularized RL.
We demonstrate that our algorithm converges linearly over an entire range learning rates, in a dimension-free fashion, to the global solution.
arXiv Detail & Related papers (2021-05-24T02:21:34Z) - Deep Reinforcement Learning with Robust and Smooth Policy [90.78795857181727]
We propose to learn a smooth policy that behaves smoothly with respect to states.
We develop a new framework -- textbfSmooth textbfRegularized textbfReinforcement textbfLearning ($textbfSR2textbfL$), where the policy is trained with smoothness-inducing regularization.
Such regularization effectively constrains the search space, and enforces smoothness in the learned policy.
arXiv Detail & Related papers (2020-03-21T00:10:29Z) - Cautious Reinforcement Learning with Logical Constraints [78.96597639789279]
An adaptive safe padding forces Reinforcement Learning (RL) to synthesise optimal control policies while ensuring safety during the learning process.
Theoretical guarantees are available on the optimality of the synthesised policies and on the convergence of the learning algorithm.
arXiv Detail & Related papers (2020-02-26T00:01:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.