Joint Synthesis of Safety Certificate and Safe Control Policy using
Constrained Reinforcement Learning
- URL: http://arxiv.org/abs/2111.07695v1
- Date: Mon, 15 Nov 2021 12:05:44 GMT
- Title: Joint Synthesis of Safety Certificate and Safe Control Policy using
Constrained Reinforcement Learning
- Authors: Haitong Ma, Changliu Liu, Shengbo Eben Li, Sifa Zheng, Jianyu Chen
- Abstract summary: A valid safety certificate is an energy function indicating that safe states have low energy.
Existing learning-based studies treat either the safety certificate or the safe control policy as prior knowledge in order to learn the other.
This paper proposes a novel approach that simultaneously synthesizes the energy-function-based safety certificate and learns the safe control policy with constrained reinforcement learning (CRL).
- Score: 7.658716383823426
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Safety is the major consideration in controlling complex dynamical
systems using reinforcement learning (RL), where the safety certificate can
provide a provable safety guarantee. A valid safety certificate is an energy
function indicating that safe states have low energy, and for which there
exists a corresponding safe control policy that keeps the energy function
dissipating. The safety certificate and the safe control policy are closely
related to each other, and both are challenging to synthesize. Therefore,
existing learning-based studies treat either of them as prior knowledge in
order to learn the other, which limits their applicability to systems with
general unknown dynamics. This paper proposes a novel approach that
simultaneously synthesizes the energy-function-based safety certificate and
learns the safe control policy with constrained reinforcement learning (CRL).
We do not rely on prior knowledge of either a model-based controller or a
perfect safety certificate. In particular, we formulate a loss function that
optimizes the safety certificate parameters by minimizing the occurrence of
energy increases. By adding this optimization procedure as an outer loop to
Lagrangian-based CRL, we jointly update the policy and the safety certificate
parameters and prove that they converge to their respective local optima: the
optimal safe policy and a valid safety certificate. We evaluate the proposed
algorithm on multiple safety-critical benchmark environments. The results show
that it learns provably safe policies with no constraint violations. The
validity (feasibility) of the synthesized safety certificate is also verified
numerically.
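
The core of the method, as described in the abstract, is a two-level scheme: an outer loop that fits the energy-function certificate by penalizing observed energy increases, and an inner Lagrangian-based CRL update of the policy and multiplier. The following is a minimal, illustrative sketch of that structure only, not the authors' released implementation; the networks, the one-step `step` dynamics, the control-cost surrogate, and all hyperparameters are hypothetical placeholders.

```python
# Minimal sketch of joint certificate / policy synthesis with Lagrangian CRL.
# All shapes, networks, dynamics, and hyperparameters are placeholders.
import torch
import torch.nn as nn

obs_dim, act_dim = 4, 2
eta, lam_lr = 0.1, 1e-2          # required dissipation margin, dual step size

# Energy-function safety certificate phi(x): safe states should have low energy.
phi = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 1))
# Deterministic actor as a stand-in for the CRL policy.
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))

phi_opt = torch.optim.Adam(phi.parameters(), lr=1e-3)
pi_opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
lam = torch.zeros(1)             # Lagrange multiplier for the energy constraint

B_act = 0.1 * torch.randn(act_dim, obs_dim)   # toy linear actuation (placeholder)

def step(x):
    """One-step placeholder transition x' = x + u @ B_act; a real agent would
    use transitions collected from the environment instead."""
    u = policy(x)
    return u, x + u @ B_act

def energy_increase(x, x_next):
    """Hinge on the violated dissipation condition phi(x') - phi(x) <= -eta,
    enforced on states with non-negative energy (one common formulation)."""
    viol = torch.relu(phi(x_next) - phi(x) + eta)
    return viol * (phi(x) >= 0).float()

for outer in range(50):
    # Outer loop: optimize certificate parameters by minimizing the
    # occurrence of energy increases along sampled transitions.
    x = torch.randn(256, obs_dim)
    _, x_next = step(x)
    cert_loss = energy_increase(x, x_next.detach()).mean()
    phi_opt.zero_grad()
    cert_loss.backward()
    phi_opt.step()

    # Inner loop: Lagrangian-based CRL update of the policy.
    for inner in range(10):
        x = torch.randn(256, obs_dim)
        u, x_next = step(x)
        control_cost = (u ** 2).mean()            # stand-in for the negated reward
        constraint = energy_increase(x, x_next).mean()
        lagrangian = control_cost + lam.item() * constraint
        pi_opt.zero_grad()
        lagrangian.backward()
        pi_opt.step()
        # Dual ascent on the multiplier, projected to stay non-negative.
        lam = torch.clamp(lam + lam_lr * constraint.detach(), min=0.0)
```

In the actual algorithm the constraint is an expected violation estimated with actor-critic CRL machinery rather than the toy one-step model above; the sketch only shows how the certificate loss and the Lagrangian policy/multiplier updates interleave.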
Related papers
- Safe Reinforcement Learning with Learned Non-Markovian Safety Constraints [15.904640266226023]
We design a safety model that performs credit assignment to assess the contributions of partial state-action trajectories to safety.
We derive an effective algorithm for optimizing a safe policy using the learned safety model.
We devise a method to dynamically adapt the tradeoff coefficient between safety reward and safety compliance.
arXiv Detail & Related papers (2024-05-05T17:27:22Z)
- Implicit Safe Set Algorithm for Provably Safe Reinforcement Learning [7.349727826230864]
We present a model-free safe control algorithm, the implicit safe set algorithm, for synthesizing safeguards for DRL agents.
The proposed algorithm synthesizes a safety index (barrier certificate) and a subsequent safe control law solely by querying a black-box dynamic function.
We validate the proposed algorithm on the state-of-the-art Safety Gym benchmark, where it achieves zero safety violations while gaining $95\% \pm 9\%$ cumulative reward.
arXiv Detail & Related papers (2024-05-04T20:59:06Z)
- Sampling-based Safe Reinforcement Learning for Nonlinear Dynamical Systems [15.863561935347692]
We develop provably safe and convergent reinforcement learning algorithms for control of nonlinear dynamical systems.
Recent advances at the intersection of control and RL follow a two-stage, safety-filter approach to enforcing hard safety constraints.
We develop a single-stage, sampling-based approach to hard constraint satisfaction that learns RL controllers enjoying classical convergence guarantees.
arXiv Detail & Related papers (2024-03-06T19:39:20Z)
- Safe Online Dynamics Learning with Initially Unknown Models and Infeasible Safety Certificates [45.72598064481916]
This paper considers a learning-based setting with a robust safety certificate based on a control barrier function (CBF) second-order cone program.
If the control barrier function certificate is feasible, our approach leverages it to guarantee safety. Otherwise, our method explores the system dynamics to collect data and recover the feasibility of the control barrier function constraint (a generic sketch of this safety-filter pattern appears after this list).
arXiv Detail & Related papers (2023-11-03T14:23:57Z)
- Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions [60.26921219698514]
We introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers.
We then present the pointwise feasibility conditions of the resulting safety controller.
We use these conditions to devise an event-triggered online data collection strategy.
arXiv Detail & Related papers (2022-08-23T05:02:09Z)
- Safe Reinforcement Learning via Confidence-Based Filters [78.39359694273575]
We develop a control-theoretic approach for certifying state safety constraints for nominal policies learned via standard reinforcement learning techniques.
We provide formal safety guarantees, and empirically demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-07-04T11:43:23Z)
- SAFER: Data-Efficient and Safe Reinforcement Learning via Skill Acquisition [59.94644674087599]
We propose SAFEty skill pRiors (SAFER), an algorithm that accelerates policy learning on complex control tasks under safety constraints.
Through principled training on an offline dataset, SAFER learns to extract safe primitive skills.
In the inference stage, policies trained with SAFER learn to compose safe skills into successful policies.
arXiv Detail & Related papers (2022-02-10T05:43:41Z)
- Learning Barrier Certificates: Towards Safe Reinforcement Learning with Zero Training-time Violations [64.39401322671803]
This paper explores the possibility of safe RL algorithms with zero training-time safety violations.
We propose an algorithm, Co-trained Barrier Certificate for Safe RL (CRABS), which iteratively learns barrier certificates, dynamics models, and policies.
arXiv Detail & Related papers (2021-08-04T04:59:05Z)
- Cautious Reinforcement Learning with Logical Constraints [78.96597639789279]
An adaptive safe padding forces Reinforcement Learning (RL) to synthesise optimal control policies while ensuring safety during the learning process.
Theoretical guarantees are available on the optimality of the synthesised policies and on the convergence of the learning algorithm.
arXiv Detail & Related papers (2020-02-26T00:01:08Z)
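
Several of the entries above (the two-stage safety-filter approach and the CBF second-order cone program with a feasibility check in particular) build on the same generic pattern: minimally modify a nominal RL action so that a barrier/energy function keeps decreasing, and fall back when that projection is infeasible. Below is a hedged, minimal sketch of that generic pattern for control-affine dynamics with a quadratic-program filter; the `f`, `g`, `h`, `grad_h`, and `alpha` used here are illustrative stand-ins, and the snippet is not the exact formulation of any paper listed.

```python
# Generic CBF safety-filter sketch (illustrative; not any listed paper's method).
# Assumes control-affine dynamics x_dot = f(x) + g(x) u and a barrier h with
# h(x) >= 0 on the safe set.
import numpy as np
import cvxpy as cp

m, alpha = 1, 1.0                                        # input dim, class-K gain

def f(x):      return np.array([x[1], -x[0]])            # placeholder drift
def g(x):      return np.array([[0.0], [1.0]])           # placeholder actuation
def h(x):      return 1.0 - x[0] ** 2                    # placeholder barrier
def grad_h(x): return np.array([-2.0 * x[0], 0.0])       # its gradient

def safety_filter(x, u_nominal):
    """Minimally modify the nominal (e.g. RL) action subject to the CBF
    condition grad_h(x)^T (f(x) + g(x) u) + alpha * h(x) >= 0."""
    u = cp.Variable(m)
    cbf_constraint = grad_h(x) @ (f(x) + g(x) @ u) + alpha * h(x) >= 0
    prob = cp.Problem(cp.Minimize(cp.sum_squares(u - u_nominal)), [cbf_constraint])
    prob.solve()
    if prob.status not in ("optimal", "optimal_inaccurate"):
        # Infeasible certificate: fall back, e.g. to a backup controller, or
        # flag the state for data collection to restore feasibility.
        return np.zeros(m)
    return u.value

# Example: filter a nominal action near the boundary of the safe set.
print(safety_filter(np.array([0.9, 0.5]), np.array([1.0])))
```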