Joint Differentiable Optimization and Verification for Certified
Reinforcement Learning
- URL: http://arxiv.org/abs/2201.12243v2
- Date: Tue, 21 Mar 2023 04:41:42 GMT
- Title: Joint Differentiable Optimization and Verification for Certified
Reinforcement Learning
- Authors: Yixuan Wang, Simon Zhan, Zhilu Wang, Chao Huang, Zhaoran Wang, Zhuoran
Yang, Qi Zhu
- Abstract summary: In model-based reinforcement learning for safety-critical control systems, it is important to formally certify system properties.
We propose a framework that jointly conducts reinforcement learning and formal verification.
- Score: 91.93635157885055
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In model-based reinforcement learning for safety-critical control systems, it
is important to formally certify system properties (e.g., safety, stability)
under the learned controller. However, as existing methods typically apply
formal verification \emph{after} the controller has been learned, it is
sometimes difficult to obtain any certificate, even after many iterations
between learning and verification. To address this challenge, we propose a
framework that jointly conducts reinforcement learning and formal verification
by formulating and solving a novel bilevel optimization problem, which is
differentiable by the gradients from the value function and certificates.
Experiments on a variety of examples demonstrate the significant advantages of
our framework over the model-based stochastic value gradient (SVG) method and
the model-free proximal policy optimization (PPO) method in finding feasible
controllers with barrier functions and Lyapunov functions that ensure system
safety and stability.
Related papers
- Data-Driven Distributionally Robust Safety Verification Using Barrier Certificates and Conditional Mean Embeddings [0.24578723416255752]
We develop scalable formal verification algorithms without shifting the problem to unrealistic assumptions.
In a pursuit of developing scalable formal verification algorithms without shifting the problem to unrealistic assumptions, we employ the concept of barrier certificates.
We show how to solve the resulting program efficiently using sum-of-squares optimization and a Gaussian process envelope.
arXiv Detail & Related papers (2024-03-15T17:32:02Z) - Enhancing Security in Federated Learning through Adaptive
Consensus-Based Model Update Validation [2.28438857884398]
This paper introduces an advanced approach for fortifying Federated Learning (FL) systems against label-flipping attacks.
We propose a consensus-based verification process integrated with an adaptive thresholding mechanism.
Our results indicate a significant mitigation of label-flipping attacks, bolstering the FL system's resilience.
arXiv Detail & Related papers (2024-03-05T20:54:56Z) - Safe Online Dynamics Learning with Initially Unknown Models and
Infeasible Safety Certificates [45.72598064481916]
This paper considers a learning-based setting with a robust safety certificate based on a control barrier function (CBF) second-order cone program.
If the control barrier function certificate is feasible, our approach leverages it to guarantee safety. Otherwise, our method explores the system dynamics to collect data and recover the feasibility of the control barrier function constraint.
arXiv Detail & Related papers (2023-11-03T14:23:57Z) - A General Framework for Verification and Control of Dynamical Models via Certificate Synthesis [54.959571890098786]
We provide a framework to encode system specifications and define corresponding certificates.
We present an automated approach to formally synthesise controllers and certificates.
Our approach contributes to the broad field of safe learning for control, exploiting the flexibility of neural networks.
arXiv Detail & Related papers (2023-09-12T09:37:26Z) - Value Functions are Control Barrier Functions: Verification of Safe
Policies using Control Theory [46.85103495283037]
We propose a new approach to apply verification methods from control theory to learned value functions.
We formalize original theorems that establish links between value functions and control barrier functions.
Our work marks a significant step towards a formal framework for the general, scalable, and verifiable design of RL-based control systems.
arXiv Detail & Related papers (2023-06-06T21:41:31Z) - Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions [60.26921219698514]
We introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers.
We then present the pointwise feasibility conditions of the resulting safety controller.
We use these conditions to devise an event-triggered online data collection strategy.
arXiv Detail & Related papers (2022-08-23T05:02:09Z) - Probabilistic robust linear quadratic regulators with Gaussian processes [73.0364959221845]
Probabilistic models such as Gaussian processes (GPs) are powerful tools to learn unknown dynamical systems from data for subsequent use in control design.
We present a novel controller synthesis for linearized GP dynamics that yields robust controllers with respect to a probabilistic stability margin.
arXiv Detail & Related papers (2021-05-17T08:36:18Z) - Closing the Closed-Loop Distribution Shift in Safe Imitation Learning [80.05727171757454]
We treat safe optimization-based control strategies as experts in an imitation learning problem.
We train a learned policy that can be cheaply evaluated at run-time and that provably satisfies the same safety guarantees as the expert.
arXiv Detail & Related papers (2021-02-18T05:11:41Z) - Learning Stability Certificates from Data [19.381365606166725]
We develop algorithms for learning certificate functions only from trajectory data.
We convert such generalization error bounds into global stability guarantees.
We demonstrate empirically that certificates for complex dynamics can be efficiently learned.
arXiv Detail & Related papers (2020-08-13T14:58:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.