Learning Control Policies for Stochastic Systems with Reach-avoid
Guarantees
- URL: http://arxiv.org/abs/2210.05308v1
- Date: Tue, 11 Oct 2022 10:02:49 GMT
- Title: Learning Control Policies for Stochastic Systems with Reach-avoid
Guarantees
- Authors: Đorđe Žikelić, Mathias Lechner, Thomas A. Henzinger, Krishnendu Chatterjee
- Abstract summary: We study the problem of learning controllers for discrete-time non-linear stochastic dynamical systems with formal reach-avoid guarantees.
We learn a certificate in the form of a reach-avoid supermartingale (RASM), a novel notion that we introduce in this work.
Our approach solves several important problems: it can be used to learn a control policy from scratch, to verify a reach-avoid specification for a fixed control policy, or to fine-tune a pre-trained policy.
- Score: 20.045860624444494
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We study the problem of learning controllers for discrete-time non-linear
stochastic dynamical systems with formal reach-avoid guarantees. This work
presents the first method for providing formal reach-avoid guarantees, which
combine and generalize stability and safety guarantees, with a tolerable
probability threshold $p\in[0,1]$ over the infinite time horizon. Our method
leverages advances in the machine learning literature and represents formal
certificates as neural networks. In particular, we learn a certificate in the
form of a reach-avoid supermartingale (RASM), a novel notion that we introduce
in this work. Our RASMs provide reachability and avoidance guarantees by
imposing constraints on what can be viewed as a stochastic extension of level
sets of Lyapunov functions for deterministic systems. Our approach solves
several important problems: it can be used to learn a control policy from
scratch, to verify a reach-avoid specification for a fixed control policy, or
to fine-tune a pre-trained policy if it does not satisfy the reach-avoid
specification. We validate our approach on $3$ stochastic non-linear
reinforcement learning tasks.
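For intuition, the avoidance half of a supermartingale certificate argument can be sketched via Ville's maximal inequality. The following is our gloss under simplified assumptions, not the paper's exact RASM definition, which additionally imposes a strict expected decrease outside the target set.

```latex
% Sketch (our gloss; see the paper for the exact RASM conditions).
% Suppose V \ge 0 and V is a supermartingale along the closed-loop
% dynamics, i.e. E[V(x_{t+1}) \mid x_t] \le V(x_t). Ville's maximal
% inequality for nonnegative supermartingales gives
\[
  \Pr\Big[\, \sup_{t \ge 0} V(x_t) \ge \lambda \,\Big]
  \;\le\; \frac{V(x_0)}{\lambda}.
\]
% If V \ge \lambda on the unsafe set and V(x_0) \le 1 on initial states,
% the unsafe set is avoided with probability at least 1 - 1/\lambda;
% choosing \lambda = 1/(1-p) matches a reach-avoid threshold p. The
% strict expected decrease outside the target is what additionally
% forces reachability.
```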
Related papers
- Compositional Policy Learning in Stochastic Control Systems with Formal Guarantees [0.0]
Reinforcement learning has shown promising results in learning neural network policies for complicated control tasks.
We propose a novel method for learning a composition of neural network policies in stochastic environments, along with a formal certificate which guarantees that a specification over the policy's behavior is satisfied with the desired probability.
arXiv Detail & Related papers (2023-12-03T17:04:18Z)
- Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions [60.26921219698514]
We introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers.
We then present the pointwise feasibility conditions of the resulting safety controller.
We use these conditions to devise an event-triggered online data collection strategy.
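As a concrete, simplified picture of what a CBF-based safety-critical controller does pointwise, the sketch below projects a nominal action onto a single affine-in-input constraint $a(x)^\top u \ge b(x)$. The function names and the closed-form projection are our illustration, not the paper's API; the paper's contribution lies in making such conditions robust to model uncertainty and checking their feasibility.

```python
import numpy as np

def cbf_filter(u_nom, a, b):
    """Minimally invasive safety filter (sketch): solve
        min ||u - u_nom||^2  s.t.  a @ u >= b
    in closed form, i.e. project u_nom onto a half-space."""
    slack = b - a @ u_nom
    if slack <= 0:  # nominal action already satisfies the CBF condition
        return u_nom
    return u_nom + (slack / (a @ a)) * a  # shift just enough to reach the boundary

# Hypothetical example: 2-D input, constraint u_x + u_y >= 1.
u_safe = cbf_filter(np.array([0.0, 0.0]), np.array([1.0, 1.0]), 1.0)
print(u_safe)  # [0.5 0.5]
```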
arXiv Detail & Related papers (2022-08-23T05:02:09Z)
- Safe Reinforcement Learning via Confidence-Based Filters [78.39359694273575]
We develop a control-theoretic approach for certifying state safety constraints for nominal policies learned via standard reinforcement learning techniques.
We provide formal safety guarantees, and empirically demonstrate the effectiveness of our approach.
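A minimal sketch of the filtering idea follows, assuming a hypothetical certified safety estimate `safety_prob`; the paper derives such a certificate control-theoretically rather than assuming it is given.

```python
def filtered_action(x, learned_policy, backup_policy, safety_prob, threshold=0.95):
    """Confidence-based safety filter (sketch): apply the learned policy only
    when a certified lower bound on the probability of staying safe is high
    enough; otherwise fall back to a known-safe backup controller.
    `safety_prob` is a hypothetical stand-in for the paper's certificate."""
    u = learned_policy(x)
    if safety_prob(x, u) >= threshold:
        return u
    return backup_policy(x)
```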
arXiv Detail & Related papers (2022-07-04T11:43:23Z)
- KCRL: Krasovskii-Constrained Reinforcement Learning with Guaranteed Stability in Nonlinear Dynamical Systems [66.9461097311667]
We propose a model-based reinforcement learning framework with formal stability guarantees.
The proposed method learns the system dynamics up to a confidence interval using a feature representation.
We show that KCRL is guaranteed to learn a stabilizing policy in a finite number of interactions with the underlying unknown system.
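For context, Krasovskii's classical construction, which the constraint presumably references, builds a Lyapunov function directly from the dynamics; the following is the textbook continuous-time form, not necessarily the paper's exact formulation.

```latex
% Krasovskii's method: for dynamics \dot{x} = f(x), take the candidate
\[
  V(x) = f(x)^{\top} P f(x), \qquad P \succ 0,
\]
% whose derivative along trajectories is
\[
  \dot{V}(x) = f(x)^{\top}\!\left( J_f(x)^{\top} P + P\, J_f(x) \right) f(x),
\]
% so requiring J_f(x)^{\top} P + P J_f(x) \prec 0 everywhere (J_f the
% Jacobian of f) certifies asymptotic stability.
```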
arXiv Detail & Related papers (2022-06-03T17:27:04Z)
- Joint Differentiable Optimization and Verification for Certified Reinforcement Learning [91.93635157885055]
In model-based reinforcement learning for safety-critical control systems, it is important to formally certify system properties.
We propose a framework that jointly conducts reinforcement learning and formal verification.
arXiv Detail & Related papers (2022-01-28T16:53:56Z)
- Safety and Liveness Guarantees through Reach-Avoid Reinforcement Learning [24.56889192688925]
Reach-avoid optimal control problems are central to safety and liveness assurance for autonomous robotic systems.
Recent successes in reinforcement learning methods to approximately solve optimal control problems with performance objectives make their application to certification problems attractive.
Recent work has shown promise in extending the reinforcement learning machinery to handle safety-type problems, whose objective is not a sum, but a minimum (or maximum) over time.
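The min/max-over-time structure can be written out explicitly. With $\ell > 0$ on the target set and $g > 0$ on the safe set (sign conventions vary across papers), one common form of the reach-avoid outcome of a trajectory is:

```latex
% Reach-avoid payoff of a trajectory x_0, x_1, ...: success means reaching
% the target (\ell > 0) while having stayed safe (g > 0) up to that time.
\[
  V(x_0) \;=\; \max_{t \ge 0} \, \min\Big\{ \ell(x_t),\;
  \min_{0 \le s \le t} g(x_s) \Big\},
\]
% a max/min over time rather than a discounted sum, so the standard
% Bellman backup must be modified before value-based RL machinery applies.
```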
arXiv Detail & Related papers (2021-12-23T00:44:38Z)
- On Imitation Learning of Linear Control Policies: Enforcing Stability and Robustness Constraints via LMI Conditions [3.296303220677533]
We formulate the imitation learning of linear policies as a constrained optimization problem.
We show that one can guarantee the closed-loop stability and robustness by posing linear matrix inequality (LMI) constraints on the fitted policy.
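A representative instance of such a stability constraint for a discrete-time closed loop $x_{k+1} = (A + BK)x_k$ is shown below; the paper's exact formulation may differ.

```latex
% Closed-loop stability as a matrix inequality:
\[
  \exists\, P \succ 0 : \quad (A + BK)^{\top} P\, (A + BK) - P \prec 0 .
\]
% As written this is bilinear in (P, K); standard reparametrizations
% (e.g. Q = P^{-1}, Y = K Q, plus a Schur complement) turn it into a
% genuine LMI that can be imposed on the fitted policy during optimization.
```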
arXiv Detail & Related papers (2021-03-24T02:43:03Z)
- Closing the Closed-Loop Distribution Shift in Safe Imitation Learning [80.05727171757454]
We treat safe optimization-based control strategies as experts in an imitation learning problem.
We train a learned policy that can be cheaply evaluated at run-time and that provably satisfies the same safety guarantees as the expert.
arXiv Detail & Related papers (2021-02-18T05:11:41Z)
- Improper Learning with Gradient-based Policy Optimization [62.50997487685586]
We consider an improper reinforcement learning setting where the learner is given M base controllers for an unknown Markov Decision Process.
We propose a gradient-based approach that operates over a class of improper mixtures of the controllers.
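A minimal sketch of one such approach, assuming episodic returns and a softmax mixture over the M base controllers; `rollout_return` is a hypothetical stand-in for executing a base controller for one episode, and the per-episode sampling is a simplification of the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def mixture_policy_gradient(theta, rollout_return, num_episodes=100, lr=0.1):
    """Gradient-based improper learning (sketch): maintain softmax weights
    over M fixed base controllers, sample one controller per episode, and
    update the weights with a REINFORCE-style gradient."""
    for _ in range(num_episodes):
        probs = softmax(theta)
        k = rng.choice(len(theta), p=probs)
        G = rollout_return(k)           # return of running base controller k
        grad_log = -probs
        grad_log[k] += 1.0              # grad of log softmax(theta)[k]
        theta = theta + lr * G * grad_log
    return theta
```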
arXiv Detail & Related papers (2021-02-16T14:53:55Z)
- Learning Constrained Adaptive Differentiable Predictive Control Policies With Guarantees [1.1086440815804224]
We present differentiable predictive control (DPC), a method for learning constrained neural control policies for linear systems.
We employ automatic differentiation to obtain direct policy gradients by backpropagating the model predictive control (MPC) loss function and constraints penalties through a differentiable closed-loop system dynamics model.
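A stripped-down sketch of such a training loop for an assumed double-integrator-like linear system, with an input-constraint penalty; the dynamics, cost, and hyperparameters are illustrative, not the paper's.

```python
import torch

# Assumed linear dynamics x' = A x + B u and a neural control policy.
A = torch.tensor([[1.0, 0.1], [0.0, 1.0]])
B = torch.tensor([[0.0], [0.1]])
policy = torch.nn.Sequential(
    torch.nn.Linear(2, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for step in range(2000):
    x = 2.0 * torch.rand(64, 2) - 1.0          # batch of sampled initial states
    loss = 0.0
    for t in range(20):                        # differentiable closed-loop rollout
        u = policy(x)
        x = x @ A.T + u @ B.T
        loss = loss + (x ** 2).sum(dim=1).mean()               # stage cost
        loss = loss + 10.0 * torch.relu(u.abs() - 1.0).mean()  # |u| <= 1 penalty
    opt.zero_grad()
    loss.backward()                            # policy gradients through the rollout
    opt.step()
```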
arXiv Detail & Related papers (2020-04-23T14:24:44Z)