Learning Provably Stabilizing Neural Controllers for Discrete-Time
Stochastic Systems
- URL: http://arxiv.org/abs/2210.05304v2
- Date: Fri, 28 Jul 2023 08:00:05 GMT
- Title: Learning Provably Stabilizing Neural Controllers for Discrete-Time
Stochastic Systems
- Authors: Matin Ansaripour, Krishnendu Chatterjee, Thomas A. Henzinger, Mathias Lechner, Đorđe Žikelić
- Abstract summary: We introduce the notion of stabilizing ranking supermartingales (sRSMs).
We show that our learning procedure can successfully learn provably stabilizing policies in practice.
- Score: 18.349820472823055
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We consider the problem of learning control policies in discrete-time
stochastic systems which guarantee that the system stabilizes within some
specified stabilization region with probability~$1$. Our approach is based on
the novel notion of stabilizing ranking supermartingales (sRSMs) that we
introduce in this work. Our sRSMs overcome the limitation of methods proposed
in previous works whose applicability is restricted to systems in which the
stabilizing region cannot be left once entered under any control policy. We
present a learning procedure that learns a control policy together with an sRSM
that formally certifies probability~$1$ stability, both learned as neural
networks. We show that this procedure can also be adapted to formally verifying
that, under a given Lipschitz continuous control policy, the stochastic system
stabilizes within some stabilizing region with probability~$1$. Our
experimental evaluation shows that our learning procedure can successfully
learn provably stabilizing policies in practice.
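As a rough illustration of the joint learning idea described in the abstract (not the paper's actual procedure), the sketch below trains a policy network and a nonnegative certificate network by penalizing sampled violations of an expected-decrease condition outside an assumed stabilization region. The toy dynamics `step`, the noise scale, the region radius, the margin `EPS`, and all architectures and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical discrete-time stochastic dynamics x_{t+1} = x_t + 0.1*u_t + w_t.
# This toy system is an assumption, not one of the paper's benchmarks.
def step(x, u, w):
    return x + 0.1 * u + w

policy = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 2), nn.Tanh())
cert = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1), nn.Softplus())
opt = torch.optim.Adam(list(policy.parameters()) + list(cert.parameters()), lr=1e-3)

EPS = 0.1            # required expected decrease of the certificate outside the region
REGION_RADIUS = 0.2  # assumed stabilization region: ||x|| <= 0.2

for it in range(2000):
    x = 4.0 * torch.rand(256, 2) - 2.0                        # states sampled from the domain
    u = policy(x)
    w = 0.05 * torch.randn(16, 256, 2)                        # sampled disturbances
    v = cert(x).squeeze(-1)
    exp_v_next = cert(step(x, u, w)).mean(dim=0).squeeze(-1)  # Monte Carlo estimate of E[V(x_{t+1})]
    outside = (x.norm(dim=-1) > REGION_RADIUS).float()
    # Penalize sampled states outside the region where V(x) - E[V(x')] >= EPS fails.
    loss = (outside * torch.relu(exp_v_next - v + EPS)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

As the abstract notes, the actual procedure then formally certifies the learned sRSM (for Lipschitz continuous networks) rather than relying only on sampled states; that verification step is omitted in this sketch.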
Related papers
- Globally Stable Neural Imitation Policies [3.8772936189143445]
We introduce the Stable Neural Dynamical System (SNDS), an imitation learning regime which produces a policy with formal stability guarantees.
We deploy a neural policy architecture that facilitates representing stability based on Lyapunov theory.
We successfully deploy the trained policies on a real-world manipulator arm.
arXiv Detail & Related papers (2024-03-07T00:20:11Z) - Actor-Critic based Improper Reinforcement Learning [61.430513757337486]
We consider an improper reinforcement learning setting where a learner is given $M$ base controllers for an unknown Markov decision process.
We propose two algorithms: (1) a Policy Gradient-based approach; and (2) an algorithm that can switch between a simple Actor-Critic scheme and a Natural Actor-Critic scheme.
arXiv Detail & Related papers (2022-07-19T05:55:02Z) - Thompson Sampling Achieves $\tilde O(\sqrt{T})$ Regret in Linear
Quadratic Control [85.22735611954694]
We study the problem of adaptive control of stabilizable linear-quadratic regulators (LQRs) using Thompson Sampling (TS).
We propose an efficient TS algorithm for the adaptive control of LQRs, TSAC, that attains $\tilde{O}(\sqrt{T})$ regret, even for multidimensional systems (a generic Thompson-sampling skeleton is sketched after this list).
arXiv Detail & Related papers (2022-06-17T02:47:53Z) - KCRL: Krasovskii-Constrained Reinforcement Learning with Guaranteed
Stability in Nonlinear Dynamical Systems [66.9461097311667]
We propose a model-based reinforcement learning framework with formal stability guarantees.
The proposed method learns the system dynamics up to a confidence interval using a feature representation.
We show that KCRL is guaranteed to learn a stabilizing policy in a finite number of interactions with the underlying unknown system.
arXiv Detail & Related papers (2022-06-03T17:27:04Z) - Learning Stabilizing Policies in Stochastic Control Systems [20.045860624444494]
We study the effectiveness of jointly learning a policy together with a martingale certificate that proves its stability using a single learning algorithm.
Our results suggest that some form of pre-training of the policy is required for the joint optimization to repair and verify the policy successfully.
arXiv Detail & Related papers (2022-05-24T11:38:22Z) - Stability Verification in Stochastic Control Systems via Neural Network
Supermartingales [17.558766911646263]
We present an approach for general nonlinear control problems with two novel aspects.
We use ranking supermartingales (RSMs) to certify almost-sure asymptotic stability, and we present a method for learning such certificates as neural networks (a simplified form of the RSM condition is shown after this list).
arXiv Detail & Related papers (2021-12-17T13:05:14Z) - Stabilizing Dynamical Systems via Policy Gradient Methods [32.88312419270879]
We provide a model-free algorithm for stabilizing fully observed dynamical systems.
We prove that this method efficiently recovers a stabilizing controller for linear systems.
We empirically evaluate the effectiveness of our approach on common control benchmarks.
arXiv Detail & Related papers (2021-10-13T00:58:57Z) - Closing the Closed-Loop Distribution Shift in Safe Imitation Learning [80.05727171757454]
We treat safe optimization-based control strategies as experts in an imitation learning problem.
We train a learned policy that can be cheaply evaluated at run-time and that provably satisfies the same safety guarantees as the expert.
arXiv Detail & Related papers (2021-02-18T05:11:41Z) - Improper Learning with Gradient-based Policy Optimization [62.50997487685586]
We consider an improper reinforcement learning setting where the learner is given M base controllers for an unknown Markov Decision Process.
We propose a gradient-based approach that operates over a class of improper mixtures of the controllers.
arXiv Detail & Related papers (2021-02-16T14:53:55Z) - Learning Stabilizing Controllers for Unstable Linear Quadratic
Regulators from a Single Trajectory [85.29718245299341]
We study linear controllers under a quadratic cost model, also known as linear quadratic regulators (LQR).
We present two different semi-definite programs (SDPs) that result in a controller stabilizing all systems within an ellipsoidal uncertainty set.
We propose an efficient data-dependent algorithm, eXploration, that quickly identifies a stabilizing controller with high probability.
arXiv Detail & Related papers (2020-06-19T08:58:57Z)
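For context on the certificates used in the main paper and in the Stability Verification via Neural Network Supermartingales entry above, a simplified, assumed version of the ranking-supermartingale condition is sketched below. The sRSM conditions introduced in the main paper differ (in particular, they handle executions that may leave the stabilization region), so this is only an illustration.

```latex
% Simplified RSM-style conditions (an illustration, not the paper's exact sRSM definition):
% V is nonnegative everywhere and decreases in expectation by at least eps outside
% the stabilization region S.
\begin{align}
  V(x) &\geq 0 && \text{for all states } x, \\
  \mathbb{E}\big[\, V(x_{t+1}) \mid x_t = x \,\big] &\leq V(x) - \epsilon && \text{for all } x \notin S,\ \text{for some fixed } \epsilon > 0.
\end{align}
```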
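To make the Thompson Sampling entry above more concrete, here is a generic, assumed TS-for-LQR skeleton (not the TSAC algorithm from that paper): sample system parameters from a posterior, solve the discrete-time Riccati equation for the sampled model, and act with the resulting gain. The posterior form, dimensions, and cost matrices are placeholders.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(0)
n, m = 2, 1                      # state and input dimensions (assumed)
Q, R = np.eye(n), np.eye(m)      # quadratic cost matrices (assumed)

# Assumed Gaussian posterior over the stacked parameters [A B]; in a full algorithm the
# mean and covariance would come from Bayesian linear regression on observed transitions.
theta_mean = np.hstack([0.9 * np.eye(n), 0.1 * np.ones((n, m))])
theta_cov = 0.01 * np.eye(n * (n + m))

def thompson_gain():
    theta = rng.multivariate_normal(theta_mean.ravel(), theta_cov).reshape(n, n + m)
    A, B = theta[:, :n], theta[:, n:]
    P = solve_discrete_are(A, B, Q, R)                 # Riccati solution for the sampled model
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # optimal gain for that model
    return K                                           # play u_t = -K @ x_t until resampling

print(thompson_gain())
```

A complete algorithm also has to decide when to resample and how to handle sampled models that are not stabilizable; those details are part of that paper's contribution and are not reproduced here.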