Learning Stabilizing Policies in Stochastic Control Systems
- URL: http://arxiv.org/abs/2205.11991v1
- Date: Tue, 24 May 2022 11:38:22 GMT
- Title: Learning Stabilizing Policies in Stochastic Control Systems
- Authors: Đorđe Žikelić, Mathias Lechner, Krishnendu Chatterjee, Thomas A. Henzinger
- Abstract summary: We study the effectiveness of jointly learning a policy together with a martingale certificate that proves its stability using a single learning algorithm.
Our results suggest that some form of pre-training of the policy is required for the joint optimization to repair and verify the policy successfully.
- Score: 20.045860624444494
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we address the problem of learning provably stable neural
network policies for stochastic control systems. While recent work has
demonstrated the feasibility of certifying given policies using martingale
theory, the problem of how to learn such policies is little explored. Here, we
study the effectiveness of jointly learning a policy together with a martingale
certificate that proves its stability using a single learning algorithm. We
observe that the joint optimization problem becomes easily stuck in local
minima when starting from a randomly initialized policy. Our results suggest
that some form of pre-training of the policy is required for the joint
optimization to repair and verify the policy successfully.
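The key object in the abstract is a martingale certificate that proves stability. The following is a minimal sketch of what such a certificate checks, not the authors' method: the paper learns neural network policies and certificates, whereas here the system matrices A and B, the noise level SIGMA, the linear policy u = -Kx, the decrease margin EPS, and the quadratic candidate V(x) = x^T P x are all hypothetical choices made so that the expectation has a closed form. It tests a ranking-supermartingale-style condition, E[V(x_{t+1}) | x_t = x] <= V(x) - EPS, analytically and by sampling.

```python
import numpy as np

# Hypothetical toy system (not from the paper): a noisy discretized double
# integrator x_{t+1} = A x_t + B u_t + w_t with w_t ~ N(0, SIGMA^2 I).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.005],
              [0.1]])
SIGMA = 0.01
EPS = 0.01  # hypothetical required expected decrease per step

# Hypothetical linear policy u = -K x, giving the closed loop x' = A_CL x + w.
K = np.array([[10.0, 5.0]])
A_CL = A - B @ K

# Quadratic candidate certificate V(x) = x^T P x, where P solves the discrete
# Lyapunov equation A_CL^T P A_CL - P = -Q. Fixed-point iteration converges
# because the spectral radius of A_CL is below 1.
Q = np.eye(2)
P = Q.copy()
for _ in range(500):
    P = A_CL.T @ P @ A_CL + Q


def V(x):
    """Certificate value V(x) = x^T P x (nonnegative since P is positive definite)."""
    return float(x @ P @ x)


def expected_next_V(x):
    """Closed form of E[V(A_CL x + w)] = mu^T P mu + SIGMA^2 tr(P), with mu = A_CL x."""
    mu = A_CL @ x
    return float(mu @ P @ mu + SIGMA**2 * np.trace(P))


def expected_next_V_sampled(x, n_samples=100_000, seed=0):
    """Monte Carlo estimate of E[V(x_{t+1}) | x_t = x]."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, SIGMA, size=(n_samples, 2))
    x_next = A_CL @ x + w  # broadcasts to shape (n_samples, 2)
    return float(np.mean(np.einsum("ni,ij,nj->n", x_next, P, x_next)))


def decreases_in_expectation(x, eps=EPS):
    """Ranking-supermartingale-style condition E[V(x')] <= V(x) - eps."""
    return expected_next_V(x) <= V(x) - eps


# Since E[V(x')] - V(x) = -x^T Q x + SIGMA^2 tr(P) and Q = I, the condition holds
# exactly when ||x||^2 >= SIGMA^2 tr(P) + EPS, i.e. outside this ball:
target_radius = float(np.sqrt(SIGMA**2 * np.trace(P) + EPS))

print(decreases_in_expectation(np.array([1.0, 0.0])))  # True: far from the origin
print(decreases_in_expectation(np.zeros(2)))            # False: noise dominates here
```

Far from the origin the expected decrease holds, but at the origin it fails, because the additive noise always contributes SIGMA^2 tr(P) to the expected next value. Under these assumptions the certificate can only certify progress toward a small region around the origin (the ball of radius target_radius), which is one reason such stability certificates are usually stated relative to a target set. The harder problem studied in the paper, learning the policy and the certificate jointly, would replace K and P with trainable networks; this sketch does not attempt that.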
 
      
Related papers
- Convergence and Sample Complexity of First-Order Methods for Agnostic Reinforcement Learning [66.4260157478436]
 We study reinforcement learning in the policy learning setting. The goal is to find a policy whose performance is competitive with the best policy in a given class of interest.
 arXiv (2025-07-06T14:40:05Z)
- Learning Verifiable Control Policies Using Relaxed Verification [49.81690518952909]
 This work proposes to perform verification throughout training to aim for policies whose properties can be evaluated throughout runtime.
The approach is to use differentiable reachability analysis and incorporate new components into the loss function.
 arXiv (2025-04-23T16:54:35Z)
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
 Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
 arXiv (2024-05-03T16:45:15Z)
- Globally Stable Neural Imitation Policies [3.8772936189143445]
 We introduce the Stable Neural Dynamical System (SNDS), an imitation learning regime which produces a policy with formal stability guarantees.
We deploy a neural policy architecture that facilitates the representation of stability based on Lyapunov's theorem.
We successfully deploy the trained policies on a real-world manipulator arm.
 arXiv (2024-03-07T00:20:11Z)
- Projected Off-Policy Q-Learning (POP-QL) for Stabilizing Offline Reinforcement Learning [57.83919813698673]
 Projected Off-Policy Q-Learning (POP-QL) is a novel actor-critic algorithm that simultaneously reweights off-policy samples and constrains the policy to prevent divergence and reduce value-approximation error.
In our experiments, POP-QL not only shows competitive performance on standard benchmarks, but also outperforms competing methods in tasks where the data-collection policy is significantly sub-optimal.
 arXiv (2023-11-25T00:30:58Z)
- Learning Provably Stabilizing Neural Controllers for Discrete-Time Stochastic Systems [18.349820472823055]
 We introduce the notion of stabilizing ranking supermartingales (sRSMs).
We show that our learning procedure can successfully learn provably stabilizing policies in practice.
 arXiv (2022-10-11T09:55:07Z)
- A Regularized Implicit Policy for Offline Reinforcement Learning [54.7427227775581]
 Offline reinforcement learning enables learning from a fixed dataset, without further interactions with the environment.
We propose a framework that supports learning a flexible yet well-regularized fully-implicit policy.
Experiments and an ablation study on the D4RL dataset validate our framework and the effectiveness of our algorithmic designs.
 arXiv (2022-02-19T20:22:04Z)
- Understanding Curriculum Learning in Policy Optimization for Online Combinatorial Optimization [66.35750142827898]
 This paper presents the first systematic study on policy optimization methods for online CO problems.
We show that online CO problems can be naturally formulated as latent Markov Decision Processes (LMDPs), and prove convergence bounds on natural policy gradient (NPG).
Furthermore, our theory explains the benefit of curriculum learning: it can find a strong sampling policy and reduce the distribution shift.
 arXiv (2022-02-11T03:17:15Z)
- Joint Differentiable Optimization and Verification for Certified Reinforcement Learning [91.93635157885055]
 In model-based reinforcement learning for safety-critical control systems, it is important to formally certify system properties.
We propose a framework that jointly conducts reinforcement learning and formal verification.
 arXiv (2022-01-28T16:53:56Z)
- Cautious Policy Programming: Exploiting KL Regularization in Monotonic Policy Improvement for Reinforcement Learning [11.82492300303637]
 We propose a novel value-based reinforcement learning (RL) algorithm that can ensure monotonic policy improvement during learning.
We demonstrate that the proposed algorithm can trade off performance and stability in both didactic classic control problems and challenging high-dimensional Atari games.
 arXiv (2021-07-13T01:03:10Z)
- On Imitation Learning of Linear Control Policies: Enforcing Stability and Robustness Constraints via LMI Conditions [3.296303220677533]
 We formulate the imitation learning of linear policies as a constrained optimization problem.
We show that one can guarantee the closed-loop stability and robustness by posing linear matrix inequality (LMI) constraints on the fitted policy.
 arXiv (2021-03-24T02:43:03Z)
- Closing the Closed-Loop Distribution Shift in Safe Imitation Learning [80.05727171757454]
 We treat safe optimization-based control strategies as experts in an imitation learning problem.
We train a learned policy that can be cheaply evaluated at run-time and that provably satisfies the same safety guarantees as the expert.
 arXiv (2021-02-18T05:11:41Z)
- Runtime-Safety-Guided Policy Repair [13.038017178545728]
 We study the problem of policy repair for learning-based control policies in safety-critical settings.
We propose to reduce or even eliminate control switching by 'repairing' the trained policy based on runtime data produced by the safety controller.
 arXiv (2020-08-17T23:31:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     