Certifying Stability of Reinforcement Learning Policies using Generalized Lyapunov Functions
- URL: http://arxiv.org/abs/2505.10947v2
- Date: Mon, 19 May 2025 17:11:49 GMT
- Title: Certifying Stability of Reinforcement Learning Policies using Generalized Lyapunov Functions
- Authors: Kehan Long, Jorge Cortés, Nikolay Atanasov
- Abstract summary: We study the problem of certifying the stability of closed-loop systems under control policies derived from optimal control or reinforcement learning (RL). Classical Lyapunov methods require a strict step-wise decrease in the Lyapunov function, but such a certificate is difficult to construct for a learned control policy. We formulate an approach to learn generalized Lyapunov functions by augmenting RL value functions with neural network residual terms.
- Score: 15.306107403623075
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the problem of certifying the stability of closed-loop systems under control policies derived from optimal control or reinforcement learning (RL). Classical Lyapunov methods require a strict step-wise decrease in the Lyapunov function, but such a certificate is difficult to construct for a learned control policy. The value function associated with an RL policy is a natural Lyapunov function candidate, but it is not clear how it should be modified. To gain intuition, we first study the linear quadratic regulator (LQR) problem and make two key observations. First, a Lyapunov function can be obtained from the value function of an LQR policy by augmenting it with a residual term related to the system dynamics and stage cost. Second, the classical Lyapunov decrease requirement can be relaxed to a generalized Lyapunov condition requiring only decrease on average over multiple time steps. Using this intuition, we consider the nonlinear setting and formulate an approach to learn generalized Lyapunov functions by augmenting RL value functions with neural network residual terms. Our approach successfully certifies the stability of RL policies trained on Gymnasium and DeepMind Control benchmarks. We also extend our method to jointly train neural controllers and stability certificates using a multi-step Lyapunov loss, resulting in larger certified inner approximations of the region of attraction compared to the classical Lyapunov approach. Overall, our formulation enables stability certification for a broad class of systems with learned policies by making certificates easier to construct, thereby bridging classical control theory and modern learning-based methods.
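To make the relaxation concrete, here is one illustrative way to write the two conditions for a discrete-time closed-loop system x_{k+1} = f(x_k); the horizon N, weights w_i, and quadratic margin are placeholder choices, not necessarily the paper's exact formulation:

```latex
% Classical Lyapunov certificate: strict decrease at every step.
\[
V(x_{k+1}) - V(x_k) < 0 \quad \text{for all } x_k \neq 0.
\]
% Generalized certificate (illustrative): decrease on average over N steps.
\[
\sum_{i=1}^{N} w_i \bigl( V(x_{k+i}) - V(x_k) \bigr) \le -\alpha \|x_k\|^2,
\qquad w_i \ge 0, \quad \sum_{i=1}^{N} w_i = 1.
\]
```

Any single step may increase V, as long as the weighted window decreases on average; this is what makes the condition easier to satisfy for value functions of learned policies.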
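And a minimal training sketch of the "RL value function + neural residual" idea, assuming PyTorch and treating the pretrained value function, the rollout data, and all hyperparameters as illustrative stand-ins; positive-definiteness constraints on V and the formal verification step are omitted here:

```python
# Sketch (not the paper's implementation): augment a frozen RL value
# function with a neural residual and train against a multi-step,
# on-average Lyapunov decrease loss.
import torch
import torch.nn as nn

class CandidateV(nn.Module):
    """Candidate certificate V(x) = V_rl(x) + residual_theta(x)."""

    def __init__(self, v_rl, state_dim, hidden=64):
        super().__init__()
        self.v_rl = v_rl                      # pretrained value fn, kept frozen
        self.res = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        with torch.no_grad():
            base = self.v_rl(x)
        return base + self.res(x)

def multistep_lyapunov_loss(V, rollout, alpha=1e-3):
    """Penalize violations of an averaged N-step decrease condition.

    rollout: closed-loop states under the policy, shape (B, N + 1, d).
    """
    B, Np1, d = rollout.shape
    v0 = V(rollout[:, 0])                                 # V(x_k), shape (B, 1)
    vf = V(rollout[:, 1:].reshape(-1, d)).reshape(B, Np1 - 1)
    avg_diff = (vf - v0).mean(dim=1)                      # mean_i V(x_{k+i}) - V(x_k)
    margin = alpha * rollout[:, 0].pow(2).sum(dim=1)      # decrease margin
    # Hinge on the averaged decrease over the window.
    return torch.relu(avg_diff + margin).mean()

# Toy usage with random stand-ins for a real policy rollout:
v_rl = lambda x: x.pow(2).sum(dim=1, keepdim=True)        # quadratic stand-in
V = CandidateV(v_rl, state_dim=4)
opt = torch.optim.Adam(V.parameters(), lr=1e-3)
rollout = torch.randn(256, 6, 4)                          # (B, N + 1, d)
for _ in range(200):
    opt.zero_grad()
    multistep_lyapunov_loss(V, rollout).backward()
    opt.step()
```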
Related papers
- A Test-Function Approach to Incremental Stability [33.44344966171865]
The regularity of value functions, and their connection to incremental stability, can be understood in a way that is distinct from the traditional Lyapunov-based approach to certifying stability in control theory.
arXiv Detail & Related papers (2025-07-01T11:46:52Z)
- Analytical Lyapunov Function Discovery: An RL-based Generative Approach [6.752429418580116]
We propose an end-to-end framework that uses transformers to construct analytical (local) Lyapunov functions. Our framework consists of a transformer-based trainer that generates candidate Lyapunov functions and a falsifier that verifies the candidate expressions. We show that our approach can discover Lyapunov functions not previously identified in the control literature.
arXiv Detail & Related papers (2025-02-04T05:04:15Z)
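A schematic of the generate-and-falsify loop this describes, with the transformer proposer abstracted behind a `propose` callable and a sampling-based check standing in for the paper's falsifier (all names hypothetical):

```python
import numpy as np

def falsify(V, step, samples):
    """Return sampled states where the candidate V fails positivity or
    fails to decrease along one step of the closed-loop dynamics."""
    v, v_next = V(samples), V(step(samples))
    bad = (v <= 0) | (v_next >= v)
    return samples[bad]

def generate_and_falsify(propose, step, dim, rounds=10, n=10_000):
    cex = np.empty((0, dim))                    # accumulated counterexamples
    for _ in range(rounds):
        V = propose(cex)                        # e.g. transformer-generated candidate
        samples = np.random.uniform(-1.0, 1.0, size=(n, dim))
        cex_new = falsify(V, step, samples)
        if len(cex_new) == 0:
            return V                            # no violation found on samples
        cex = np.vstack([cex, cex_new])         # feed back to the proposer
    return None

# Toy demo: contracting linear dynamics, fixed quadratic candidate.
V = generate_and_falsify(
    propose=lambda cex: (lambda x: (x ** 2).sum(axis=1)),
    step=lambda x: 0.5 * x,
    dim=2,
)
```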
- Lyapunov-stable Neural Control for State and Output Feedback: A Novel Formulation [67.63756749551924]
Learning-based neural network (NN) control policies have shown impressive empirical performance in a wide range of tasks in robotics and control.
Lyapunov stability guarantees over the region-of-attraction (ROA) for NN controllers with nonlinear dynamical systems are challenging to obtain.
We demonstrate a new framework for learning NN controllers together with Lyapunov certificates using fast empirical falsification and strategic regularizations.
arXiv Detail & Related papers (2024-04-11T17:49:15Z)
- Neural Lyapunov Control for Discrete-Time Systems [30.135651803114307]
A general approach is to jointly compute a Lyapunov function and an associated control policy.
Several methods have been proposed that represent Lyapunov functions using neural networks.
We propose the first approach for learning neural Lyapunov control in a broad class of discrete-time systems.
arXiv Detail & Related papers (2023-05-11T03:28:20Z)
- Policy Gradient for Reinforcement Learning with General Utilities [50.65940899590487]
In Reinforcement Learning (RL), the goal of agents is to discover an optimal policy that maximizes the expected cumulative rewards.
Many supervised and unsupervised RL problems are not covered by this linear (cumulative-reward) framework.
We derive the policy gradient theorem for RL with general utilities.
arXiv Detail & Related papers (2022-10-03T14:57:46Z)
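Schematically, and in our notation rather than the paper's: for a general utility F of the state-action occupancy measure, the chain rule reduces the gradient to a standard policy gradient taken with a utility-dependent pseudo-reward:

```latex
\[
\nabla_\theta\, F\bigl(\lambda^{\pi_\theta}\bigr)
= \nabla_\theta\, \bigl\langle \lambda^{\pi_\theta},\, r_F \bigr\rangle,
\qquad
r_F = \nabla_\lambda F(\lambda)\big|_{\lambda = \lambda^{\pi_\theta}}.
\]
```

The classical cumulative-reward case is recovered when F(λ) = ⟨λ, r⟩ is linear.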
- Neural Lyapunov Differentiable Predictive Control [2.042924346801313]
We present a learning-based predictive control methodology using the differentiable programming framework with probabilistic Lyapunov-based stability guarantees.
Our approach jointly learns a Lyapunov function that certifies the regions of the state space with stable dynamics.
arXiv Detail & Related papers (2022-05-22T03:52:27Z)
- Bellman Residual Orthogonalization for Offline Reinforcement Learning [53.17258888552998]
We introduce a new reinforcement learning principle that approximates the Bellman equations by enforcing their validity only along a test function space.
We exploit this principle to derive confidence intervals for off-policy evaluation, as well as to optimize over policies within a prescribed policy class.
arXiv Detail & Related papers (2022-03-24T01:04:17Z)
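In symbols (our notation; μ is the offline data distribution and 𝓕 the chosen test-function class), the principle asks that the Bellman residual be orthogonal to every test function rather than zero pointwise:

```latex
\[
\mathbb{E}_{(s,a,r,s') \sim \mu}
\Bigl[ f(s,a) \bigl( Q(s,a) - r - \gamma\, \mathbb{E}_{a' \sim \pi(\cdot \mid s')} Q(s',a') \bigr) \Bigr] = 0
\quad \text{for all } f \in \mathcal{F}.
\]
```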
- Joint Differentiable Optimization and Verification for Certified Reinforcement Learning [91.93635157885055]
In model-based reinforcement learning for safety-critical control systems, it is important to formally certify system properties.
We propose a framework that jointly conducts reinforcement learning and formal verification.
arXiv Detail & Related papers (2022-01-28T16:53:56Z)
- Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence [60.20076757208645]
This paper proposes a general policy mirror descent (GPMD) algorithm for solving regularized RL.
We demonstrate that our algorithm converges linearly, in a dimension-free fashion, over an entire range of learning rates to the global solution.
arXiv Detail & Related papers (2021-05-24T02:21:34Z)
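For reference, the prototypical policy mirror descent update that GPMD generalizes (schematic, our notation; τ is the regularization weight, η the learning rate, h a convex regularizer with Bregman divergence D_h):

```latex
\[
\pi^{(k+1)}(\cdot \mid s) = \operatorname*{arg\,max}_{p \in \Delta(\mathcal{A})}
\Bigl\{ \bigl\langle Q^{(k)}(s,\cdot),\, p \bigr\rangle - \tau\, h(p)
- \tfrac{1}{\eta}\, D_h\bigl(p,\; \pi^{(k)}(\cdot \mid s)\bigr) \Bigr\}.
\]
```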
- Lyapunov-Regularized Reinforcement Learning for Power System Transient Stability [5.634825161148484]
This paper proposes a Lyapunov-regularized RL approach to optimal frequency control for transient stability in lossy networks.
A case study shows that introducing the Lyapunov regularization makes the controller stabilizing and achieves smaller losses.
arXiv Detail & Related papers (2021-03-05T18:55:26Z)
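Schematically, the regularization augments the RL objective with a penalty on Lyapunov-decrease violations (our notation; the construction of V for lossy power networks is the paper's contribution):

```latex
\[
\min_\theta \; J_{\mathrm{RL}}(\theta)
+ \lambda\, \mathbb{E}_x\bigl[ \max\bigl(0,\; \dot V(x)\bigr) \bigr].
\]
```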
- Closing the Closed-Loop Distribution Shift in Safe Imitation Learning [80.05727171757454]
We treat safe optimization-based control strategies as experts in an imitation learning problem.
We train a learned policy that can be cheaply evaluated at run-time and that provably satisfies the same safety guarantees as the expert.
arXiv Detail & Related papers (2021-02-18T05:11:41Z)