Off Policy Lyapunov Stability in Reinforcement Learning
- URL: http://arxiv.org/abs/2509.09863v1
- Date: Thu, 11 Sep 2025 21:34:08 GMT
- Title: Off Policy Lyapunov Stability in Reinforcement Learning
- Authors: Sarvan Gill, Daniela Constantinescu
- Abstract summary: This paper introduces a method for learning Lyapunov functions off-policy. It incorporates the proposed off-policy Lyapunov function into the Soft Actor Critic and Proximal Policy Optimization algorithms.
- Score: 1.9193579706947885
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Traditional reinforcement learning lacks the ability to provide stability guarantees. More recent algorithms learn Lyapunov functions alongside the control policies to ensure stable learning. However, the current self-learned Lyapunov functions are sample inefficient due to their on-policy nature. This paper introduces a method for learning Lyapunov functions off-policy and incorporates the proposed off-policy Lyapunov function into the Soft Actor Critic and Proximal Policy Optimization algorithms to provide them with a data efficient stability certificate. Simulations of an inverted pendulum and a quadrotor illustrate the improved performance of the two algorithms when endowed with the proposed off-policy Lyapunov function.
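To ground the abstract's central idea, the following is a minimal PyTorch sketch of an off-policy Lyapunov certificate: a candidate function trained on transition pairs sampled from a replay buffer rather than from fresh on-policy rollouts, so it can ride along with SAC's replay machinery. The network, the decrease rate alpha, and the hinge-style loss are illustrative assumptions, not the authors' implementation.

```python
# Sketch: learn a candidate Lyapunov function from replay-buffer transitions
# (off-policy). Names and the exact decrease condition are illustrative.
import torch
import torch.nn as nn

class LyapunovNet(nn.Module):
    """Candidate Lyapunov function; the squared head enforces L(s) >= 0."""
    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.body(s).pow(2)

def lyapunov_loss(L: LyapunovNet, s, s_next, alpha: float = 0.1):
    """Hinge penalty on violations of L(s') <= (1 - alpha) * L(s) over
    transition pairs sampled from a replay buffer."""
    violation = L(s_next) - (1.0 - alpha) * L(s)
    return torch.relu(violation).mean()

# usage: one update step on a batch of (s, s') pairs from the replay buffer
L = LyapunovNet(state_dim=3)
opt = torch.optim.Adam(L.parameters(), lr=3e-4)
s, s_next = torch.randn(256, 3), torch.randn(256, 3)
loss = lyapunov_loss(L, s, s_next)
opt.zero_grad(); loss.backward(); opt.step()
```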
Related papers
- Relative Entropy Pathwise Policy Optimization [66.03329137921949]
We present an on-policy algorithm that trains Q-value models purely from on-policy trajectories. We show how to combine policies for exploration with constrained updates for stable training, and evaluate important architectural components that stabilize value function learning.
arXiv Detail & Related papers (2025-07-15T06:24:07Z)
- Certifying Stability of Reinforcement Learning Policies using Generalized Lyapunov Functions [15.306107403623075]
We study the problem of certifying the stability of closed-loop systems under control policies derived from optimal control or reinforcement learning (RL). Classical Lyapunov methods require a strict step-wise decrease in the Lyapunov function, but such a certificate is difficult to construct for a learned control policy. We formulate an approach to learn generalized Lyapunov functions by augmenting RL value functions with neural network residual terms; a sketch of this construction appears below.
arXiv Detail & Related papers (2025-05-16T07:36:40Z)
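The fragment below is one plausible reading of that "value function plus neural residual" construction: a pretrained critic is kept fixed and a small trainable residual is added on top. The class name, architecture, freezing choice, and sign conventions (value versus cost-to-go) are all assumptions for illustration, not the paper's formulation.

```python
# Hypothetical "generalized Lyapunov" candidate: a frozen RL value function
# augmented with a trainable neural residual term.
import torch
import torch.nn as nn

class GeneralizedLyapunov(nn.Module):
    def __init__(self, value_fn: nn.Module, state_dim: int, hidden: int = 64):
        super().__init__()
        self.value_fn = value_fn  # pretrained RL value function, treated as fixed
        self.residual = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():      # only the residual receives gradients
            base = self.value_fn(s)
        return base + self.residual(s)

# usage: wrap any critic network and train only the residual toward a
# (generalized, multi-step) decrease condition
critic = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 1))
V = GeneralizedLyapunov(critic, state_dim=4)
print(V(torch.randn(8, 4)).shape)  # torch.Size([8, 1])
```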
- Distillation Policy Optimization [5.439020425819001]
We introduce an actor-critic learning framework that harmonizes two data sources for both evaluation and control.
This framework incorporates variance reduction mechanisms, including a unified advantage estimator (UAE) and a residual baseline.
Our results showcase substantial enhancements in sample efficiency for on-policy algorithms, effectively bridging the gap to off-policy approaches.
arXiv Detail & Related papers (2023-02-01T15:59:57Z)
- KCRL: Krasovskii-Constrained Reinforcement Learning with Guaranteed Stability in Nonlinear Dynamical Systems [66.9461097311667]
We propose a model-based reinforcement learning framework with formal stability guarantees.
The proposed method learns the system dynamics up to a confidence interval using a feature representation.
We show that KCRL is guaranteed to learn a stabilizing policy in a finite number of interactions with the underlying unknown system.
arXiv Detail & Related papers (2022-06-03T17:27:04Z)
- Bellman Residual Orthogonalization for Offline Reinforcement Learning [53.17258888552998]
We introduce a new reinforcement learning principle that approximates the Bellman equations by enforcing their validity only along a test function space.
We exploit this principle to derive confidence intervals for off-policy evaluation, as well as to optimize over policies within a prescribed policy class; a toy version of the test-function condition is sketched below.
arXiv Detail & Related papers (2022-03-24T01:04:17Z)
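As a toy rendering of that test-function idea, the snippet below drives the expected product of the TD residual with each member of a small, hand-picked test class toward zero, rather than minimizing the squared Bellman error everywhere. The value network, the three test functions, and the discount are invented for the example and are not the paper's estimator.

```python
# Enforce Bellman-equation validity only against a finite set of test
# functions g_k: push E[(r + gamma * V(s') - V(s)) * g_k(s)] toward zero.
import torch
import torch.nn as nn

gamma = 0.99
V = nn.Sequential(nn.Linear(3, 64), nn.Tanh(), nn.Linear(64, 1))
test_fns = [
    lambda s: torch.ones_like(s[:, :1]),      # g_1: constant function
    lambda s: s[:, :1],                       # g_2: first state coordinate
    lambda s: (s * s).sum(1, keepdim=True),   # g_3: squared norm
]

def orthogonality_loss(s, r, s_next):
    residual = r + gamma * V(s_next).detach() - V(s)  # TD residual
    # squared projection of the residual onto each test function
    return sum(((residual * g(s)).mean()) ** 2 for g in test_fns)

# one gradient step on a random batch (placeholder data)
s, r, s_next = torch.randn(256, 3), torch.randn(256, 1), torch.randn(256, 3)
opt = torch.optim.Adam(V.parameters(), lr=3e-4)
loss = orthogonality_loss(s, r, s_next)
opt.zero_grad(); loss.backward(); opt.step()
```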
- A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient reinforcement learning algorithms whose complexity is independent of the number of states in the decision process.
We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z)
- On Imitation Learning of Linear Control Policies: Enforcing Stability and Robustness Constraints via LMI Conditions [3.296303220677533]
We formulate the imitation learning of linear policies as a constrained optimization problem.
We show that closed-loop stability and robustness can be guaranteed by imposing linear matrix inequality (LMI) constraints on the fitted policy; a toy LMI feasibility check is sketched below.
arXiv Detail & Related papers (2021-03-24T02:43:03Z)
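To make the LMI ingredient concrete, here is a minimal cvxpy feasibility problem that certifies a fixed closed-loop matrix A via the classical Lyapunov inequality A^T P + P A < 0. It checks a given policy rather than fitting one under constraints as the paper does, and the matrix and margins are made up.

```python
# Toy Lyapunov-LMI feasibility check: find P > 0 with A^T P + P A < 0,
# certifying stability of a fixed closed-loop matrix A.
import cvxpy as cp
import numpy as np

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])        # example stable closed-loop dynamics
n = A.shape[0]
P = cp.Variable((n, n), symmetric=True)
eps = 1e-6                          # strictness margin for the inequalities
constraints = [P >> eps * np.eye(n),
               A.T @ P + P @ A << -eps * np.eye(n)]
prob = cp.Problem(cp.Minimize(0), constraints)
prob.solve()
print(prob.status)                  # "optimal" means a certificate P exists
```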
- Closing the Closed-Loop Distribution Shift in Safe Imitation Learning [80.05727171757454]
We treat safe optimization-based control strategies as experts in an imitation learning problem.
We train a learned policy that can be cheaply evaluated at run-time and that provably satisfies the same safety guarantees as the expert.
arXiv Detail & Related papers (2021-02-18T05:11:41Z)
- Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z)
- Variance-Reduced Off-Policy Memory-Efficient Policy Search [61.23789485979057]
Off-policy policy optimization is a challenging problem in reinforcement learning.
The proposed algorithms are memory-efficient and capable of learning from off-policy samples.
arXiv Detail & Related papers (2020-09-14T16:22:46Z)
- Stable and Efficient Policy Evaluation [31.04376768927044]
This paper introduces novel algorithms that are both off-policy stable and on-policy efficient by using the oblique projection method.
The empirical results on various domains validate the effectiveness of the proposed approach.
arXiv Detail & Related papers (2020-06-06T21:14:06Z)
- Neural Lyapunov Redesign [36.2939747271983]
Learned controllers must guarantee some notion of safety to ensure that they harm neither the agent nor the environment.
Lyapunov functions are effective tools to assess stability in nonlinear dynamical systems.
We propose a two-player collaborative algorithm that alternates between estimating a Lyapunov function and deriving a controller that gradually enlarges the stability region; a toy version of this alternation is sketched below.
arXiv Detail & Related papers (2020-06-06T19:22:20Z)
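Finally, a deliberately small sketch of what such an alternation can look like: one step refits the Lyapunov candidate for the current controller, the next improves the controller against the current candidate. The dynamics, losses, and the omission of the paper's region-enlargement step are all simplifying assumptions.

```python
# Toy alternation: refit a Lyapunov candidate, then improve the controller
# against it. The region-enlargement step of the actual algorithm is omitted;
# the dynamics and loss weights are invented for the example.
import torch
import torch.nn as nn

def dynamics(x, u):
    # damped double integrator, x = (position, velocity), Euler step 0.05
    pos, vel = x[:, :1], x[:, 1:]
    return torch.cat([pos + 0.05 * vel, vel + 0.05 * (u - 0.1 * vel)], dim=1)

lyap = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))
ctrl = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))
opt_L = torch.optim.Adam(lyap.parameters(), lr=1e-3)
opt_u = torch.optim.Adam(ctrl.parameters(), lr=1e-3)

def loss_fn(x):
    V = lyap(x).pow(2)                         # squared head keeps V >= 0
    V_next = lyap(dynamics(x, ctrl(x))).pow(2)
    decrease = torch.relu(V_next - 0.99 * V).mean()        # V must shrink
    margin = 0.01 * (x * x).sum(1, keepdim=True) - V       # rule out V == 0
    return decrease + torch.relu(margin).mean()

for step in range(200):
    x = 2.0 * torch.rand(128, 2) - 1.0         # sample states from a box
    opt_L.zero_grad(); loss_fn(x).backward(); opt_L.step()   # player 1
    opt_u.zero_grad(); loss_fn(x).backward(); opt_u.step()   # player 2
```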
This list is automatically generated from the titles and abstracts of the papers on this site.