A comment on stabilizing reinforcement learning
- URL: http://arxiv.org/abs/2111.12316v1
- Date: Wed, 24 Nov 2021 07:58:14 GMT
- Title: A comment on stabilizing reinforcement learning
- Authors: Pavel Osinenko, Georgiy Malaniya, Grigory Yaremenko, Ilya Osokin
- Abstract summary: We argue that Vamvoudakis et al. made a fallacious assumption on the Hamiltonian under a generic policy.
We show critic neural network weight convergence under a stochastic, continuous-time environment, provided certain conditions on the behavior policy hold.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This is a short comment on the paper "Asymptotically Stable Adaptive-Optimal
Control Algorithm With Saturating Actuators and Relaxed Persistence of
Excitation" by Vamvoudakis et al. The question of stability of reinforcement
learning (RL) agents remains hard and the said work suggested an on-policy
approach with a suitable stability property using a technique from adaptive
control - a robustifying term to be added to the action. However, there is an
issue with this approach to stabilizing RL, which we will explain in this note.
Furthermore, Vamvoudakis et al. seem to have made a fallacious assumption on
the Hamiltonian under a generic policy. To provide a positive result, we will
not only indicate this mistake, but show critic neural network weight
convergence under a stochastic, continuous-time environment, provided certain
conditions on the behavior policy hold.
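For orientation, the quantities the comment refers to can be sketched in a generic continuous-time optimal-control setting; the notation below (dynamics f, g, running cost r, critic V, robustifying term u_rob) is assumed for illustration and need not match Vamvoudakis et al.'s exact equations.

```latex
% Sketch only: a generic control-affine setting, not the authors' exact formulation.
% Dynamics \dot{x} = f(x) + g(x)u, running cost r(x,u), critic (value function) V.
\begin{align}
  H\bigl(x, u, \nabla V\bigr) &= \nabla V(x)^\top \bigl(f(x) + g(x)\,u\bigr) + r(x, u), \\
  u(x) &= u_{\mathrm{RL}}(x) + u_{\mathrm{rob}}(x),
\end{align}
% where u_RL is the learned (actor) policy and u_rob is the robustifying term,
% borrowed from adaptive control, that is added to the action to enforce stability.
```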
Related papers
- MAD: A Magnitude And Direction Policy Parametrization for Stability Constrained Reinforcement Learning [1.712670816823812]
We introduce magnitude and direction (MAD) policies, a policy parameterization for reinforcement learning (RL).
MAD policies introduce explicit feedback on state-dependent features without compromising closed-loop stability.
We show that MAD policies trained with deep deterministic policy gradient (DDPG) methods generalize to unseen scenarios.
arXiv Detail & Related papers (2025-04-03T13:26:26Z) - Distributionally Robust Policy and Lyapunov-Certificate Learning [13.38077406934971]
A key challenge in designing controllers with stability guarantees for uncertain systems is accurately determining, and adapting to, shifts in model parametric uncertainty during online deployment.
We tackle this with a novel distributionally robust formulation of the Lyapunov derivative chance constraint ensuring a monotonic decrease of the Lyapunov certificate.
We show that, for the resulting closed-loop system, the global stability of its equilibrium can be certified with high confidence, even with Out-of-Distribution uncertainties.
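The type of condition involved can be sketched generically as follows; the ambiguity set \mathcal{P}, decay rate \alpha, and risk level \varepsilon are assumed symbols, not the paper's exact formulation.

```latex
% Generic distributionally robust Lyapunov-decrease chance constraint; sketch only.
\[
  \inf_{\mathbb{P} \in \mathcal{P}} \;
  \mathbb{P}_{w \sim \mathbb{P}}\!
  \Bigl[ \nabla V(x)^\top f(x, u, w) \,\le\, -\alpha\, V(x) \Bigr]
  \;\ge\; 1 - \varepsilon,
\]
% i.e. the Lyapunov certificate V must decrease with high probability under every
% distribution of the uncertain parameter w in the ambiguity set \mathcal{P}.
```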
arXiv Detail & Related papers (2024-04-03T18:57:54Z) - Resilient Constrained Reinforcement Learning [87.4374430686956]
We study a class of constrained reinforcement learning (RL) problems in which suitable constraint specifications are not identified before training.
It is challenging to identify appropriate constraint specifications because of the undefined trade-off between the reward training objective and constraint satisfaction.
We propose a new constrained RL approach that searches for policy and constraint specifications together.
arXiv Detail & Related papers (2023-12-28T18:28:23Z) - Synthesizing Stable Reduced-Order Visuomotor Policies for Nonlinear
Systems via Sums-of-Squares Optimization [28.627377507894003]
We present a method for synthesizing reduced-order output-feedback policies for controlling nonlinear systems from perception-based observations.
We show that our approach can provide stability guarantees in settings where policies that control these systems from images can fail to reliably stabilize them.
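For background, the textbook sums-of-squares (SOS) Lyapunov certificate that such synthesis builds on looks roughly as follows; the paper's actual program, which also handles the reduced-order policy and the perception-based observations, is more involved.

```latex
% Textbook SOS Lyapunov conditions for \dot{x} = f(x) with polynomial f; sketch only.
\[
  V(x) - \epsilon_1 \|x\|^2 \in \Sigma[x],
  \qquad
  -\nabla V(x)^\top f(x) - \epsilon_2 \|x\|^2 \in \Sigma[x],
\]
% where \Sigma[x] is the cone of sum-of-squares polynomials and \epsilon_1, \epsilon_2 > 0;
% both membership conditions reduce to semidefinite programs and together certify
% global asymptotic stability of the origin.
```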
arXiv Detail & Related papers (2023-04-24T19:34:09Z) - Hallucinated Adversarial Control for Conservative Offline Policy
Evaluation [64.94009515033984]
We study the problem of conservative off-policy evaluation (COPE) where given an offline dataset of environment interactions, we seek to obtain a (tight) lower bound on a policy's performance.
We introduce HAMBO, which builds on an uncertainty-aware learned model of the transition dynamics.
We prove that the resulting COPE estimates are valid lower bounds, and, under regularity conditions, show their convergence to the true expected return.
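The flavour of such an estimate can be sketched generically; the confidence set \mathcal{M} and the notation below are assumptions for illustration, not HAMBO's exact construction.

```latex
% Generic pessimistic (conservative) model-based value estimate; sketch only.
\[
  \hat{J}^{-}(\pi) \;=\; \min_{\tilde{f} \in \mathcal{M}}\;
  \mathbb{E}_{\tilde{f},\,\pi}\!\left[ \sum_{t \ge 0} \gamma^{t}\, r(x_t, u_t) \right]
  \;\le\; J(\pi),
\]
% where \mathcal{M} is a set of transition models consistent with the offline data;
% minimising over it acts as an adversary choosing the worst plausible dynamics,
% so the estimate is a valid lower bound whenever the true dynamics lie in \mathcal{M}.
```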
arXiv Detail & Related papers (2023-03-02T08:57:35Z) - Bounded Robustness in Reinforcement Learning via Lexicographic
Objectives [54.00072722686121]
Policy robustness in Reinforcement Learning may not be desirable at any cost.
We study how policies can be maximally robust to arbitrary observational noise.
We propose a robustness-inducing scheme, applicable to any policy algorithm, that trades off expected policy utility for robustness.
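One generic way to read "trades off expected policy utility for robustness" in a lexicographic setting is sketched below; the tolerance \varepsilon and the ordering of the two objectives are assumptions, not taken from the paper.

```latex
% Generic lexicographic trade-off between utility U and robustness R; sketch only.
\[
  \max_{\pi} \; R(\pi)
  \quad \text{subject to} \quad
  U(\pi) \;\ge\; \max_{\pi'} U(\pi') - \varepsilon,
\]
% i.e. among policies whose expected utility U is within \varepsilon of optimal,
% pick the one that maximises the robustness objective R.
```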
arXiv Detail & Related papers (2022-09-30T08:53:18Z) - KCRL: Krasovskii-Constrained Reinforcement Learning with Guaranteed
Stability in Nonlinear Dynamical Systems [66.9461097311667]
We propose a model-based reinforcement learning framework with formal stability guarantees.
The proposed method learns the system dynamics up to a confidence interval using a feature representation.
We show that KCRL is guaranteed to learn a stabilizing policy in a finite number of interactions with the underlying unknown system.
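For background, Krasovskii's method (referenced in the title) certifies stability through the Jacobian of the dynamics; the textbook statement is sketched below and is not necessarily the exact constraint enforced in KCRL.

```latex
% Krasovskii's method for \dot{x} = f(x) with f(0) = 0; textbook form, sketch only.
\[
  \exists\, P \succ 0:\quad
  J_f(x)^\top P + P\, J_f(x) \;\preceq\; -\rho I \;\;\forall x,
  \qquad
  V(x) = f(x)^\top P\, f(x),
\]
% where J_f is the Jacobian of f and \rho > 0; the matrix inequality makes V a
% Lyapunov function, so enforcing such a condition during policy learning yields
% a stabilising closed-loop policy.
```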
arXiv Detail & Related papers (2022-06-03T17:27:04Z) - Training Generative Adversarial Networks by Solving Ordinary
Differential Equations [54.23691425062034]
We study the continuous-time dynamics induced by GAN training.
From this perspective, we hypothesise that instabilities in training GANs arise from the integration error.
We experimentally verify that well-known ODE solvers (such as Runge-Kutta) can stabilise training.
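To make the idea concrete, here is a small, self-contained sketch (not the paper's code) that integrates the training dynamics of the linearised Dirac-GAN toy problem with classic fourth-order Runge-Kutta instead of simultaneous gradient descent (which corresponds to explicit Euler); the toy vector field, step size, and iteration count are assumptions for illustration.

```python
import numpy as np

# Illustrative toy only (not the paper's setup): linearised Dirac-GAN dynamics,
# where the simultaneous-gradient vector field of the two players
# (generator parameter theta, discriminator parameter psi) is a pure rotation
# around the equilibrium (0, 0).
def gan_vector_field(z):
    theta, psi = z
    return np.array([-psi, theta])   # d(theta)/dt = -psi, d(psi)/dt = theta

def euler_step(z, h):
    # Simultaneous gradient descent/ascent == explicit Euler on the vector field.
    return z + h * gan_vector_field(z)

def rk4_step(z, h):
    # Classic fourth-order Runge-Kutta step on the same vector field.
    k1 = gan_vector_field(z)
    k2 = gan_vector_field(z + 0.5 * h * k1)
    k3 = gan_vector_field(z + 0.5 * h * k2)
    k4 = gan_vector_field(z + h * k3)
    return z + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

if __name__ == "__main__":
    h, steps = 0.1, 2000                      # step size and number of updates (assumed)
    z_euler = z_rk4 = np.array([1.0, 0.0])    # start away from the equilibrium
    for _ in range(steps):
        z_euler = euler_step(z_euler, h)
        z_rk4 = rk4_step(z_rk4, h)
    # Euler spirals away from the equilibrium (training "explodes"),
    # while RK4 keeps the iterates close to the closed orbit.
    print("Euler distance from equilibrium:", np.linalg.norm(z_euler))
    print("RK4   distance from equilibrium:", np.linalg.norm(z_rk4))
```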
arXiv Detail & Related papers (2020-10-28T15:23:49Z) - Ensuring Monotonic Policy Improvement in Entropy-regularized Value-based
Reinforcement Learning [14.325835899564664]
An entropy-regularized value-based reinforcement learning method can ensure the monotonic improvement of policies at each policy update.
We propose a novel reinforcement learning algorithm that exploits this lower-bound as a criterion for adjusting the degree of a policy update for alleviating policy oscillation.
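For reference, the standard entropy-regularised (soft) value-based quantities that such methods build on are sketched below; the temperature \tau is an assumed symbol, and the paper's specific policy-improvement lower bound is not reproduced here.

```latex
% Standard entropy-regularised (soft) value function and Boltzmann policy; background only.
\[
  V_{\tau}(s) = \tau \log \sum_{a} \exp\!\bigl(Q_{\tau}(s,a)/\tau\bigr),
  \qquad
  \pi_{\tau}(a \mid s) = \frac{\exp\!\bigl(Q_{\tau}(s,a)/\tau\bigr)}{\sum_{a'} \exp\!\bigl(Q_{\tau}(s,a')/\tau\bigr)},
\]
% where \tau > 0 is the entropy temperature; the paper's lower bound is used as a
% criterion for how strongly to update the policy at each step so that improvement
% stays monotonic and policy oscillation is avoided.
```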
arXiv Detail & Related papers (2020-08-25T04:09:18Z) - Fine-Grained Analysis of Stability and Generalization for Stochastic
Gradient Descent [55.85456985750134]
We introduce a new stability measure called on-average model stability, for which we develop novel bounds controlled by the risks of SGD iterates.
This yields generalization bounds depending on the behavior of the best model, and leads to the first known fast bounds in the low-noise setting.
To the best of our knowledge, this gives the first known stability and generalization bounds for SGD with even non-differentiable loss functions.
arXiv Detail & Related papers (2020-06-15T06:30:19Z)