Reinforcement Learning with Fast Stabilization in Linear Dynamical
Systems
- URL: http://arxiv.org/abs/2007.12291v2
- Date: Fri, 3 Jun 2022 17:50:49 GMT
- Title: Reinforcement Learning with Fast Stabilization in Linear Dynamical
Systems
- Authors: Sahin Lale, Kamyar Azizzadenesheli, Babak Hassibi, Anima Anandkumar
- Abstract summary: We study model-based reinforcement learning (RL) in unknown stabilizable linear dynamical systems.
We propose an algorithm that certifies fast stabilization of the underlying system by effectively exploring the environment.
We show that the proposed algorithm attains $\tilde{\mathcal{O}}(\sqrt{T})$ regret after $T$ time steps of agent-environment interaction.
- Score: 91.43582419264763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we study model-based reinforcement learning (RL) in unknown
stabilizable linear dynamical systems. When learning a dynamical system, one
needs to stabilize the unknown dynamics in order to avoid system blow-ups. We
propose an algorithm that certifies fast stabilization of the underlying system
by effectively exploring the environment with an improved exploration strategy.
We show that the proposed algorithm attains $\tilde{\mathcal{O}}(\sqrt{T})$
regret after $T$ time steps of agent-environment interaction. We also show that
the regret of the proposed algorithm has only polynomial dependence on the
problem dimensions, an exponential improvement over prior methods. Our
improved exploration method is simple yet efficient: it
combines a sophisticated exploration policy in RL with an isotropic exploration
strategy to achieve fast stabilization and improved regret. We empirically
demonstrate that the proposed algorithm outperforms other popular methods in
several adaptive control tasks.
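As a rough illustration of the exploration strategy the abstract describes (an early isotropic-exploration phase layered on top of a model-based controller that is re-estimated from data), here is a minimal NumPy sketch. It substitutes a certainty-equivalent LQR controller for the paper's optimistic controller, and every constant (horizon, noise scales, update schedule) is illustrative rather than taken from the paper:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(0)
n, m = 2, 1                        # state/input dimensions (illustrative)
T, T_warm, sigma_nu = 2000, 50, 1.0

# True (unknown) system: x_{t+1} = A x_t + B u_t + w_t, open-loop unstable
A = np.array([[1.1, 0.5], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(n), np.eye(m)

def lqr_gain(Ah, Bh):
    # Certainty-equivalent LQR gain; the paper deploys an optimistic
    # (OFU-based) controller here instead.
    P = solve_discrete_are(Ah, Bh, Q, R)
    return -np.linalg.solve(R + Bh.T @ P @ Bh, Bh.T @ P @ Ah)

x, K = np.zeros(n), np.zeros((m, n))
Z, Y = [], []                      # regressors [x; u] and next states
for t in range(T):
    if t >= T_warm and t % 50 == 0:
        # Regularized least-squares estimate of [A | B] from collected data
        Zm, Ym = np.array(Z), np.array(Y)
        Theta = np.linalg.solve(Zm.T @ Zm + 1e-3 * np.eye(n + m), Zm.T @ Ym).T
        K = lqr_gain(Theta[:, :n], Theta[:, n:])
    u = K @ x
    if t < T_warm:
        # Improved exploration: isotropic Gaussian input noise in the early
        # phase, which speeds up identification and hence stabilization
        u = u + sigma_nu * rng.standard_normal(m)
    x_next = A @ x + B @ u + 0.1 * rng.standard_normal(n)
    Z.append(np.concatenate([x, u])); Y.append(x_next)
    x = x_next

print("closed-loop spectral radius:",
      max(abs(np.linalg.eigvals(A + B @ K))))   # < 1 once stabilized
```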
Related papers
- Stochastic Reinforcement Learning with Stability Guarantees for Control of Unknown Nonlinear Systems [6.571209126567701]
We propose a reinforcement learning algorithm that stabilizes the system by learning a local linear representation of the dynamics.
We demonstrate the effectiveness of our algorithm on several challenging high-dimensional dynamical systems.
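A minimal sketch of the mechanism this summary describes: fit a local linear model of unknown nonlinear dynamics by least squares near the operating point, then stabilize the learned linearization with LQR. The pendulum dynamics and all constants below are hypothetical stand-ins, not the paper's setup:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(1)
dt = 0.05

def pendulum_step(x, u):
    # Nonlinear inverted pendulum: an illustrative stand-in for the unknown system
    th, om = x
    return np.array([th + dt * om, om + dt * (9.8 * np.sin(th) + u)])

# Probe near the unstable equilibrium with small random states and inputs
X, U, Xn = [], [], []
for _ in range(500):
    x = 0.05 * rng.standard_normal(2)
    u = 0.5 * rng.standard_normal()
    X.append(x); U.append([u]); Xn.append(pendulum_step(x, u))

# Least-squares fit of a local linear model x' ~ A x + B u
Z = np.hstack([X, U])
Theta, *_ = np.linalg.lstsq(Z, np.array(Xn), rcond=None)
Ah, Bh = Theta.T[:, :2], Theta.T[:, 2:]

# Stabilize the learned linearization with LQR
P = solve_discrete_are(Ah, Bh, np.eye(2), np.eye(1))
K = -np.linalg.solve(np.eye(1) + Bh.T @ P @ Bh, Bh.T @ P @ Ah)

x = np.array([0.3, 0.0])
for _ in range(200):
    x = pendulum_step(x, (K @ x).item())
print("final |x|:", np.linalg.norm(x))   # should shrink toward 0
```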
arXiv Detail & Related papers (2024-09-12T20:07:54Z)
- Guaranteed Conservation of Momentum for Learning Particle-based Fluid Dynamics [96.9177297872723]
We present a novel method for guaranteeing linear momentum in learned physics simulations.
We enforce conservation of momentum with a hard constraint, which we realize via antisymmetrical continuous convolutional layers.
In combination, the proposed method allows us to increase the physical accuracy of the learned simulator substantially.
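The conservation guarantee is easy to see in code: when the pairwise interaction kernel is antisymmetric, the (i, j) and (j, i) contributions cancel exactly, so the summed momentum change is zero by construction. Below is a toy NumPy sketch in which a random feature map stands in for the learned continuous convolution; this is not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(2)
w, b = rng.standard_normal((2, 2)), rng.standard_normal(2)

def kernel(r):
    # Antisymmetric by construction: k(-r) = -k(r) for ANY feature map phi;
    # here a random single-layer map stands in for the learned kernel
    phi = lambda s: np.tanh(s @ w + b)
    return phi(r) - phi(-r)

pos = rng.standard_normal((16, 2))           # particle positions (2D toy)
dv = np.zeros_like(pos)                      # velocity updates
for i in range(len(pos)):
    for j in range(len(pos)):
        if i != j:
            dv[i] += kernel(pos[i] - pos[j])

# The (i, j) and (j, i) terms cancel exactly: total momentum is conserved
print("total momentum change:", dv.sum(axis=0))   # ~[0, 0] up to float error
```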
arXiv Detail & Related papers (2022-10-12T09:12:59Z)
- Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing constraint violations in policy optimization tasks in safe reinforcement learning.
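To give a flavor of the log-barrier idea (not the paper's algorithm or its step-size rule), the toy sketch below descends on the surrogate f(x) - eta * log(-g(x)) with a conservative step size that keeps iterates strictly feasible; the problem and all constants are invented:

```python
import numpy as np

# Toy problem: minimize f(x) = ||x - target||^2 subject to g(x) = ||x||^2 - 1 <= 0
target = np.array([2.0, 0.0])
f_grad = lambda x: 2 * (x - target)
g = lambda x: x @ x - 1.0
g_grad = lambda x: 2 * x

x = np.array([0.0, 0.0])
eta = 0.1                                  # barrier weight (illustrative)
for t in range(500):
    # Gradient of the log-barrier surrogate f(x) - eta * log(-g(x))
    grad = f_grad(x) - eta * g_grad(x) / g(x)
    # Conservative step keeps iterates strictly feasible: a simple stand-in
    # for LBSGD's carefully chosen adaptive step size
    step = min(0.05, 0.25 * (-g(x)) / (np.linalg.norm(grad) + 1e-12))
    x = x - step * grad
print("x =", x, " g(x) =", g(x))           # settles near the boundary, g(x) < 0
```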
arXiv Detail & Related papers (2022-07-21T11:14:47Z)
- Learning to Control under Time-Varying Environment [18.48729114775298]
This paper investigates the problem of regret minimization in linear time-varying (LTV) dynamical systems.
We propose the first computationally tractable online algorithm with regret guarantees.
arXiv Detail & Related papers (2022-06-06T11:40:46Z)
- Stabilizing Dynamical Systems via Policy Gradient Methods [32.88312419270879]
We provide a model-free algorithm for stabilizing fully observed dynamical systems.
We prove that this method efficiently recovers a stabilizing controller for linear systems.
We empirically evaluate the effectiveness of our approach on common control benchmarks.
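A crude sketch of the high-level recipe, under invented hyperparameters: estimate the policy gradient of a discounted LQR cost from rollouts using two-point (zeroth-order) differences, and anneal the discount toward 1 as the gain improves. The paper's actual method and its guarantees are more refined than this:

```python
import numpy as np

rng = np.random.default_rng(3)
A = np.array([[1.2, 0.3], [0.0, 1.05]])    # unstable open loop (illustrative)
B = np.eye(2)

def cost(K, gamma, horizon=50, n_roll=10):
    # Monte-Carlo discounted LQR cost; discounting is realized by scaling the
    # dynamics by sqrt(gamma), the standard discounted-LQR equivalence
    c = 0.0
    for _ in range(n_roll):
        x = rng.standard_normal(2)
        for _ in range(horizon):
            u = K @ x
            c += x @ x + u @ u
            x = np.sqrt(gamma) * (A @ x + B @ u)
            if not np.isfinite(x).all() or np.linalg.norm(x) > 1e8:
                return np.inf
    return c / n_roll

K, gamma, r, lr = np.zeros((2, 2)), 0.5, 0.05, 2e-3
c = cost(K, gamma)
for step in range(3000):
    U = rng.standard_normal((2, 2))
    # Two-point zeroth-order estimate of the policy gradient
    grad = (cost(K + r * U, gamma) - cost(K - r * U, gamma)) / (2 * r) * U
    K_try = K - lr * grad
    c_try = cost(K_try, gamma)
    if c_try < c:                          # simple safeguard: keep improving steps
        K, c = K_try, c_try
    if step % 300 == 299 and c < 30:       # discount annealing: raise gamma once
        gamma = min(1.0, gamma + 0.1)      # the current gain does well at this gamma
        c = cost(K, gamma)

print("spectral radius of A+BK:",
      max(abs(np.linalg.eigvals(A + B @ K))))  # should drop below 1 as gamma -> 1
```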
arXiv Detail & Related papers (2021-10-13T00:58:57Z)
- Imitation Learning of Stabilizing Policies for Nonlinear Systems [1.52292571922932]
It is shown that the methods developed for linear systems and controllers can be readily extended to controllers using sum-of-squares techniques.
A projected gradient descent algorithm and an alternating direction method of multipliers (ADMM) algorithm are proposed for the stabilizing imitation learning problem.
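A toy sketch of projected gradient descent for the linear case, with B = I so feasibility can be restored by rescaling the closed-loop matrix; the paper's projection is instead formulated through convex (and, for polynomial controllers, sum-of-squares) constraints. The expert gain and constants are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(4)
A = np.array([[1.1, 0.2], [0.0, 0.9]])
K_expert = np.array([[-0.5, -0.3], [0.1, -0.4]])   # hypothetical stabilizing expert

# Expert demonstrations: state-action pairs u = K_expert x
X = rng.standard_normal((200, 2))
U = X @ K_expert.T

K = np.zeros((2, 2))
for _ in range(300):
    # Gradient step on the imitation (least-squares) loss ||X K^T - U||^2
    grad = 2 * (X @ K.T - U).T @ X / len(X)
    K -= 0.05 * grad
    # Feasibility restoration: shrink the closed loop back inside the stability
    # region, a crude stand-in for the paper's projection step
    M = A + K                                      # closed loop with B = I
    rho = max(abs(np.linalg.eigvals(M)))
    if rho > 0.98:
        K = 0.98 / rho * M - A

print("imitation err:", np.linalg.norm(X @ K.T - U) / np.linalg.norm(U))
print("spectral radius:", max(abs(np.linalg.eigvals(A + K))))   # stays < 1
```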
arXiv Detail & Related papers (2021-09-22T17:27:19Z)
- Reinforcement Learning Policies in Continuous-Time Linear Systems [0.0]
We present online policies that learn optimal actions fast by carefully randomizing the parameter estimates.
We prove sharp stability results for inexact system dynamics and tightly specify the infinitesimal regret caused by sub-optimal actions.
Our analysis sheds light on fundamental challenges in continuous-time reinforcement learning and suggests a useful cornerstone for similar problems.
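A rough sketch of the "randomize the parameter estimates" idea in an Euler-discretized simulation: fit the drift parameters by least squares, perturb the estimate with decaying noise before computing a certainty-equivalent controller, and repeat. The perturbation schedule and all constants are illustrative, not the paper's:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

rng = np.random.default_rng(5)
A = np.array([[0.2, 1.0], [0.0, -0.5]])    # true continuous-time drift (unknown)
B = np.array([[0.0], [1.0]])
dt, sigma_est = 0.01, 0.1

def ce_gain(Ah, Bh):
    # Certainty-equivalent continuous-time LQR gain for the (perturbed) estimate
    P = solve_continuous_are(Ah, Bh, np.eye(2), np.eye(1))
    return -Bh.T @ P

x = np.array([1.0, 0.0])
Z, dX = [], []
K = np.zeros((1, 2))
for t in range(5000):
    if t % 500 == 499:
        # Least squares on dx ~ (A x + B u) dt, then randomize the estimate
        Zm, dXm = np.array(Z), np.array(dX)
        Theta = np.linalg.lstsq(Zm, dXm, rcond=None)[0].T    # [Ah | Bh]
        Theta += sigma_est / np.sqrt(t) * rng.standard_normal(Theta.shape)
        K = ce_gain(Theta[:, :2], Theta[:, 2:])
    u = K @ x + 0.3 * rng.standard_normal(1)                 # exploratory input
    dx = (A @ x + B @ u) * dt + 0.05 * np.sqrt(dt) * rng.standard_normal(2)
    Z.append(np.concatenate([x, u]) * dt); dX.append(dx)
    x = x + dx
print("final |x|:", np.linalg.norm(x))     # stays bounded once K stabilizes
```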
arXiv Detail & Related papers (2021-09-16T00:08:50Z)
- Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement learning based ZO algorithm (ZO-RL) that learns a sampling policy for generating the perturbations in ZO optimization, instead of using random sampling.
Our results show that ZO-RL can effectively reduce the variance of the ZO gradient estimates by learning a sampling policy, and converges faster than existing ZO algorithms in different scenarios.
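In ZO-RL the sampling policy is learned with reinforcement learning; as a lightweight stand-in, the sketch below adapts a Gaussian sampling covariance toward recently informative directions, simply to show where a learned sampling distribution plugs into the two-point ZO gradient estimator. All constants are invented:

```python
import numpy as np

rng = np.random.default_rng(6)
f = lambda x: (x[0] - 1)**2 + 10 * (x[1] + 2)**2 + 0.1 * np.sum(x[2:]**2)

d, mu, lr = 10, 1e-3, 0.02
x = np.zeros(d)
C = np.eye(d)                      # sampling covariance: a crude stand-in for
                                   # the RL-learned sampling policy in ZO-RL
for t in range(2000):
    z = np.linalg.cholesky(C) @ rng.standard_normal(d)   # perturbation ~ N(0, C)
    g = (f(x + mu * z) - f(x - mu * z)) / (2 * mu) * z   # two-point ZO estimate
    x -= lr * g / d
    # Adapt the sampling distribution toward informative directions
    C = 0.99 * C + 0.01 * np.outer(g, g) / (g @ g + 1e-12)
    C = C / np.trace(C) * d        # keep the overall scale fixed
print("x[:2] ->", x[:2])           # approaches (1, -2)
```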
arXiv Detail & Related papers (2021-04-09T14:50:59Z)
- Chance-Constrained Trajectory Optimization for Safe Exploration and Learning of Nonlinear Systems [81.7983463275447]
Learning-based control algorithms require data collection with abundant supervision for training.
We present a new approach for optimal motion planning with safe exploration that integrates chance-constrained optimal control with dynamics learning and feedback control.
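One standard ingredient behind such planners is converting a chance constraint into a deterministic one by backing off with a quantile of the model's predictive uncertainty. A tiny sketch under a Gaussian assumption, with invented numbers (this is not the paper's planner):

```python
import numpy as np
from scipy.stats import norm

# Chance constraint Pr(position <= limit) >= 1 - delta, tightened to a
# deterministic back-off using the learned model's predictive std
delta, limit = 0.05, 1.0
kappa = norm.ppf(1 - delta)                # Gaussian quantile ~ 1.645

mean_traj = np.linspace(0.0, 1.2, 20)      # nominal planned positions
std_traj = 0.05 * np.sqrt(np.arange(20))   # model uncertainty grows with horizon

# Tightened (back-off) constraint applied to the nominal trajectory
feasible = mean_traj + kappa * std_traj <= limit
print("first violated step:",
      np.argmin(feasible) if not feasible.all() else None)
```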
arXiv Detail & Related papers (2020-05-09T05:57:43Z)
- Adaptive Control and Regret Minimization in Linear Quadratic Gaussian (LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
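LqgOpt targets partially observed LQG systems; the sketch below illustrates only the OFU selection step, in the simpler fully observed (LQR) setting: search a confidence ball around a placeholder point estimate for the model whose optimal cost is smallest, and deploy that model's controller. Center, radius, and search budget are illustrative:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(7)

def optimal_cost_and_gain(Ah, Bh):
    # Infinite-horizon LQR value (trace of the Riccati solution, unit-variance
    # noise) and the corresponding gain
    P = solve_discrete_are(Ah, Bh, np.eye(2), np.eye(1))
    K = -np.linalg.solve(np.eye(1) + Bh.T @ P @ Bh, Bh.T @ P @ Ah)
    return np.trace(P), K

# Point estimate and confidence radius (placeholder values, not from data)
A_hat = np.array([[0.95, 0.2], [0.0, 0.8]])
B_hat = np.array([[0.0], [1.0]])
radius = 0.05

# OFU: search the confidence ball for the model with the smallest optimal cost
best = (np.inf, None)
for _ in range(200):
    dA, dB = rng.standard_normal((2, 2)), rng.standard_normal((2, 1))
    scale = radius / np.sqrt(np.sum(dA**2) + np.sum(dB**2))
    try:
        c, K = optimal_cost_and_gain(A_hat + scale * dA, B_hat + scale * dB)
    except (np.linalg.LinAlgError, ValueError):
        continue                   # skip candidates where the Riccati solve fails
    if c < best[0]:
        best = (c, K)
print("optimistic cost:", best[0])
```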
arXiv Detail & Related papers (2020-03-12T19:56:38Z)