Policy Optimization for Markovian Jump Linear Quadratic Control:
Gradient-Based Methods and Global Convergence
- URL: http://arxiv.org/abs/2011.11852v1
- Date: Tue, 24 Nov 2020 02:39:38 GMT
- Title: Policy Optimization for Markovian Jump Linear Quadratic Control:
Gradient-Based Methods and Global Convergence
- Authors: Joao Paulo Jansch-Porto, Bin Hu, Geir Dullerud
- Abstract summary: We show that three types of policy optimization methods converge to the optimal state feedback controller for MJLS at a linear rate if initialized at a controller that is mean-square stabilizing.
- Score: 3.3343656101775365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, policy optimization for control purposes has received renewed
attention due to the increasing interest in reinforcement learning. In this
paper, we investigate the global convergence of gradient-based policy
optimization methods for quadratic optimal control of discrete-time Markovian
jump linear systems (MJLS). First, we study the optimization landscape of
direct policy optimization for MJLS, with static state feedback controllers and
quadratic performance costs. Despite the non-convexity of the resultant
problem, we are still able to identify several useful properties such as
coercivity, gradient dominance, and almost smoothness. Based on these
properties, we show global convergence of three types of policy optimization
methods: the gradient descent method; the Gauss-Newton method; and the natural
policy gradient method. We prove that all three methods converge to the optimal
state feedback controller for MJLS at a linear rate if initialized at a
controller which is mean-square stabilizing. Some numerical examples are
presented to support the theory. This work brings new insights for
understanding the performance of policy gradient methods on the Markovian jump
linear quadratic control problem.
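To make the setting concrete, below is a minimal, self-contained sketch of direct policy optimization on a toy MJLS: a mode-dependent static gain K[i] is updated by vanilla gradient descent on a Monte Carlo estimate of a finite-horizon quadratic cost. The system matrices, transition probabilities, horizon, step size, and the finite-difference gradient are illustrative assumptions, not the paper's setup; the paper works with exact gradients of the infinite-horizon cost and also analyzes Gauss-Newton and natural policy gradient updates.

```python
import numpy as np

# Toy 2-mode MJLS: x_{t+1} = A[w_t] x_t + B[w_t] u_t, with the mode w_t following a
# Markov chain with transition matrix P. All numbers below are illustrative.
A = [np.array([[1.0, 0.5], [0.0, 1.0]]), np.array([[0.9, 0.0], [0.3, 0.8]])]
B = [np.array([[0.0], [1.0]]), np.array([[1.0], [0.5]])]
P = np.array([[0.8, 0.2], [0.3, 0.7]])      # mode transition probabilities
Q, R = np.eye(2), np.eye(1)                 # quadratic state / input cost weights
T, n_rollouts, eta = 50, 100, 1e-3          # horizon, samples, step size (assumed)

def cost(K, seed=0):
    """Monte Carlo estimate of the finite-horizon cost under u_t = -K[w_t] x_t.
    A fixed seed (common random numbers) keeps finite differences consistent."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_rollouts):
        x, w = rng.normal(size=2), 0
        for _ in range(T):
            u = -K[w] @ x
            total += x @ Q @ x + u @ R @ u
            x = A[w] @ x + B[w] @ u
            w = rng.choice(2, p=P[w])
    return total / n_rollouts

def finite_diff_grad(K, eps=1e-4):
    """Finite-difference surrogate for the policy gradient (the paper uses exact
    gradient expressions; this is only for illustration)."""
    grads = []
    for i in range(len(K)):
        g = np.zeros_like(K[i])
        for idx in np.ndindex(K[i].shape):
            Kp = [k.copy() for k in K]; Kp[i][idx] += eps
            Km = [k.copy() for k in K]; Km[i][idx] -= eps
            g[idx] = (cost(Kp) - cost(Km)) / (2 * eps)
        grads.append(g)
    return grads

# Start from a gain assumed to be mean-square stabilizing, then run gradient descent.
K = [np.array([[0.5, 0.8]]), np.array([[0.6, 0.4]])]
for step in range(20):
    g = finite_diff_grad(K)
    K = [K[i] - eta * g[i] for i in range(len(K))]
    if step % 5 == 0:
        print(f"step {step:2d}  cost ~ {cost(K):.3f}")
```

Roughly speaking, the natural policy gradient variant studied in the paper rescales each mode's gradient by an inverse mode-dependent state-covariance term, and the Gauss-Newton variant applies an additional Hessian-like preconditioner; the exact mode-coupled operators are defined in the paper.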
Related papers
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, convergent (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z) - Optimization Landscape of Policy Gradient Methods for Discrete-time
Static Output Feedback [22.21598324895312]
This paper analyzes the optimization landscape inherent to policy gradient methods when applied to static output feedback control.
We derive novel findings regarding convergence to stationary points, at a nearly dimension-free rate, for three policy gradient methods.
We prove that the vanilla policy gradient method converges linearly to local minima when initialized near them.
arXiv Detail & Related papers (2023-10-29T14:25:57Z) - A Policy Gradient Framework for Stochastic Optimal Control Problems with
Global Convergence Guarantee [12.884132885360907]
We consider policy gradient methods for optimal control problem in continuous time.
We prove the global convergence of the gradient flow and establish a convergence rate under some regularity assumptions.
arXiv Detail & Related papers (2023-02-11T23:30:50Z) - Bag of Tricks for Natural Policy Gradient Reinforcement Learning [87.54231228860495]
We have implemented and compared strategies that impact performance in natural policy gradient reinforcement learning.
The proposed collection of strategies for performance optimization can improve results by 86% to 181% across the MuJoCo control benchmark.
arXiv Detail & Related papers (2022-01-22T17:44:19Z) - Global Convergence Using Policy Gradient Methods for Model-free
Markovian Jump Linear Quadratic Control [8.98732207994362]
We study the global convergence of gradient-based policy optimization methods for control of discrete-time and model-free Markovian jump linear systems.
We show global convergence of the policy using gradient descent and natural policy gradient methods.
arXiv Detail & Related papers (2021-11-30T09:26:26Z) - Softmax Policy Gradient Methods Can Take Exponential Time to Converge [60.98700344526674]
The softmax policy gradient (PG) method is arguably one of the de facto implementations of policy optimization in modern reinforcement learning.
We demonstrate that softmax PG methods can take exponential time -- in terms of $|\mathcal{S}|$ and $\frac{1}{1-\gamma}$ -- to converge.
arXiv Detail & Related papers (2021-02-22T18:56:26Z) - Improper Learning with Gradient-based Policy Optimization [62.50997487685586]
We consider an improper reinforcement learning setting where the learner is given M base controllers for an unknown Markov Decision Process.
We propose a gradient-based approach that operates over a class of improper mixtures of the controllers.
arXiv Detail & Related papers (2021-02-16T14:53:55Z) - Policy Gradient Methods for the Noisy Linear Quadratic Regulator over a
Finite Horizon [3.867363075280544]
We explore reinforcement learning methods for finding the optimal policy in the linear quadratic regulator (LQR) problem.
We produce a global linear convergence guarantee for the setting of a finite time horizon and stochastic state dynamics, under weak assumptions.
We show results for the case where we assume a model for the underlying dynamics and where we apply the method to the data directly.
arXiv Detail & Related papers (2020-11-20T09:51:49Z) - Global Convergence of Policy Gradient for Linear-Quadratic Mean-Field
Control/Game in Continuous Time [109.06623773924737]
We study the policy gradient method for the linear-quadratic mean-field control and game.
We show that it converges to the optimal solution at a linear rate, which is verified by a synthetic simulation.
arXiv Detail & Related papers (2020-08-16T06:34:11Z) - Convergence Guarantees of Policy Optimization Methods for Markovian Jump
Linear Systems [3.3343656101775365]
We show that the Gauss-Newton method converges to the optimal state feedback controller for MJLS at a linear rate if initialized at a controller which stabilizes the closed-loop dynamics in the mean-square sense.
We present an example to support our theory.
arXiv Detail & Related papers (2020-02-10T21:13:42Z) - On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization [80.03647903934723]
We prove convergence guarantees in expectation for adaptive gradient methods in nonconvex optimization.
Our analyses shed light on a better understanding of adaptive gradient methods for nonconvex optimization.
arXiv Detail & Related papers (2018-08-16T20:25:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.