Global Convergence Using Policy Gradient Methods for Model-free
Markovian Jump Linear Quadratic Control
- URL: http://arxiv.org/abs/2111.15228v1
- Date: Tue, 30 Nov 2021 09:26:26 GMT
- Title: Global Convergence Using Policy Gradient Methods for Model-free
Markovian Jump Linear Quadratic Control
- Authors: Santanu Rathod, Manoj Bhadu, Abir De
- Abstract summary: We study the global convergence of gradient-based policy optimization methods for control of discrete-time and model-free Markovian jump linear systems.
We show global convergence of the policy using gradient descent and natural policy gradient methods.
- Score: 8.98732207994362
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Owing to the growth of interest in Reinforcement Learning in the last few
years, gradient-based policy control methods have been gaining popularity for
control problems as well, and rightly so: policy gradient methods have the
advantage of optimizing a metric of interest in an end-to-end manner, along
with being relatively easy to implement without complete knowledge of the
underlying system. In this paper, we study the global convergence of
gradient-based policy optimization methods for quadratic control of
discrete-time, model-free Markovian jump linear systems (MJLS). We surmount
the myriad challenges that arise because the system has more than one state,
coupled with a lack of knowledge of the system dynamics, and we show global
convergence of the policy using gradient descent and natural policy gradient
methods. We also provide simulation studies to corroborate our claims.
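To make the model-free setting concrete, here is a minimal illustrative sketch (not the authors' implementation) of zeroth-order policy gradient descent for MJLS quadratic control: a mode-dependent linear feedback u_t = -K[w_t] x_t is perturbed randomly, the quadratic cost is estimated from simulated rollouts, and the resulting two-point gradient estimate drives plain gradient descent. All matrices, horizons, and step sizes below are assumed values chosen only for illustration.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-mode MJLS (all values assumed): x_{t+1} = A[w] x_t + B[w] u_t,
# with the mode w evolving as a Markov chain with transition matrix P.
A = [np.array([[1.0, 0.5], [0.0, 1.0]]), np.array([[0.9, 0.2], [0.1, 0.8]])]
B = [np.array([[0.0], [1.0]]), np.array([[0.5], [1.0]])]
Q, R = np.eye(2), np.eye(1)
P = np.array([[0.9, 0.1], [0.2, 0.8]])

def rollout_cost(K, horizon=50):
    """Estimate the quadratic cost of mode-dependent gains K from one rollout."""
    x, w, cost = rng.normal(size=2), 0, 0.0
    for _ in range(horizon):
        u = -K[w] @ x
        cost += x @ Q @ x + u @ R @ u
        x = A[w] @ x + B[w] @ u
        w = rng.choice(2, p=P[w])  # sample the next mode from the Markov chain
    return cost

def zeroth_order_grad(K, radius=0.05, samples=200):
    """Two-point smoothed gradient estimate over the stacked mode gains."""
    d, grad = K.size, np.zeros_like(K)
    for _ in range(samples):
        U = rng.normal(size=K.shape)
        U /= np.linalg.norm(U)  # random direction on the unit sphere
        delta = rollout_cost(K + radius * U) - rollout_cost(K - radius * U)
        grad += (d * delta / (2.0 * radius)) * U
    return grad / samples

K = np.zeros((2, 1, 2))  # one 1x2 feedback gain per mode
for _ in range(20):
    K -= 1e-4 * zeroth_order_grad(K)  # plain (vanilla) gradient descent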
Related papers
- Landscape of Policy Optimization for Finite Horizon MDPs with General State and Action [10.219627570276689]
We develop a framework for a class of Markov Decision Processes with general state and action spaces.
We show that gradient methods converge to the globally optimal policy with nonasymptotic guarantees.
Our result establishes the first complexity bound for multi-period inventory systems.
arXiv Detail & Related papers (2024-09-25T17:56:02Z)
- Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
arXiv Detail & Related papers (2024-07-15T14:54:57Z)
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Optimization Landscape of Policy Gradient Methods for Discrete-time Static Output Feedback [22.21598324895312]
This paper analyzes the optimization landscape inherent to policy gradient methods when applied to static output feedback control.
We derive novel findings regarding convergence (and nearly dimension-free rate) to stationary points for three policy gradient methods.
We prove that the vanilla policy gradient method converges linearly to a local minimum when initialized near one.
arXiv Detail & Related papers (2023-10-29T14:25:57Z)
- Global Convergence of Policy Gradient Methods in Reinforcement Learning, Games and Control [38.10940311690513]
Policy gradient methods are increasingly popular for sequential decision making in reinforcement learning, games, and control.
Guaranteeing the global optimality of policy gradient methods is highly nontrivial due to nonconcavity of the value functions.
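A common route to such guarantees (a generic sketch under standard assumptions, not a claim about this particular paper's proof) is a gradient-domination, i.e. Polyak-Lojasiewicz-type, inequality on the cost C over the feasible policies: despite nonconvexity, a small gradient forces a small optimality gap,

C(K) - C(K^\star) \le \lambda \, \| \nabla C(K) \|_F^2 \quad \text{for some } \lambda > 0,

so gradient descent with a suitable step size converges to the globally optimal cost C(K^\star).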
arXiv Detail & Related papers (2023-10-08T16:54:25Z)
- Enforcing the consensus between Trajectory Optimization and Policy Learning for precise robot control [75.28441662678394]
Reinforcement learning (RL) and trajectory optimization (TO) present strong complementary advantages.
We propose several improvements on top of these approaches to learn global control policies more quickly.
arXiv Detail & Related papers (2022-09-19T13:32:09Z)
- Improper Learning with Gradient-based Policy Optimization [62.50997487685586]
We consider an improper reinforcement learning setting where the learner is given M base controllers for an unknown Markov Decision Process.
We propose a gradient-based approach that operates over a class of improper mixtures of the controllers.
arXiv Detail & Related papers (2021-02-16T14:53:55Z)
- Policy Optimization for Markovian Jump Linear Quadratic Control: Gradient-Based Methods and Global Convergence [3.3343656101775365]
We show that three types of policy optimization methods converge to the optimal state feedback controller for MJLS at a linear rate when initialized at a controller that stabilizes the closed-loop dynamics in the mean-square sense.
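For context, the three update rules typically analyzed in this line of work can be sketched in the standard LQR-style notation (an assumption carried over from the related LQR literature, not that paper's exact statement): with C(K) the quadratic cost, \Sigma_K the closed-loop state correlation matrix, P_K the closed-loop value matrix, and \eta a step size, and each rule applied mode-wise for MJLS,

\begin{align*}
\text{gradient descent:} \quad & K' = K - \eta \, \nabla C(K), \\
\text{natural policy gradient:} \quad & K' = K - \eta \, \nabla C(K) \, \Sigma_K^{-1}, \\
\text{Gauss--Newton:} \quad & K' = K - \eta \, (R + B^\top P_K B)^{-1} \, \nabla C(K) \, \Sigma_K^{-1}.
\end{align*}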
arXiv Detail & Related papers (2020-11-24T02:39:38Z)
- Policy Gradient Methods for the Noisy Linear Quadratic Regulator over a Finite Horizon [3.867363075280544]
We explore reinforcement learning methods for finding the optimal policy in the linear quadratic regulator (LQR) problem.
We produce a global linear convergence guarantee for the setting of finite time horizon and state dynamics under weak assumptions.
We show results both for the case where a model of the underlying dynamics is assumed and for the case where the method is applied to data directly.
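As background, the exact policy gradient of the discrete-time LQR cost under a linear feedback u_t = -K x_t has a well-known closed form (stated here as standard background from the LQR policy-gradient literature, not as this paper's own result):

\nabla C(K) = 2 \big( (R + B^\top P_K B) K - B^\top P_K A \big) \, \Sigma_K,

where P_K solves the Lyapunov equation P_K = Q + K^\top R K + (A - BK)^\top P_K (A - BK) and \Sigma_K = \mathbb{E} \sum_t x_t x_t^\top is the state correlation matrix; model-free methods replace this expression with rollout-based estimates.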
arXiv Detail & Related papers (2020-11-20T09:51:49Z)
- Global Convergence of Policy Gradient for Linear-Quadratic Mean-Field Control/Game in Continuous Time [109.06623773924737]
We study the policy gradient method for linear-quadratic mean-field control and games.
We show that it converges to the optimal solution at a linear rate, which is verified by a synthetic simulation.
arXiv Detail & Related papers (2020-08-16T06:34:11Z)
- When Will Generative Adversarial Imitation Learning Algorithms Attain Global Convergence [56.40794592158596]
We study generative adversarial imitation learning (GAIL) under general MDPs and for nonlinear reward function classes.
This is the first systematic theoretical study of the global convergence of GAIL.
arXiv Detail & Related papers (2020-06-24T06:24:37Z)