Global Convergence of Policy Gradient Methods in Reinforcement Learning,
Games and Control
- URL: http://arxiv.org/abs/2310.05230v1
- Date: Sun, 8 Oct 2023 16:54:25 GMT
- Title: Global Convergence of Policy Gradient Methods in Reinforcement Learning,
Games and Control
- Authors: Shicong Cen, Yuejie Chi
- Abstract summary: Policy gradient methods are increasingly popular for sequential decision making in reinforcement learning, games, and control.
Guaranteeing the global optimality of policy gradient methods is highly nontrivial due to nonconcavity of the value functions.
- Score: 38.10940311690513
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Policy gradient methods, where one searches for the policy of interest by
maximizing the value functions using first-order information, have become
increasingly popular for sequential decision making in reinforcement learning,
games, and control. Guaranteeing the global optimality of policy gradient
methods, however, is highly nontrivial due to nonconcavity of the value
functions. In this exposition, we highlight recent progress in understanding
and developing policy gradient methods with global convergence guarantees,
putting an emphasis on their finite-time convergence rates with regard to
salient problem parameters.
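To make the setup concrete: in the tabular case, a policy gradient method ascends the value function directly in the policy parameters using first-order information. The following minimal sketch (not from the paper; the toy MDP, step size, and iteration count are illustrative assumptions) runs exact policy gradient ascent under the softmax parameterization.

    import numpy as np

    # Toy MDP (illustrative): S states, A actions, discount factor gamma.
    rng = np.random.default_rng(0)
    S, A, gamma = 4, 3, 0.9
    P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, s'] transition probabilities
    r = rng.uniform(size=(S, A))                 # r[s, a] rewards in [0, 1]
    rho = np.ones(S) / S                         # initial state distribution

    def softmax_policy(theta):
        z = np.exp(theta - theta.max(axis=1, keepdims=True))
        return z / z.sum(axis=1, keepdims=True)  # pi[s, a]

    def evaluate(pi):
        # Exact policy evaluation: value V^pi, advantage A^pi, discounted visitation d^pi.
        P_pi = np.einsum('sab,sa->sb', P, pi)
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, (pi * r).sum(axis=1))
        Q = r + gamma * P @ V
        d = (1 - gamma) * np.linalg.solve(np.eye(S) - gamma * P_pi.T, rho)
        return V, Q - V[:, None], d

    theta = np.zeros((S, A))
    eta = 0.5                                    # constant step size (illustrative)
    for t in range(2000):
        pi = softmax_policy(theta)
        V, adv, d = evaluate(pi)
        # Exact softmax policy gradient: dV(rho)/dtheta[s,a] = d(s) pi(a|s) A(s,a) / (1-gamma)
        theta += eta * d[:, None] * pi * adv / (1 - gamma)
    print("V^pi(rho) after training:", rho @ V)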
Related papers
- Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
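C-PG itself is not detailed in this summary; purely as an illustration of the constrained setting, the sketch below runs a generic primal-dual (Lagrangian) policy gradient on a toy constrained MDP, alternating a policy ascent step on the Lagrangian with a dual ascent step on the constraint violation. The toy MDP, cost budget b, and step sizes are illustrative assumptions, and this is not the paper's algorithm.

    import numpy as np

    # Toy constrained MDP (illustrative): maximize the reward value subject to a
    # cost-value constraint V_c(pi) <= b.  Generic primal-dual sketch, NOT C-PG.
    rng = np.random.default_rng(1)
    S, A, gamma, b = 4, 3, 0.9, 4.0
    P = rng.dirichlet(np.ones(S), size=(S, A))
    r = rng.uniform(size=(S, A))                 # reward
    c = rng.uniform(size=(S, A))                 # constraint cost
    rho = np.ones(S) / S

    def softmax(theta):
        z = np.exp(theta - theta.max(axis=1, keepdims=True))
        return z / z.sum(axis=1, keepdims=True)

    def value_and_grad(theta, reward):
        # Exact softmax policy gradient of the discounted value of `reward`.
        pi = softmax(theta)
        P_pi = np.einsum('sab,sa->sb', P, pi)
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, (pi * reward).sum(axis=1))
        Q = reward + gamma * P @ V
        d = (1 - gamma) * np.linalg.solve(np.eye(S) - gamma * P_pi.T, rho)
        return rho @ V, d[:, None] * pi * (Q - V[:, None]) / (1 - gamma)

    theta, lam = np.zeros((S, A)), 0.0
    eta_theta, eta_lam = 0.5, 0.05
    for t in range(3000):
        V_r, g_r = value_and_grad(theta, r)
        V_c, g_c = value_and_grad(theta, c)
        theta += eta_theta * (g_r - lam * g_c)    # primal ascent on the Lagrangian
        lam = max(0.0, lam + eta_lam * (V_c - b)) # dual ascent on the violation
    print(f"reward value {V_r:.3f}, cost value {V_c:.3f}, budget {b}")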
arXiv Detail & Related papers (2024-07-15T14:54:57Z)
- Elementary Analysis of Policy Gradient Methods [3.468656086349638]
This paper focuses on the discounted MDP setting and conducts a systematic study of several standard policy optimization methods.
Several novel results are presented, including:
1) global linear convergence of projected policy gradient for any constant step size;
2) sublinear convergence of softmax policy gradient for any constant step size;
3) global linear convergence of softmax natural policy gradient for any constant step size;
4) global linear convergence of entropy-regularized softmax policy gradient for a wider range of constant step sizes than in existing results;
5) a tight local linear convergence rate of entropy-regularized natural policy gradient;
6) a new and concise local quadratic convergence rate of soft ...
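As an illustration of result 1), the sketch below runs projected policy gradient under the direct parameterization: take a gradient step on the value and project each state's action distribution back onto the probability simplex. The toy MDP and constant step size are illustrative assumptions, not taken from the paper.

    import numpy as np

    # Toy MDP (illustrative) for projected policy gradient with direct parameterization.
    rng = np.random.default_rng(2)
    S, A, gamma = 4, 3, 0.9
    P = rng.dirichlet(np.ones(S), size=(S, A))
    r = rng.uniform(size=(S, A))
    rho = np.ones(S) / S

    def project_simplex(v):
        # Euclidean projection of each row of v onto the probability simplex.
        u = np.sort(v, axis=1)[:, ::-1]
        css = np.cumsum(u, axis=1) - 1.0
        k = np.arange(1, v.shape[1] + 1)
        support = (u - css / k > 0).sum(axis=1) - 1
        tau = css[np.arange(v.shape[0]), support] / (support + 1)
        return np.maximum(v - tau[:, None], 0.0)

    pi = np.ones((S, A)) / A
    eta = 0.1                                     # constant step size (illustrative)
    for t in range(500):
        P_pi = np.einsum('sab,sa->sb', P, pi)
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, (pi * r).sum(axis=1))
        Q = r + gamma * P @ V
        d = (1 - gamma) * np.linalg.solve(np.eye(S) - gamma * P_pi.T, rho)
        # dV(rho)/dpi(a|s) = d(s) * Q(s,a) / (1 - gamma); step, then project.
        pi = project_simplex(pi + eta * d[:, None] * Q / (1 - gamma))
    print("V(rho):", rho @ V)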
arXiv Detail & Related papers (2024-04-04T11:16:16Z)
- A Policy Gradient Framework for Stochastic Optimal Control Problems with Global Convergence Guarantee [12.884132885360907]
We consider policy gradient methods for optimal control problems in continuous time.
We prove the global convergence of the gradient flow and establish a convergence rate under some regularity assumptions.
arXiv Detail & Related papers (2023-02-11T23:30:50Z)
- Policy Gradient in Robust MDPs with Global Convergence Guarantee [13.40471012593073]
Robust Markov decision processes (RMDPs) provide a promising framework for computing reliable policies in the face of model errors.
This paper proposes a new Double-Loop Robust Policy Gradient (DRPG), the first generic policy gradient method for RMDPs.
In contrast with prior robust policy gradient algorithms, DRPG monotonically reduces approximation errors to guarantee convergence to a globally optimal policy.
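DRPG's construction is not given in this summary; the sketch below only illustrates the generic double-loop pattern on a toy robust MDP with a finite uncertainty set of transition models: the inner step picks the worst-case model for the current policy, and the outer step takes a policy gradient step against it. All quantities are illustrative assumptions, not the paper's algorithm.

    import numpy as np

    # Toy robust MDP (illustrative): the uncertainty set is a finite list of models.
    rng = np.random.default_rng(3)
    S, A, gamma, K = 4, 3, 0.9, 5
    models = [rng.dirichlet(np.ones(S), size=(S, A)) for _ in range(K)]
    r = rng.uniform(size=(S, A))
    rho = np.ones(S) / S

    def softmax(theta):
        z = np.exp(theta - theta.max(axis=1, keepdims=True))
        return z / z.sum(axis=1, keepdims=True)

    def value_and_grad(pi, P):
        # Exact value and softmax policy gradient under a given transition model P.
        P_pi = np.einsum('sab,sa->sb', P, pi)
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, (pi * r).sum(axis=1))
        Q = r + gamma * P @ V
        d = (1 - gamma) * np.linalg.solve(np.eye(S) - gamma * P_pi.T, rho)
        return rho @ V, d[:, None] * pi * (Q - V[:, None]) / (1 - gamma)

    theta, eta = np.zeros((S, A)), 0.5
    for t in range(1000):
        pi = softmax(theta)
        # Inner loop: worst-case model for the current policy.
        worst_val, worst_grad = min((value_and_grad(pi, P) for P in models),
                                    key=lambda vg: vg[0])
        # Outer loop: policy gradient ascent on the worst-case value.
        theta += eta * worst_grad
    print("robust (worst-case) value:", worst_val)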
arXiv Detail & Related papers (2022-12-20T17:14:14Z)
- Bag of Tricks for Natural Policy Gradient Reinforcement Learning [87.54231228860495]
We have implemented and compared strategies that impact performance in natural policy gradient reinforcement learning.
The proposed collection of strategies for performance optimization can improve results by 86% to 181% across the MuJoCo control benchmark.
arXiv Detail & Related papers (2022-01-22T17:44:19Z)
- Global Convergence Using Policy Gradient Methods for Model-free Markovian Jump Linear Quadratic Control [8.98732207994362]
We study the global convergence of gradient-based policy optimization methods for model-free control of discrete-time Markovian jump linear systems.
We show global convergence of the policy using gradient descent and natural policy gradient methods.
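As a simplified illustration of model-free policy gradient for linear quadratic control (ignoring the Markovian jump structure of the paper), the sketch below tunes a static state-feedback gain by gradient descent on a rollout cost, estimating gradients with two-point zeroth-order perturbations. The system matrices, horizon, and step sizes are illustrative assumptions.

    import numpy as np

    # Illustrative linear system x' = A x + B u with quadratic stage cost.
    rng = np.random.default_rng(4)
    n, m, T = 3, 2, 50
    A = 0.9 * np.eye(n) + 0.05 * rng.standard_normal((n, n))
    B = rng.standard_normal((n, m))
    Q, R = np.eye(n), 0.1 * np.eye(m)
    X0 = rng.standard_normal((20, n))             # fixed initial states (common random numbers)

    def cost(K):
        # Finite-horizon rollout cost of the static feedback law u = -K x.
        total = 0.0
        for x0 in X0:
            x = x0
            for _ in range(T):
                u = -K @ x
                total += x @ Q @ x + u @ R @ u
                x = A @ x + B @ u
        return total / len(X0)

    K = np.zeros((m, n))
    eta, radius = 2e-3, 0.05
    for t in range(500):
        U = rng.standard_normal((m, n))
        U /= np.linalg.norm(U)
        # Two-point zeroth-order estimate of the gradient of the rollout cost.
        g = (cost(K + radius * U) - cost(K - radius * U)) / (2 * radius) * U
        K -= eta * g                              # gradient descent on the cost
    print("cost of learned gain:", cost(K))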
arXiv Detail & Related papers (2021-11-30T09:26:26Z)
- Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence [60.20076757208645]
This paper proposes a generalized policy mirror descent (GPMD) algorithm for solving regularized RL.
We demonstrate that our algorithm converges linearly over an entire range of learning rates, in a dimension-free fashion, to the global solution.
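As a special case of the policy mirror descent idea, the sketch below runs the closed-form KL-proximal update for entropy-regularized tabular RL; the paper's GPMD framework covers general convex regularizers, so this is only an illustrative instance. The toy MDP, step size eta, and regularization strength tau are assumptions.

    import numpy as np

    # Toy MDP (illustrative) with entropy regularization of strength tau.
    rng = np.random.default_rng(5)
    S, A, gamma, tau = 4, 3, 0.9, 0.1
    P = rng.dirichlet(np.ones(S), size=(S, A))
    r = rng.uniform(size=(S, A))

    def regularized_eval(pi):
        # Exact evaluation of V_tau^pi and Q_tau^pi for reward r(s,a) - tau*log pi(a|s).
        P_pi = np.einsum('sab,sa->sb', P, pi)
        V = np.linalg.solve(np.eye(S) - gamma * P_pi,
                            (pi * (r - tau * np.log(pi))).sum(axis=1))
        return V, r + gamma * P @ V

    pi = np.ones((S, A)) / A
    eta = 1.0                                     # step size (illustrative)
    for t in range(200):
        V, Q = regularized_eval(pi)
        # KL mirror step: pi_new(a|s) proportional to pi(a|s)^{1/(1+eta*tau)} * exp(eta*Q(s,a)/(1+eta*tau))
        logits = (np.log(pi) + eta * Q) / (1 + eta * tau)
        pi = np.exp(logits - logits.max(axis=1, keepdims=True))
        pi /= pi.sum(axis=1, keepdims=True)
    print("regularized values V_tau(s):", V)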
arXiv Detail & Related papers (2021-05-24T02:21:34Z)
- Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization [44.24881971917951]
Natural policy gradient (NPG) methods are among the most widely used policy optimization algorithms.
We develop convergence guarantees for entropy-regularized NPG methods under softmax parameterization.
Our results accommodate a wide range of learning rates, and shed light upon the role of entropy regularization in enabling fast convergence.
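In the tabular softmax case, a commonly used form of entropy-regularized NPG amounts to a multiplicative update on the policy; the sketch below implements that update with exact soft policy evaluation on a toy MDP. The MDP, step size eta, and regularization strength tau are illustrative assumptions; see the paper for the exact update and the admissible range of learning rates.

    import numpy as np

    # Toy MDP (illustrative) with entropy regularization of strength tau.
    rng = np.random.default_rng(6)
    S, A, gamma, tau = 4, 3, 0.9, 0.1
    P = rng.dirichlet(np.ones(S), size=(S, A))
    r = rng.uniform(size=(S, A))

    def soft_Q(pi):
        # Exact soft (entropy-regularized) Q-function of pi.
        P_pi = np.einsum('sab,sa->sb', P, pi)
        V = np.linalg.solve(np.eye(S) - gamma * P_pi,
                            (pi * (r - tau * np.log(pi))).sum(axis=1))
        return r + gamma * P @ V

    pi = np.ones((S, A)) / A
    eta = 0.5 * (1 - gamma) / tau                 # illustrative; eta = (1-gamma)/tau recovers soft policy iteration
    for t in range(100):
        Q = soft_Q(pi)
        # pi_new(a|s) proportional to pi(a|s)^{1 - eta*tau/(1-gamma)} * exp(eta*Q(s,a)/(1-gamma))
        logits = (1 - eta * tau / (1 - gamma)) * np.log(pi) + eta * Q / (1 - gamma)
        pi = np.exp(logits - logits.max(axis=1, keepdims=True))
        pi /= pi.sum(axis=1, keepdims=True)
    print("soft values:", (pi * (soft_Q(pi) - tau * np.log(pi))).sum(axis=1))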
arXiv Detail & Related papers (2020-07-13T17:58:41Z)
- When Will Generative Adversarial Imitation Learning Algorithms Attain Global Convergence [56.40794592158596]
We study generative adversarial imitation learning (GAIL) under general MDPs and for nonlinear reward function classes.
This is the first systematic theoretical study of the global convergence of GAIL.
arXiv Detail & Related papers (2020-06-24T06:24:37Z)
- Stable Policy Optimization via Off-Policy Divergence Regularization [50.98542111236381]
Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) are among the most successful policy gradient approaches in deep reinforcement learning (RL).
We propose a new algorithm which stabilizes the policy improvement through a proximity term that constrains the discounted state-action visitation distributions induced by consecutive policies to be close to one another.
Our proposed method can have a beneficial effect on stability and improve final performance in benchmark high-dimensional control tasks.
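The paper's algorithm is not spelled out in this summary; as a rough illustration of penalizing the divergence between visitation distributions of consecutive policies, the sketch below maximizes the value minus a KL penalty between discounted state-action visitation distributions, using finite-difference gradients on a toy MDP. Everything here is an illustrative assumption, not the paper's method.

    import numpy as np

    # Toy MDP (illustrative).  Each outer iteration maximizes: value of the new policy
    # minus beta * KL between the discounted state-action visitation distributions of
    # the new and the previous policy.  Gradients are taken by finite differences
    # purely for brevity; this is not the paper's algorithm.
    rng = np.random.default_rng(7)
    S, A, gamma, beta = 4, 3, 0.9, 1.0
    P = rng.dirichlet(np.ones(S), size=(S, A))
    r = rng.uniform(size=(S, A))
    rho = np.ones(S) / S

    def softmax(theta):
        z = np.exp(theta - theta.max(axis=1, keepdims=True))
        return z / z.sum(axis=1, keepdims=True)

    def visitation_and_value(pi):
        P_pi = np.einsum('sab,sa->sb', P, pi)
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, (pi * r).sum(axis=1))
        d = (1 - gamma) * np.linalg.solve(np.eye(S) - gamma * P_pi.T, rho)
        return d[:, None] * pi, rho @ V           # state-action visitation, value

    def objective(theta, mu_old):
        mu, val = visitation_and_value(softmax(theta))
        return val - beta * np.sum(mu * np.log(mu / mu_old))

    theta = np.zeros((S, A))
    for outer in range(20):
        mu_old, _ = visitation_and_value(softmax(theta))
        for inner in range(50):                   # penalized policy improvement step
            grad = np.zeros_like(theta)
            for idx in np.ndindex(*theta.shape):  # finite-difference gradient (sketch only)
                e = np.zeros_like(theta)
                e[idx] = 1e-5
                grad[idx] = (objective(theta + e, mu_old) - objective(theta - e, mu_old)) / 2e-5
            theta += 0.3 * grad
    print("final value:", visitation_and_value(softmax(theta))[1])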
arXiv Detail & Related papers (2020-03-09T13:05:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.