Global Convergence of Policy Gradient Methods in Reinforcement Learning,
Games and Control
- URL: http://arxiv.org/abs/2310.05230v1
- Date: Sun, 8 Oct 2023 16:54:25 GMT
- Title: Global Convergence of Policy Gradient Methods in Reinforcement Learning,
Games and Control
- Authors: Shicong Cen, Yuejie Chi
- Abstract summary: Policy gradient methods are increasingly popular for sequential decision making in reinforcement learning, games, and control.
Guaranteeing the global optimality of policy gradient methods is highly nontrivial due to nonconcavity of the value functions.
- Score: 38.10940311690513
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Policy gradient methods, where one searches for the policy of interest by
maximizing the value functions using first-order information, have become
increasingly popular for sequential decision making in reinforcement learning,
games, and control. Guaranteeing the global optimality of policy gradient
methods, however, is highly nontrivial due to nonconcavity of the value
functions. In this exposition, we highlight recent progress in understanding
and developing policy gradient methods with global convergence guarantees,
putting an emphasis on their finite-time convergence rates with regard to
salient problem parameters.
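To make the setup concrete: in the tabular case, a policy gradient method ascends the value function directly in the policy parameters using first-order information. The following minimal sketch (not from the paper; the toy MDP, step size, and iteration count are illustrative assumptions) runs exact policy gradient ascent under the softmax parameterization.

    import numpy as np

    # Toy MDP (illustrative): S states, A actions, discount factor gamma.
    rng = np.random.default_rng(0)
    S, A, gamma = 4, 3, 0.9
    P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, s'] transition probabilities
    r = rng.uniform(size=(S, A))                 # r[s, a] rewards in [0, 1]
    rho = np.ones(S) / S                         # initial state distribution

    def softmax_policy(theta):
        z = np.exp(theta - theta.max(axis=1, keepdims=True))
        return z / z.sum(axis=1, keepdims=True)  # pi[s, a]

    def evaluate(pi):
        # Exact policy evaluation: value V^pi, advantage A^pi, discounted visitation d^pi.
        P_pi = np.einsum('sab,sa->sb', P, pi)
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, (pi * r).sum(axis=1))
        Q = r + gamma * P @ V
        d = (1 - gamma) * np.linalg.solve(np.eye(S) - gamma * P_pi.T, rho)
        return V, Q - V[:, None], d

    theta = np.zeros((S, A))
    eta = 0.5                                    # constant step size (illustrative)
    for t in range(2000):
        pi = softmax_policy(theta)
        V, adv, d = evaluate(pi)
        # Exact softmax policy gradient: dV(rho)/dtheta[s,a] = d(s) pi(a|s) A(s,a) / (1-gamma)
        theta += eta * d[:, None] * pi * adv / (1 - gamma)
    print("V^pi(rho) after training:", rho @ V)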
Related papers
- Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
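C-PG itself is not detailed in this summary; purely as an illustration of the constrained setting, the sketch below runs a generic primal-dual (Lagrangian) policy gradient on a toy constrained MDP, alternating a policy ascent step on the Lagrangian with a dual ascent step on the constraint violation. The toy MDP, cost budget b, and step sizes are illustrative assumptions, and this is not the paper's algorithm.

    import numpy as np

    # Toy constrained MDP (illustrative): maximize the reward value subject to a
    # cost-value constraint V_c(pi) <= b.  Generic primal-dual sketch, NOT C-PG.
    rng = np.random.default_rng(1)
    S, A, gamma, b = 4, 3, 0.9, 4.0
    P = rng.dirichlet(np.ones(S), size=(S, A))
    r = rng.uniform(size=(S, A))                 # reward
    c = rng.uniform(size=(S, A))                 # constraint cost
    rho = np.ones(S) / S

    def softmax(theta):
        z = np.exp(theta - theta.max(axis=1, keepdims=True))
        return z / z.sum(axis=1, keepdims=True)

    def value_and_grad(theta, reward):
        # Exact softmax policy gradient of the discounted value of `reward`.
        pi = softmax(theta)
        P_pi = np.einsum('sab,sa->sb', P, pi)
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, (pi * reward).sum(axis=1))
        Q = reward + gamma * P @ V
        d = (1 - gamma) * np.linalg.solve(np.eye(S) - gamma * P_pi.T, rho)
        return rho @ V, d[:, None] * pi * (Q - V[:, None]) / (1 - gamma)

    theta, lam = np.zeros((S, A)), 0.0
    eta_theta, eta_lam = 0.5, 0.05
    for t in range(3000):
        V_r, g_r = value_and_grad(theta, r)
        V_c, g_c = value_and_grad(theta, c)
        theta += eta_theta * (g_r - lam * g_c)    # primal ascent on the Lagrangian
        lam = max(0.0, lam + eta_lam * (V_c - b)) # dual ascent on the violation
    print(f"reward value {V_r:.3f}, cost value {V_c:.3f}, budget {b}")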
arXiv Detail & Related papers (2024-07-15T14:54:57Z)
- Elementary Analysis of Policy Gradient Methods [3.468656086349638]
This paper focuses on the discounted MDP setting and conducts a systematic study of several standard policy optimization methods.
Several novel results are presented, including:
1) global linear convergence of projected policy gradient for any constant step size;
2) sublinear convergence of softmax policy gradient for any constant step size;
3) global linear convergence of softmax natural policy gradient for any constant step size;
4) global linear convergence of entropy-regularized softmax policy gradient for a wider range of constant step sizes than in existing results;
5) a tight local linear convergence rate of entropy-regularized natural policy gradient;
6) a new and concise local quadratic convergence rate of soft ...
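As an illustration of result 1), the sketch below runs projected policy gradient under the direct parameterization: take a gradient step on the value and project each state's action distribution back onto the probability simplex. The toy MDP and constant step size are illustrative assumptions, not taken from the paper.

    import numpy as np

    # Toy MDP (illustrative) for projected policy gradient with direct parameterization.
    rng = np.random.default_rng(2)
    S, A, gamma = 4, 3, 0.9
    P = rng.dirichlet(np.ones(S), size=(S, A))
    r = rng.uniform(size=(S, A))
    rho = np.ones(S) / S

    def project_simplex(v):
        # Euclidean projection of each row of v onto the probability simplex.
        u = np.sort(v, axis=1)[:, ::-1]
        css = np.cumsum(u, axis=1) - 1.0
        k = np.arange(1, v.shape[1] + 1)
        support = (u - css / k > 0).sum(axis=1) - 1
        tau = css[np.arange(v.shape[0]), support] / (support + 1)
        return np.maximum(v - tau[:, None], 0.0)

    pi = np.ones((S, A)) / A
    eta = 0.1                                     # constant step size (illustrative)
    for t in range(500):
        P_pi = np.einsum('sab,sa->sb', P, pi)
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, (pi * r).sum(axis=1))
        Q = r + gamma * P @ V
        d = (1 - gamma) * np.linalg.solve(np.eye(S) - gamma * P_pi.T, rho)
        # dV(rho)/dpi(a|s) = d(s) * Q(s,a) / (1 - gamma); step, then project.
        pi = project_simplex(pi + eta * d[:, None] * Q / (1 - gamma))
    print("V(rho):", rho @ V)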
arXiv Detail & Related papers (2024-04-04T11:16:16Z)
- A Policy Gradient Framework for Stochastic Optimal Control Problems with Global Convergence Guarantee [12.884132885360907]
We consider policy gradient methods for optimal control problems in continuous time.
We prove the global convergence of the gradient flow and establish a convergence rate under some regularity assumptions.
arXiv Detail & Related papers (2023-02-11T23:30:50Z)
- Policy Gradient in Robust MDPs with Global Convergence Guarantee [13.40471012593073]
Robust Markov decision processes (RMDPs) provide a promising framework for computing reliable policies in the face of model errors.
This paper proposes a new Double-Loop Robust Policy Gradient (DRPG), the first generic policy gradient method for RMDPs.
In contrast with prior robust policy gradient algorithms, DRPG monotonically reduces approximation errors to guarantee convergence to a globally optimal policy.
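DRPG's construction is not given in this summary; the sketch below only illustrates the generic double-loop pattern on a toy robust MDP with a finite uncertainty set of transition models: the inner step picks the worst-case model for the current policy, and the outer step takes a policy gradient step against it. All quantities are illustrative assumptions, not the paper's algorithm.

    import numpy as np

    # Toy robust MDP (illustrative): the uncertainty set is a finite list of models.
    rng = np.random.default_rng(3)
    S, A, gamma, K = 4, 3, 0.9, 5
    models = [rng.dirichlet(np.ones(S), size=(S, A)) for _ in range(K)]
    r = rng.uniform(size=(S, A))
    rho = np.ones(S) / S

    def softmax(theta):
        z = np.exp(theta - theta.max(axis=1, keepdims=True))
        return z / z.sum(axis=1, keepdims=True)

    def value_and_grad(pi, P):
        # Exact value and softmax policy gradient under a given transition model P.
        P_pi = np.einsum('sab,sa->sb', P, pi)
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, (pi * r).sum(axis=1))
        Q = r + gamma * P @ V
        d = (1 - gamma) * np.linalg.solve(np.eye(S) - gamma * P_pi.T, rho)
        return rho @ V, d[:, None] * pi * (Q - V[:, None]) / (1 - gamma)

    theta, eta = np.zeros((S, A)), 0.5
    for t in range(1000):
        pi = softmax(theta)
        # Inner loop: worst-case model for the current policy.
        worst_val, worst_grad = min((value_and_grad(pi, P) for P in models),
                                    key=lambda vg: vg[0])
        # Outer loop: policy gradient ascent on the worst-case value.
        theta += eta * worst_grad
    print("robust (worst-case) value:", worst_val)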
arXiv Detail & Related papers (2022-12-20T17:14:14Z)
- Bag of Tricks for Natural Policy Gradient Reinforcement Learning [87.54231228860495]
We have implemented and compared strategies that impact performance in natural policy gradient reinforcement learning.
The proposed collection of strategies for performance optimization can improve results by 86% to 181% across the MuJoCo control benchmark.
arXiv Detail & Related papers (2022-01-22T17:44:19Z)
- Global Convergence Using Policy Gradient Methods for Model-free Markovian Jump Linear Quadratic Control [8.98732207994362]
We study the global convergence of gradient-based policy optimization methods for model-free control of discrete-time Markovian jump linear systems.
We show global convergence of the policy using gradient descent and natural policy gradient methods.
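As a simplified illustration of model-free policy gradient for linear quadratic control (ignoring the Markovian jump structure of the paper), the sketch below tunes a static state-feedback gain by gradient descent on a rollout cost, estimating gradients with two-point zeroth-order perturbations. The system matrices, horizon, and step sizes are illustrative assumptions.

    import numpy as np

    # Illustrative linear system x' = A x + B u with quadratic stage cost.
    rng = np.random.default_rng(4)
    n, m, T = 3, 2, 50
    A = 0.9 * np.eye(n) + 0.05 * rng.standard_normal((n, n))
    B = rng.standard_normal((n, m))
    Q, R = np.eye(n), 0.1 * np.eye(m)
    X0 = rng.standard_normal((20, n))             # fixed initial states (common random numbers)

    def cost(K):
        # Finite-horizon rollout cost of the static feedback law u = -K x.
        total = 0.0
        for x0 in X0:
            x = x0
            for _ in range(T):
                u = -K @ x
                total += x @ Q @ x + u @ R @ u
                x = A @ x + B @ u
        return total / len(X0)

    K = np.zeros((m, n))
    eta, radius = 2e-3, 0.05
    for t in range(500):
        U = rng.standard_normal((m, n))
        U /= np.linalg.norm(U)
        # Two-point zeroth-order estimate of the gradient of the rollout cost.
        g = (cost(K + radius * U) - cost(K - radius * U)) / (2 * radius) * U
        K -= eta * g                              # gradient descent on the cost
    print("cost of learned gain:", cost(K))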
arXiv Detail & Related papers (2021-11-30T09:26:26Z)
- Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence [60.20076757208645]
This paper proposes a generalized policy mirror descent (GPMD) algorithm for solving regularized RL.
We demonstrate that our algorithm converges linearly over an entire range of learning rates, in a dimension-free fashion, to the global solution.
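As a special case of the policy mirror descent idea, the sketch below runs the closed-form KL-proximal update for entropy-regularized tabular RL; the paper's GPMD framework covers general convex regularizers, so this is only an illustrative instance. The toy MDP, step size eta, and regularization strength tau are assumptions.

    import numpy as np

    # Toy MDP (illustrative) with entropy regularization of strength tau.
    rng = np.random.default_rng(5)
    S, A, gamma, tau = 4, 3, 0.9, 0.1
    P = rng.dirichlet(np.ones(S), size=(S, A))
    r = rng.uniform(size=(S, A))

    def regularized_eval(pi):
        # Exact evaluation of V_tau^pi and Q_tau^pi for reward r(s,a) - tau*log pi(a|s).
        P_pi = np.einsum('sab,sa->sb', P, pi)
        V = np.linalg.solve(np.eye(S) - gamma * P_pi,
                            (pi * (r - tau * np.log(pi))).sum(axis=1))
        return V, r + gamma * P @ V

    pi = np.ones((S, A)) / A
    eta = 1.0                                     # step size (illustrative)
    for t in range(200):
        V, Q = regularized_eval(pi)
        # KL mirror step: pi_new(a|s) proportional to pi(a|s)^{1/(1+eta*tau)} * exp(eta*Q(s,a)/(1+eta*tau))
        logits = (np.log(pi) + eta * Q) / (1 + eta * tau)
        pi = np.exp(logits - logits.max(axis=1, keepdims=True))
        pi /= pi.sum(axis=1, keepdims=True)
    print("regularized values V_tau(s):", V)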
arXiv Detail & Related papers (2021-05-24T02:21:34Z)
- Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization [44.24881971917951]
Natural policy gradient (NPG) methods are among the most widely used policy optimization algorithms.
We develop convergence guarantees for entropy-regularized NPG methods under softmax parameterization.
Our results accommodate a wide range of learning rates, and shed light upon the role of entropy regularization in enabling fast convergence.
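In the tabular softmax case, a commonly used form of entropy-regularized NPG amounts to a multiplicative update on the policy; the sketch below implements that update with exact soft policy evaluation on a toy MDP. The MDP, step size eta, and regularization strength tau are illustrative assumptions; see the paper for the exact update and the admissible range of learning rates.

    import numpy as np

    # Toy MDP (illustrative) with entropy regularization of strength tau.
    rng = np.random.default_rng(6)
    S, A, gamma, tau = 4, 3, 0.9, 0.1
    P = rng.dirichlet(np.ones(S), size=(S, A))
    r = rng.uniform(size=(S, A))

    def soft_Q(pi):
        # Exact soft (entropy-regularized) Q-function of pi.
        P_pi = np.einsum('sab,sa->sb', P, pi)
        V = np.linalg.solve(np.eye(S) - gamma * P_pi,
                            (pi * (r - tau * np.log(pi))).sum(axis=1))
        return r + gamma * P @ V

    pi = np.ones((S, A)) / A
    eta = 0.5 * (1 - gamma) / tau                 # illustrative; eta = (1-gamma)/tau recovers soft policy iteration
    for t in range(100):
        Q = soft_Q(pi)
        # pi_new(a|s) proportional to pi(a|s)^{1 - eta*tau/(1-gamma)} * exp(eta*Q(s,a)/(1-gamma))
        logits = (1 - eta * tau / (1 - gamma)) * np.log(pi) + eta * Q / (1 - gamma)
        pi = np.exp(logits - logits.max(axis=1, keepdims=True))
        pi /= pi.sum(axis=1, keepdims=True)
    print("soft values:", (pi * (soft_Q(pi) - tau * np.log(pi))).sum(axis=1))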
arXiv Detail & Related papers (2020-07-13T17:58:41Z)
- When Will Generative Adversarial Imitation Learning Algorithms Attain Global Convergence [56.40794592158596]
We study generative adversarial imitation learning (GAIL) under general MDPs and for nonlinear reward function classes.
This is the first systematic theoretical study of the global convergence of GAIL.
arXiv Detail & Related papers (2020-06-24T06:24:37Z)
- Stable Policy Optimization via Off-Policy Divergence Regularization [50.98542111236381]
Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) are among the most successful policy gradient approaches in deep reinforcement learning (RL).
We propose a new algorithm which stabilizes the policy improvement through a proximity term that constrains the discounted state-action visitation distributions induced by consecutive policies to be close to one another.
Our proposed method can have a beneficial effect on stability and improve final performance in benchmark high-dimensional control tasks.
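The paper's algorithm is not spelled out in this summary; as a rough illustration of penalizing the divergence between visitation distributions of consecutive policies, the sketch below maximizes the value minus a KL penalty between discounted state-action visitation distributions, using finite-difference gradients on a toy MDP. Everything here is an illustrative assumption, not the paper's method.

    import numpy as np

    # Toy MDP (illustrative).  Each outer iteration maximizes: value of the new policy
    # minus beta * KL between the discounted state-action visitation distributions of
    # the new and the previous policy.  Gradients are taken by finite differences
    # purely for brevity; this is not the paper's algorithm.
    rng = np.random.default_rng(7)
    S, A, gamma, beta = 4, 3, 0.9, 1.0
    P = rng.dirichlet(np.ones(S), size=(S, A))
    r = rng.uniform(size=(S, A))
    rho = np.ones(S) / S

    def softmax(theta):
        z = np.exp(theta - theta.max(axis=1, keepdims=True))
        return z / z.sum(axis=1, keepdims=True)

    def visitation_and_value(pi):
        P_pi = np.einsum('sab,sa->sb', P, pi)
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, (pi * r).sum(axis=1))
        d = (1 - gamma) * np.linalg.solve(np.eye(S) - gamma * P_pi.T, rho)
        return d[:, None] * pi, rho @ V           # state-action visitation, value

    def objective(theta, mu_old):
        mu, val = visitation_and_value(softmax(theta))
        return val - beta * np.sum(mu * np.log(mu / mu_old))

    theta = np.zeros((S, A))
    for outer in range(20):
        mu_old, _ = visitation_and_value(softmax(theta))
        for inner in range(50):                   # penalized policy improvement step
            grad = np.zeros_like(theta)
            for idx in np.ndindex(*theta.shape):  # finite-difference gradient (sketch only)
                e = np.zeros_like(theta)
                e[idx] = 1e-5
                grad[idx] = (objective(theta + e, mu_old) - objective(theta - e, mu_old)) / 2e-5
            theta += 0.3 * grad
    print("final value:", visitation_and_value(softmax(theta))[1])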
arXiv Detail & Related papers (2020-03-09T13:05:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.