On the Convergence Rates of Policy Gradient Methods
- URL: http://arxiv.org/abs/2201.07443v1
- Date: Wed, 19 Jan 2022 07:03:37 GMT
- Title: On the Convergence Rates of Policy Gradient Methods
- Authors: Lin Xiao
- Abstract summary: We consider infinite-horizon discounted Markov decision problems with finite state and action spaces.
We show that with direct parametrization in the policy space, policy mirror descent methods attain linear convergence using geometrically increasing step sizes, and we estimate the sample complexity of an inexact variant under a simple generative model.
- Score: 9.74841674275568
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider infinite-horizon discounted Markov decision problems with finite
state and action spaces. We show that with direct parametrization in the policy
space, the weighted value function, although non-convex in general, is both
quasi-convex and quasi-concave. While quasi-convexity helps explain the
convergence of policy gradient methods to global optima, quasi-concavity hints
at their convergence guarantees using arbitrarily large step sizes that are not
dictated by the Lipschitz constant characterizing smoothness of the value
function. In particular, we show that when using geometrically increasing step
sizes, a general class of policy mirror descent methods, including the natural
policy gradient method and a projected Q-descent method, all enjoy a linear
rate of convergence without relying on entropy or other strongly convex
regularization. In addition, we develop a theory of weak gradient-mapping
dominance and use it to prove a sharper sublinear convergence rate of the
projected policy gradient method. Finally, we also analyze the convergence rate
of an inexact policy mirror descent method and estimate its sample complexity
under a simple generative model.
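To make the paper's geometric step-size schedule concrete, the following is a minimal sketch (not the authors' code) of the tabular KL-divergence instance of policy mirror descent, whose update coincides with the natural policy gradient step $\pi_{t+1}(a|s) \propto \pi_t(a|s)\exp(\eta_t Q^{\pi_t}(s,a))$; it assumes exact policy evaluation on a small synthetic MDP, and all function names are illustrative.

```python
import numpy as np

def policy_eval(P, R, pi, gamma):
    """Exact Q^pi from the Bellman equation (P: [S,A,S], R: [S,A], pi: [S,A])."""
    S, A = R.shape
    P_pi = np.einsum("sap,pb->sapb", P, pi).reshape(S * A, S * A)
    q = np.linalg.solve(np.eye(S * A) - gamma * P_pi, R.reshape(-1))
    return q.reshape(S, A)

def npg_geometric(P, R, gamma, eta0=1.0, iters=50):
    """Policy mirror descent (KL case = NPG) with eta_{t+1} = eta_t / gamma."""
    S, A = R.shape
    pi = np.full((S, A), 1.0 / A)                    # uniform initial policy
    eta = eta0
    for _ in range(iters):
        Q = policy_eval(P, R, pi, gamma)
        logits = np.log(pi) + eta * Q                # KL-prox / multiplicative-weights step
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        pi = np.exp(logits)
        pi /= pi.sum(axis=1, keepdims=True)
        eta /= gamma                                 # geometrically increasing step size
    return pi

# Usage on a random MDP (illustrative):
rng = np.random.default_rng(0)
S_, A_ = 4, 3
P = rng.dirichlet(np.ones(S_), size=(S_, A_))        # transition kernel [S,A,S]
R = rng.uniform(size=(S_, A_))                       # reward table [S,A]
print(npg_geometric(P, R, gamma=0.9))
```

With $\eta_t = \eta_0/\gamma^t$, the abstract's claim is that such methods converge linearly without entropy or other strongly convex regularization.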
Related papers
- A Unified Theory of Stochastic Proximal Point Methods without Smoothness [52.30944052987393]
Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning.
This paper presents a comprehensive analysis of a broad range of variants of the stochastic proximal point method (SPPM).
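For intuition, here is a minimal sketch of one SPPM instance on least squares, where the per-sample proximal step has a closed form; this is an illustrative example under assumed notation, not the paper's algorithm.

```python
import numpy as np

def sppm_least_squares(A, b, eta=1.0, iters=1000, seed=0):
    """Stochastic proximal point for 0.5*(a_i @ x - b_i)^2: each step solves
    x+ = argmin_x 0.5*(a_i @ x - b_i)^2 + ||x - x_k||^2 / (2*eta) in closed form."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(iters):
        i = rng.integers(n)
        a, r = A[i], A[i] @ x - b[i]
        # Closed-form prox step; remains stable even for large eta, unlike plain SGD.
        x = x - eta * r / (1.0 + eta * np.dot(a, a)) * a
    return x
```

The damping factor $1/(1 + \eta\|a_i\|^2)$ is what gives proximal point steps their robustness to step-size tuning.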
arXiv Detail & Related papers (2024-05-24T21:09:19Z)
- Elementary Analysis of Policy Gradient Methods [3.468656086349638]
This paper focuses on the discounted MDP setting and conducts a systematic study of the aforementioned policy optimization methods.
Several novel results are presented, including 1) global linear convergence of projected policy gradient for any constant step size, 2) sublinear convergence of softmax policy gradient for any constant step size, 3) global linear convergence of softmax natural policy gradient for any constant step size, 4) global linear convergence of entropy-regularized softmax policy gradient for a wider range of constant step sizes than existing results, 5) a tight local linear convergence rate of entropy-regularized natural policy gradient, and 6) a new and concise local quadratic convergence rate of soft policy iteration.
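To make result 1) concrete, here is a minimal sketch of one projected policy gradient step under direct parametrization, using the standard sorting-based Euclidean projection onto the probability simplex; the policy gradient itself (built from the state-visitation distribution and Q-values) is assumed to be supplied by the caller.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - 1)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def projected_pg_step(pi, grad, eta):
    """One step of pi <- Proj(pi + eta * grad), rowwise over states."""
    return np.vstack([project_simplex(pi[s] + eta * grad[s])
                      for s in range(pi.shape[0])])
```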
arXiv Detail & Related papers (2024-04-04T11:16:16Z)
- Last-Iterate Convergent Policy Gradient Primal-Dual Methods for Constrained MDPs [107.28031292946774]
We study the problem of computing an optimal policy of an infinite-horizon discounted constrained Markov decision process (constrained MDP).
We develop two single-time-scale policy-based primal-dual algorithms with non-asymptotic convergence of their policy iterates to an optimal constrained policy.
To the best of our knowledge, this work appears to be the first non-asymptotic policy last-iterate convergence result for single-time-scale algorithms in constrained MDPs.
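As a rough sketch of the single-time-scale template (not the authors' exact algorithm): both the policy and the dual variable are updated once per iteration with step sizes on the same scale. The oracles `grad_Vr`, `grad_Vc`, and `eval_Vc` below are hypothetical placeholders for policy gradient and constraint evaluations.

```python
import numpy as np

def primal_dual_step(pi, lam, eta, b, grad_Vr, grad_Vc, eval_Vc):
    """One single-time-scale update for: maximize V_r(pi) s.t. V_c(pi) >= b,
    via the Lagrangian L(pi, lam) = V_r(pi) + lam * (V_c(pi) - b)."""
    # Primal: one ascent step on the Lagrangian in the policy variables
    # (a true per-state simplex projection would replace this renormalization).
    pi = np.clip(pi + eta * (grad_Vr(pi) + lam * grad_Vc(pi)), 1e-12, None)
    pi = pi / pi.sum(axis=1, keepdims=True)
    # Dual: one projected subgradient descent step, clipped at zero.
    lam = max(lam - eta * (eval_Vc(pi) - b), 0.0)
    return pi, lam
```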
arXiv Detail & Related papers (2023-06-20T17:27:31Z)
- Linear Convergence of Natural Policy Gradient Methods with Log-Linear Policies [115.86431674214282]
We consider infinite-horizon discounted Markov decision processes and study the convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log-linear policy class.
We show that both methods attain linear convergence rates and $\mathcal{O}(1/\epsilon^2)$ sample complexities using a simple, non-adaptive geometrically increasing step size.
arXiv Detail & Related papers (2022-10-04T06:17:52Z)
- Linear Convergence for Natural Policy Gradient with Log-linear Policy Parametrization [18.072051868187934]
We analyze the convergence rate of the unregularized natural policy algorithm with log-linear policy parametrizations.
We show that the algorithm enjoys the same linear guarantees as in the deterministic case up to an error term.
arXiv Detail & Related papers (2022-09-30T11:17:44Z)
- Convergence and sample complexity of natural policy gradient primal-dual methods for constrained MDPs [21.347689976296834]
We employ the natural policy gradient method to solve the discounted infinite-horizon optimal control problem for constrained MDPs.
We also provide convergence and finite-sample guarantees for two sample-based NPG-PD algorithms.
arXiv Detail & Related papers (2022-06-06T04:28:04Z)
- Beyond Exact Gradients: Convergence of Stochastic Soft-Max Policy Gradient Methods with Entropy Regularization [20.651913793555163]
We revisit the classical entropy regularized policy gradient methods with the soft-max policy parametrization.
We establish a global optimality convergence result and a sample complexity of $\widetilde{\mathcal{O}}(\frac{1}{\epsilon^2})$ for the proposed algorithm.
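A rough stochastic sketch of this setting (illustrative, not the paper's algorithm): REINFORCE-style updates for a tabular soft-max policy where sampled rewards are augmented with the entropy bonus $-\tau\log\pi(a|s)$. Here `env` is a hypothetical gym-like environment with discrete states and actions; the $\gamma^t$ weighting on the gradient is dropped, as is common in practice.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def entropy_reg_pg_step(theta, env, gamma=0.99, tau=0.01, eta=0.1, T=200, seed=0):
    """One REINFORCE update on theta[s, a] with entropy-regularized rewards."""
    rng = np.random.default_rng(seed)
    s = env.reset()
    grads, rewards = [], []
    for _ in range(T):
        pi_s = softmax(theta[s])
        a = rng.choice(len(pi_s), p=pi_s)
        s_next, r, done = env.step(a)
        g = -pi_s
        g[a] += 1.0                                # grad of log pi(a|s) wrt theta[s]
        grads.append((s, g))
        rewards.append(r - tau * np.log(pi_s[a]))  # entropy-regularized reward
        s = s_next
        if done:
            break
    G = 0.0
    for (s_t, g_t), r_t in zip(reversed(grads), reversed(rewards)):
        G = r_t + gamma * G                        # discounted return-to-go
        theta[s_t] += eta * g_t * G                # REINFORCE ascent step
    return theta
```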
arXiv Detail & Related papers (2021-10-19T17:21:09Z)
- Softmax Policy Gradient Methods Can Take Exponential Time to Converge [60.98700344526674]
The softmax policy gradient (PG) method is arguably one of the de facto implementations of policy optimization in modern reinforcement learning.
We demonstrate that softmax PG methods can take exponential time -- in terms of $|\mathcal{S}|$ and $\frac{1}{1-\gamma}$ -- to converge.
arXiv Detail & Related papers (2021-02-22T18:56:26Z)
- Complexity Guarantees for Polyak Steps with Momentum [76.97851351276165]
We study a class of methods, based on Polyak steps, where knowledge of the standard smoothness and strong convexity constants is substituted by that of the optimal value, $f_*$.
We first show slightly improved convergence bounds over those previously known for the classical case of simple gradient descent with Polyak steps; we then derive an accelerated gradient method with Polyak steps and momentum, along with convergence guarantees.
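For reference, the classical Polyak step replaces knowledge of smoothness and strong-convexity constants with the optimal value $f_*$: $\eta_k = (f(x_k) - f_*)/\|\nabla f(x_k)\|^2$. A minimal gradient descent sketch (illustrative names):

```python
import numpy as np

def polyak_gd(f, grad, x0, f_star, iters=100):
    """Gradient descent with the Polyak step eta_k = (f(x_k) - f_*) / ||g_k||^2."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad(x)
        gap = f(x) - f_star
        if gap <= 0.0 or np.dot(g, g) == 0.0:
            break                                  # optimal (or stationary) point
        x = x - (gap / np.dot(g, g)) * g           # Polyak step
    return x

# Example: f(x) = 0.5 * ||x||^2 with f_* = 0; each step halves x.
x_opt = polyak_gd(lambda x: 0.5 * x @ x, lambda x: x, np.ones(5), f_star=0.0)
```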
arXiv Detail & Related papers (2020-02-03T17:50:28Z)
- On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization [80.03647903934723]
We provide a fine-grained convergence analysis for a general class of adaptive gradient methods and prove that, in expectation, they converge to a first-order stationary point in nonconvex optimization.
Our analyses shed light on a better understanding of adaptive gradient methods for nonconvex optimization.
arXiv Detail & Related papers (2018-08-16T20:25:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.