Linear Convergence of Natural Policy Gradient Methods with Log-Linear
Policies
- URL: http://arxiv.org/abs/2210.01400v1
- Date: Tue, 4 Oct 2022 06:17:52 GMT
- Title: Linear Convergence of Natural Policy Gradient Methods with Log-Linear
Policies
- Authors: Rui Yuan, Simon S. Du, Robert M. Gower, Alessandro Lazaric, Lin Xiao
- Abstract summary: We consider infinite-horizon discounted Markov decision processes and study the convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log-linear policy class.
We show that both methods attain linear convergence rates and $\mathcal{O}(1/\epsilon^2)$ sample complexities using a simple, non-adaptive geometrically increasing step size.
- Score: 115.86431674214282
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider infinite-horizon discounted Markov decision processes and study
the convergence rates of the natural policy gradient (NPG) and the Q-NPG
methods with the log-linear policy class. Using the compatible function
approximation framework, both methods with log-linear policies can be written
as approximate versions of the policy mirror descent (PMD) method. We show that
both methods attain linear convergence rates and $\mathcal{O}(1/\epsilon^2)$
sample complexities using a simple, non-adaptive geometrically increasing step
size, without resorting to entropy or other strongly convex regularization.
Lastly, as a byproduct, we obtain sublinear convergence rates for both methods
with arbitrary constant step size.
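To make the ingredients above concrete (a log-linear policy class, the compatible function approximation regression, and a geometrically increasing step size), the following is a minimal Python sketch of how a Q-NPG-style iteration could look in a small tabular setting. It is not the authors' implementation: the feature map phi, rewards r, transitions P, and start distribution rho are hypothetical inputs, policy evaluation and the regression are performed exactly rather than from samples, and the schedule eta0 / gamma**t is only one natural geometric choice rather than the paper's exact constants.

```python
import numpy as np

def softmax_rows(logits):
    """Row-wise softmax (numerically stabilized)."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def q_npg_log_linear(phi, r, P, rho, gamma=0.9, eta0=0.1, iters=30):
    """Sketch of a Q-NPG iteration with a log-linear policy class.

    Hypothetical inputs: phi (S, A, d) feature map, r (S, A) rewards,
    P (S, A, S) transition kernel, rho (S,) initial-state distribution.
    Policy evaluation and the regression are done exactly here; the paper's
    sample-complexity results concern their sample-based counterparts.
    """
    S, A, d = phi.shape
    theta = np.zeros(d)
    X = phi.reshape(S * A, d)
    for t in range(iters):
        # Log-linear policy: pi(a|s) proportional to exp(phi(s, a)^T theta).
        pi = softmax_rows(phi @ theta)                          # (S, A)

        # Exact policy evaluation: Q = (I - gamma * P_pi)^{-1} r over
        # state-action pairs, with P_pi[(s,a),(s',a')] = P[s,a,s'] * pi(a'|s').
        P_pi = np.einsum('sap,pb->sapb', P, pi).reshape(S * A, S * A)
        q = np.linalg.solve(np.eye(S * A) - gamma * P_pi, r.reshape(-1))

        # Discounted state-action occupancy under pi (regression weights).
        P_state = np.einsum('sa,sap->sp', pi, P)                # (S, S)
        d_s = (1 - gamma) * np.linalg.solve(np.eye(S) - gamma * P_state.T, rho)
        w_sa = (d_s[:, None] * pi).reshape(-1)

        # Compatible function approximation: weighted regression of Q^pi onto phi.
        sw = np.sqrt(w_sa)
        w_t, *_ = np.linalg.lstsq(X * sw[:, None], q * sw, rcond=None)

        # Q-NPG step (approximate policy mirror descent) with a non-adaptive,
        # geometrically increasing step size.
        eta_t = eta0 / gamma ** t
        theta = theta + eta_t * w_t
    return theta
```

With exact evaluation and an exact regression, each step is a policy mirror descent update restricted to the log-linear class; the $\mathcal{O}(1/\epsilon^2)$ sample complexity in the abstract refers to the sample-based version of this iteration.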
Related papers
- Deterministic Policy Gradient Primal-Dual Methods for Continuous-Space Constrained MDPs [82.34567890576423]
We develop a deterministic policy gradient primal-dual method (D-PGPD) to find an optimal deterministic policy with non-asymptotic convergence guarantees.
We prove that the primal-dual iterates of D-PGPD converge at a sub-linear rate to an optimal regularized primal-dual pair.
To the best of our knowledge, this appears to be the first work that proposes a deterministic policy search method for continuous-space constrained MDPs.
arXiv Detail & Related papers (2024-08-19T14:11:04Z) - Last-Iterate Convergent Policy Gradient Primal-Dual Methods for
Constrained MDPs [107.28031292946774]
We study the problem of computing an optimal policy of an infinite-horizon discounted constrained Markov decision process (constrained MDP).
We develop two single-time-scale policy-based primal-dual algorithms with non-asymptotic convergence of their policy iterates to an optimal constrained policy.
To the best of our knowledge, this work appears to be the first non-asymptotic policy last-iterate convergence result for single-time-scale algorithms in constrained MDPs.
arXiv Detail & Related papers (2023-06-20T17:27:31Z) - High-probability sample complexities for policy evaluation with linear function approximation [88.87036653258977]
We investigate the sample complexities required to guarantee a predefined estimation error of the best linear coefficients for two widely-used policy evaluation algorithms.
We establish the first sample complexity bound with high-probability convergence guarantee that attains the optimal dependence on the tolerance level.
arXiv Detail & Related papers (2023-05-30T12:58:39Z) - Geometry and convergence of natural policy gradient methods [0.0]
We study the convergence of several natural policy gradient (NPG) methods in infinite-horizon discounted Markov decision processes with regular policy parametrizations.
For a variety of NPGs and reward functions we show that the trajectories in state-action space are solutions of gradient flows with respect to Hessian geometries.
arXiv Detail & Related papers (2022-11-03T19:16:15Z) - Quasi-Newton Iteration in Deterministic Policy Gradient [0.0]
We show that the approximate Hessian converges to the exact Hessian at the optimal policy.
We analytically verify the formulation in a simple linear case and compare the convergence of the proposed method with the natural policy gradient.
arXiv Detail & Related papers (2022-03-25T18:38:57Z) - On the Convergence Rates of Policy Gradient Methods [9.74841674275568]
We consider geometrically discounted Markov decision problems with finite state and action spaces.
We show that, with direct policy parametrization, the convergence rates of policy gradient methods can be analyzed within a general framework.
arXiv Detail & Related papers (2022-01-19T07:03:37Z) - Improving Metric Dimensionality Reduction with Distributed Topology [68.8204255655161]
DIPOLE is a dimensionality-reduction post-processing step that corrects an initial embedding by minimizing a loss functional with both a local, metric term and a global, topological term.
We observe that DIPOLE outperforms popular methods like UMAP, t-SNE, and Isomap on a number of popular datasets.
arXiv Detail & Related papers (2021-06-14T17:19:44Z) - Softmax Policy Gradient Methods Can Take Exponential Time to Converge [60.98700344526674]
The softmax policy gradient (PG) method is arguably one of the de facto implementations of policy optimization in modern reinforcement learning.
We demonstrate that softmax PG methods can take exponential time -- in terms of $|\mathcal{S}|$ and $\frac{1}{1-\gamma}$ -- to converge.
arXiv Detail & Related papers (2021-02-22T18:56:26Z) - On Linear Convergence of Policy Gradient Methods for Finite MDPs [8.00114449574708]
We revisit the finite-time analysis of policy gradient methods in one of the simplest settings.
We show that many variants of policy gradient methods succeed with large step sizes and attain a linear rate of convergence; a minimal sketch of the tabular update appears after this list.
arXiv Detail & Related papers (2020-07-21T22:35:37Z)
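In the tabular softmax case referenced in the last entry, the NPG update has a closed form that also makes the policy mirror descent view from the main abstract concrete. The sketch below is illustrative only: the function name is hypothetical, using Q-values rather than advantages is a choice made here (it changes the logits only by a per-state constant that normalization removes), and the step-size constants are not taken from any of the papers.

```python
import numpy as np

def tabular_npg_step(pi, q, eta):
    """One tabular NPG / policy mirror descent step with softmax policies:
    pi_next(a|s) is proportional to pi(a|s) * exp(eta * Q(s, a)).

    pi: (S, A) current policy, q: (S, A) its action values, eta: step size.
    """
    logits = np.log(pi + 1e-12) + eta * q
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum(axis=1, keepdims=True)
```

Holding eta constant across iterations corresponds to the sublinear-rate regime mentioned as a byproduct in the main abstract, while growing eta geometrically is what yields the linear rate.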
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences of its use.