Geometry and convergence of natural policy gradient methods
- URL: http://arxiv.org/abs/2211.02105v1
- Date: Thu, 3 Nov 2022 19:16:15 GMT
- Title: Geometry and convergence of natural policy gradient methods
- Authors: Johannes Müller and Guido Montúfar
- Abstract summary: We study the convergence of several natural policy gradient (NPG) methods in infinite-horizon discounted Markov decision processes with regular policy parametrizations.
For a variety of NPGs and reward functions we show that the trajectories in state-action space are solutions of gradient flows with respect to Hessian geometries.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the convergence of several natural policy gradient (NPG) methods in
infinite-horizon discounted Markov decision processes with regular policy
parametrizations. For a variety of NPGs and reward functions we show that the
trajectories in state-action space are solutions of gradient flows with respect
to Hessian geometries, based on which we obtain global convergence guarantees
and convergence rates. In particular, we show linear convergence for
unregularized and regularized NPG flows with the metrics proposed by Kakade and
Morimura and co-authors by observing that these arise from the Hessian
geometries of conditional entropy and entropy respectively. Further, we obtain
sublinear convergence rates for Hessian geometries arising from other convex
functions like log-barriers. Finally, we interpret the discrete-time NPG
methods with regularized rewards as inexact Newton methods if the NPG is
defined with respect to the Hessian geometry of the regularizer. This yields
local quadratic convergence rates of these methods for step size equal to the
penalization strength.
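As a minimal illustration of the kind of dynamics the abstract describes (not the paper's implementation), consider NPG with a softmax parametrization on a one-state MDP, i.e. a 3-armed bandit. With Kakade's Fisher metric the NPG step has the closed form pi_{t+1}(a) ∝ pi_t(a) · exp(eta · A(a)), a multiplicative-weights update whose iterates converge linearly to the optimal arm; the rewards and step size below are arbitrary choices for the sketch.

```python
# Hedged sketch: closed-form natural policy gradient (Kakade metric)
# on a 3-armed bandit with softmax parametrization.
import numpy as np

rewards = np.array([1.0, 0.5, 0.2])  # deterministic per-arm rewards (assumed)
eta = 0.5                            # step size (assumed)
pi = np.full(3, 1.0 / 3.0)           # uniform initial policy

for _ in range(200):
    advantage = rewards - pi @ rewards   # A(a) = r(a) - V(pi)
    pi = pi * np.exp(eta * advantage)    # NPG / mirror-descent step
    pi = pi / pi.sum()                   # renormalize to a distribution

print(pi.round(4))  # probability mass concentrates on the best arm (index 0)
```

The exponentiated update is exactly the gradient flow (discretized) with respect to the Hessian geometry of negative entropy, which is the structural observation behind the linear convergence rates in the abstract.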
Related papers
- A Unified Theory of Stochastic Proximal Point Methods without Smoothness [52.30944052987393]
Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning.
This paper presents a comprehensive analysis of a broad range of variants of the stochastic proximal point method (SPPM).
arXiv Detail & Related papers (2024-05-24T21:09:19Z) - Convergence of policy gradient methods for finite-horizon exploratory
linear-quadratic control problems [3.8661825615213012]
We study the global linear convergence of policy gradient (PG) methods for finite-horizon continuous-time exploratory linear-quadratic control (LQC) problems.
We propose a novel PG method with discrete-time policies. The algorithm leverages the continuous-time analysis and achieves robust linear convergence across different action frequencies.
arXiv Detail & Related papers (2022-11-01T17:31:41Z) - Linear Convergence of Natural Policy Gradient Methods with Log-Linear
Policies [115.86431674214282]
We consider infinite-horizon discounted Markov decision processes and study the convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log-linear policy class.
We show that both methods attain linear convergence rates and $\mathcal{O}(1/\epsilon^2)$ sample complexities using a simple, non-adaptive geometrically increasing step size.
arXiv Detail & Related papers (2022-10-04T06:17:52Z) - On the Convergence Rates of Policy Gradient Methods [9.74841674275568]
We consider geometrically discounted dominance problems with finite state spaces.
We show that, with direct gradient parametrization, the sample complexity of policy gradient methods can be analyzed in a general setting.
arXiv Detail & Related papers (2022-01-19T07:03:37Z) - Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector
Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z) - Beyond Exact Gradients: Convergence of Stochastic Soft-Max Policy Gradient Methods with Entropy Regularization [20.651913793555163]
We revisit the classical entropy regularized policy gradient methods with the soft-max policy parametrization.
We establish a global optimality convergence result and a sample complexity of $\widetilde{\mathcal{O}}(1/\epsilon^2)$ for the proposed algorithm.
arXiv Detail & Related papers (2021-10-19T17:21:09Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the Stochastic ExtraGradient (SEG) method with constant step size, and propose variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - Improving Metric Dimensionality Reduction with Distributed Topology [68.8204255655161]
DIPOLE is a dimensionality-reduction post-processing step that corrects an initial embedding by minimizing a loss functional with both a local, metric term and a global, topological term.
We observe that DIPOLE outperforms popular methods like UMAP, t-SNE, and Isomap on a number of popular datasets.
arXiv Detail & Related papers (2021-06-14T17:19:44Z) - Linear Convergence of Entropy-Regularized Natural Policy Gradient with
Linear Function Approximation [30.02577720946978]
We establish finite-time convergence analyses of entropy-regularized NPG with linear function approximation.
We prove that entropy-regularized NPG exhibits linear convergence up to a function approximation error.
arXiv Detail & Related papers (2021-06-08T04:30:39Z) - Gauge Equivariant Mesh CNNs: Anisotropic convolutions on geometric
graphs [81.12344211998635]
A common approach to defining convolutions on meshes is to interpret them as graphs and apply graph convolutional networks (GCNs).
We propose Gauge Equivariant Mesh CNNs which generalize GCNs to apply anisotropic gauge equivariant kernels.
Our experiments validate the significantly improved expressivity of the proposed model over conventional GCNs and other methods.
arXiv Detail & Related papers (2020-03-11T17:21:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.