Fast Global Convergence of Natural Policy Gradient Methods with Entropy
Regularization
- URL: http://arxiv.org/abs/2007.06558v5
- Date: Thu, 8 Apr 2021 19:47:39 GMT
- Title: Fast Global Convergence of Natural Policy Gradient Methods with Entropy
Regularization
- Authors: Shicong Cen, Chen Cheng, Yuxin Chen, Yuting Wei, Yuejie Chi
- Abstract summary: Natural policy gradient (NPG) methods are among the most widely used policy optimization algorithms.
We develop convergence guarantees for entropy-regularized NPG methods under softmax parameterization.
Our results accommodate a wide range of learning rates, and shed light upon the role of entropy regularization in enabling fast convergence.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural policy gradient (NPG) methods are among the most widely used policy
optimization algorithms in contemporary reinforcement learning. This class of
methods is often applied in conjunction with entropy regularization -- an
algorithmic scheme that encourages exploration -- and is closely related to
soft policy iteration and trust region policy optimization. Despite the
empirical success, the theoretical underpinnings for NPG methods remain limited
even for the tabular setting. This paper develops non-asymptotic
convergence guarantees for entropy-regularized NPG methods under softmax
parameterization, focusing on discounted Markov decision processes (MDPs).
Assuming access to exact policy evaluation, we demonstrate that the algorithm
converges linearly -- or even quadratically once it enters a local region
around the optimal policy -- when computing optimal value functions of the
regularized MDP. Moreover, the algorithm is provably stable vis-à-vis
inexactness of policy evaluation. Our convergence results accommodate a wide
range of learning rates, and shed light upon the role of entropy regularization
in enabling fast convergence.
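To make the setting concrete, below is a minimal numerical sketch, not code from the paper: tabular entropy-regularized NPG with exact soft policy evaluation on a small synthetic MDP, written in the multiplicative form that the softmax parameterization admits. The MDP, the regularization strength tau, and the step size eta are illustrative assumptions; choosing eta = (1 - gamma) / tau reduces the update to soft policy iteration.

```python
import numpy as np

# Sketch of entropy-regularized NPG on a random tabular MDP with exact
# (soft) policy evaluation. All constants below are illustrative choices,
# not values taken from the paper.

rng = np.random.default_rng(0)
S, A, gamma, tau = 5, 3, 0.9, 0.1            # states, actions, discount, regularization
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] = next-state distribution
r = rng.uniform(size=(S, A))                 # rewards in [0, 1]

def soft_policy_evaluation(pi):
    """Exact entropy-regularized evaluation of policy pi (shape S x A)."""
    # Soft Bellman equation:
    #   V(s) = sum_a pi(a|s) [ r(s,a) - tau * log pi(a|s) + gamma * P(.|s,a) @ V ]
    r_pi = np.einsum('sa,sa->s', pi, r - tau * np.log(pi))
    P_pi = np.einsum('sa,sat->st', pi, P)
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    Q = r + gamma * P @ V                    # soft Q; entropy of the current action not included
    return V, Q

def npg_update(pi, Q, eta):
    """One entropy-regularized NPG step under softmax parameterization.

    Multiplicative form (assumed here):
      new_pi(a|s) proportional to pi(a|s)^(1 - eta*tau/(1-gamma)) * exp(eta * Q(s,a) / (1-gamma))
    """
    alpha = eta * tau / (1 - gamma)
    logits = (1 - alpha) * np.log(pi) + (eta / (1 - gamma)) * Q
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum(axis=1, keepdims=True)

pi = np.full((S, A), 1.0 / A)        # start from the uniform policy
eta = (1 - gamma) / tau              # with this step size the update is soft policy iteration
for t in range(50):
    V, Q = soft_policy_evaluation(pi)
    pi = npg_update(pi, Q, eta)
print("soft value of state 0 after 50 iterations:", soft_policy_evaluation(pi)[0][0])
```

With exact evaluation as above, successive iterates contract toward the optimal policy of the regularized MDP; smaller step sizes (eta below (1 - gamma) / tau) correspond to more conservative updates that the paper's analysis also covers.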