Approximate Newton policy gradient algorithms
- URL: http://arxiv.org/abs/2110.02398v6
- Date: Thu, 8 Jun 2023 04:57:49 GMT
- Title: Approximate Newton policy gradient algorithms
- Authors: Haoya Li, Samarth Gupta, Hsiang-Fu Yu, Lexing Ying, Inderjit Dhillon
- Abstract summary: This paper proposes an approximate Newton method for the policy gradient algorithm with entropy regularization.
We prove that all these algorithms enjoy Newton-type quadratic convergence and that the corresponding gradient flow converges globally to the optimal solution.
- Score: 18.032678371017198
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Policy gradient algorithms have been widely applied to Markov decision
processes and reinforcement learning problems in recent years. Regularization
with various entropy functions is often used to encourage exploration and
improve stability. This paper proposes an approximate Newton method for the
policy gradient algorithm with entropy regularization. In the case of Shannon
entropy, the resulting algorithm reproduces the natural policy gradient
algorithm. For other entropy functions, this method results in brand-new policy
gradient algorithms. We prove that all these algorithms enjoy Newton-type
quadratic convergence and that the corresponding gradient flow converges
globally to the optimal solution. We use synthetic and industrial-scale
examples to demonstrate that the proposed approximate Newton method typically
converges in single-digit iterations, often orders of magnitude faster than
other state-of-the-art algorithms.
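For the Shannon-entropy case, where the approximate Newton step coincides with natural policy gradient, a minimal tabular sketch looks as follows (the fixed-point evaluation loop, function names, and the 1e-12 smoothing constant are illustrative choices, not from the paper):

```python
import numpy as np

def soft_q_eval(P, r, pi, gamma, tau, n_sweeps=500):
    """Entropy-regularized (soft) policy evaluation by fixed-point
    iteration (a contraction for gamma < 1).
    P: transitions (S, A, S); r: rewards (S, A); pi: policy (S, A)."""
    Q = np.zeros_like(r, dtype=float)
    for _ in range(n_sweeps):
        V = np.sum(pi * (Q - tau * np.log(pi + 1e-12)), axis=1)  # soft state value
        Q = r + gamma * (P @ V)                                  # soft Bellman backup
    return Q

def npg_step(pi, Q, eta, tau):
    """Entropy-regularized natural policy gradient update:
    pi_new(a|s) proportional to pi(a|s)**(1 - eta*tau) * exp(eta * Q(s, a))."""
    logits = (1.0 - eta * tau) * np.log(pi + 1e-12) + eta * Q
    pi_new = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable softmax
    return pi_new / pi_new.sum(axis=1, keepdims=True)
```

Loosely speaking, with the aggressive step size eta = 1/tau the update collapses to pi_new proportional to exp(Q/tau) (soft policy iteration), the regime in which Newton-type quadratic convergence becomes visible.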
Related papers
- Invex Programs: First Order Algorithms and Their Convergence [66.40124280146863]
Invex programs are a special kind of non-convex problems which attain global minima at every stationary point.
We propose new first-order algorithms with provable convergence rates for solving general invex programs.
Our proposed algorithms are the first to solve constrained invex programs.
arXiv Detail & Related papers (2023-07-10T10:11:01Z)
- A Cubic-regularized Policy Newton Algorithm for Reinforcement Learning [9.628032156001073]
We propose two policy Newton algorithms that incorporate cubic regularization.
Both algorithms employ the likelihood ratio method to form estimates of the gradient and Hessian of the value function.
In particular, the sample complexity of our algorithms to find an $\epsilon$-SOSP is $O(\epsilon^{-3.5})$, which is an improvement over the state-of-the-art sample complexity of $O(\epsilon^{-4.5})$.
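For intuition, a generic Nesterov-Polyak-style cubic-regularized Newton step is sketched below; gradient descent on the cubic model is one standard way to solve the subproblem, and the estimates of g and H would come from the likelihood ratio method (the function name and solver choice are illustrative, not the authors' exact algorithm):

```python
import numpy as np

def cubic_newton_step(g, H, M, n_iters=500, lr=0.01):
    """Approximately minimize the cubic model
        m(d) = g @ d + 0.5 * d @ H @ d + (M / 6) * ||d||^3
    by gradient descent in d; the minimizer is the policy update direction."""
    d = np.zeros_like(g, dtype=float)
    for _ in range(n_iters):
        # grad of (M/6)||d||^3 is (M/2)||d|| d
        grad_m = g + H @ d + 0.5 * M * np.linalg.norm(d) * d
        d -= lr * grad_m
    return d
```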
arXiv Detail & Related papers (2023-04-21T13:43:06Z)
- Robust empirical risk minimization via Newton's method [9.797319790710711]
A new variant of Newton's method for empirical risk minimization is studied.
The gradient and Hessian of the objective function are replaced by robust estimators.
An algorithm for obtaining robust Newton directions based on the conjugate gradient method is also proposed.
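A minimal sketch of the two ingredients, assuming a median-of-means-style robust estimator (the paper's exact estimators may differ) and a standard conjugate gradient solve for the Newton direction:

```python
import numpy as np

def median_of_means(X, k=10):
    """Robust mean estimator: split the rows of X into k blocks and take
    the coordinate-wise median of the block means."""
    blocks = np.array_split(X, k)
    return np.median([b.mean(axis=0) for b in blocks], axis=0)

def cg_newton_direction(H, g, tol=1e-8, max_iter=100):
    """Solve H d = -g by conjugate gradients (H assumed positive definite)."""
    d = np.zeros_like(g, dtype=float)
    r = -g - H @ d          # residual
    p = r.copy()            # search direction
    rs = r @ r
    for _ in range(max_iter):
        Hp = H @ p
        alpha = rs / (p @ Hp)
        d += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return d
```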
arXiv Detail & Related papers (2023-01-30T18:54:54Z)
- A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in decision processes whose complexity is independent of the number of states.
Assuming access to a weak learning method, we give an efficient boosting-style algorithm that improves its accuracy.
arXiv Detail & Related papers (2021-08-22T16:00:45Z)
- Continuation Newton methods with deflation techniques for global optimization problems [3.705839280172101]
A global minimum point of an optimization problem is of interest in engineering.
In this article, we consider a new memetic algorithm for this nonlinear large-scale problem.
According to our numerical experiments, the new algorithm works well for unconstrained problems.
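The deflation ingredient is easy to illustrate in one dimension: dividing out roots that are already known forces Newton's method toward a new solution. A toy sketch (scalar case only; the authors work with large-scale systems and a continuation strategy):

```python
def newton_deflated(f, df, x0, known_roots, tol=1e-12, max_iter=200):
    """Newton's method on the deflated function
        g(x) = f(x) / prod_j (x - r_j),
    whose roots are the roots of f other than the known ones r_j."""
    x = x0
    for _ in range(max_iter):
        p, dp = 1.0, 0.0
        for rt in known_roots:       # product of (x - r_j) and its derivative
            dp = dp * (x - rt) + p
            p *= (x - rt)
        g = f(x) / p
        dg = (df(x) - g * dp) / p    # quotient rule: (f'p - f p') / p^2
        if dg == 0.0:
            break
        x_new = x - g / dg
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Example: find all three roots of f(x) = x**3 - x one at a time.
f, df = lambda x: x**3 - x, lambda x: 3 * x**2 - 1
roots = []
for guess in (0.4, 0.6, -2.0):
    roots.append(newton_deflated(f, df, guess, roots))
# roots converges to approximately [0.0, 1.0, -1.0]
```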
arXiv Detail & Related papers (2021-07-29T09:53:49Z)
- The Bayesian Learning Rule [14.141964578853262]
We show that many machine-learning algorithms are specific instances of a single algorithm called the Bayesian learning rule.
The rule, derived from Bayesian principles, yields a wide-range of algorithms from fields such as optimization, deep learning, and graphical models.
arXiv Detail & Related papers (2021-07-09T17:28:55Z)
- Bregman Gradient Policy Optimization [97.73041344738117]
We design Bregman gradient policy optimization (BGPO) for reinforcement learning based on Bregman divergences and momentum techniques, along with a variance-reduced variant (VR-BGPO).
VR-BGPO reaches the best-known complexity $\tilde{O}(\epsilon^{-3})$ for finding an $\epsilon$-stationary point, requiring only one trajectory per iteration.
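The Bregman machinery itself is compact: with the negative-entropy mirror map on the probability simplex, the proximal step is the exponentiated-gradient update below (a generic illustration of the building block, not VR-BGPO itself):

```python
import numpy as np

def bregman_simplex_step(x, grad, eta):
    """argmin_y <grad, y> + (1/eta) * KL(y || x) over the probability simplex,
    i.e. the exponentiated-gradient / mirror-descent update."""
    y = x * np.exp(-eta * grad)
    return y / y.sum()
```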
arXiv Detail & Related papers (2021-06-23T01:08:54Z)
- On the Linear convergence of Natural Policy Gradient Algorithm [5.027714423258537]
Recent interest in Reinforcement Learning has motivated the study of methods inspired by optimization.
Among these is the Natural Policy Gradient, which is a mirror descent variant for MDPs.
We present improved finite-time convergence bounds, and show that this algorithm has a geometric (linear) convergence rate.
arXiv Detail & Related papers (2021-05-04T11:26:12Z)
- Average-Reward Off-Policy Policy Evaluation with Function Approximation [66.67075551933438]
We consider off-policy policy evaluation with function approximation in average-reward MDPs.
Bootstrapping is necessary and, together with off-policy learning and function approximation (FA), results in the deadly triad.
We propose two novel algorithms, reproducing the celebrated success of Gradient TD algorithms in the average-reward setting.
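For orientation, the on-policy baseline in this setting is differential semi-gradient TD(0), sketched below with linear features; the paper's algorithms are Gradient-TD-style off-policy variants with convergence guarantees, so the actual updates differ:

```python
import numpy as np

def differential_td_step(w, r_bar, phi, phi_next, reward, alpha, beta):
    """One differential semi-gradient TD(0) step for average-reward
    prediction with linear values v(s) = phi(s) @ w.
    r_bar is the running estimate of the average reward."""
    delta = reward - r_bar + phi_next @ w - phi @ w   # differential TD error
    w = w + alpha * delta * phi
    r_bar = r_bar + beta * delta
    return w, r_bar
```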
arXiv Detail & Related papers (2021-01-08T00:43:04Z)
- Variance-Reduced Off-Policy Memory-Efficient Policy Search [61.23789485979057]
Off-policy policy optimization is a challenging problem in reinforcement learning.
The proposed algorithms are memory-efficient and capable of learning from off-policy samples.
arXiv Detail & Related papers (2020-09-14T16:22:46Z)
- Accelerated Message Passing for Entropy-Regularized MAP Inference [89.15658822319928]
Maximum a posteriori (MAP) inference in discrete-valued random fields is a fundamental problem in machine learning.
Due to the difficulty of this problem, linear programming (LP) relaxations are commonly used to derive specialized message passing algorithms.
We present randomized methods for accelerating these algorithms by leveraging techniques that underlie classical accelerated gradient.
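The entropy term matters because it smooths the non-smooth max of the LP relaxation into a soft maximum with Lipschitz gradient, which is what classical acceleration requires. A minimal sketch of that smoothing (illustrative; not the paper's message-passing updates):

```python
import numpy as np
from scipy.special import logsumexp, softmax

def soft_max(theta, tau):
    """Entropy-smoothed maximum: tau * logsumexp(theta / tau).
    Approaches max(theta) as tau -> 0; its gradient is softmax(theta / tau)."""
    return tau * logsumexp(theta / tau)

theta = np.array([1.0, 2.0, 5.0])
print(soft_max(theta, 1.0), soft_max(theta, 0.01))  # approaches max(theta) = 5.0
print(softmax(theta / 0.01))                        # concentrates on the argmax
```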
arXiv Detail & Related papers (2020-07-01T18:43:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.