Rank-1 Approximation of Inverse Fisher for Natural Policy Gradients in Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2601.18626v1
- Date: Mon, 26 Jan 2026 16:02:18 GMT
- Title: Rank-1 Approximation of Inverse Fisher for Natural Policy Gradients in Deep Reinforcement Learning
- Authors: Yingxiao Huo, Satya Prakash Dash, Radu Stoican, Samuel Kaski, Mingfei Sun
- Abstract summary: We show that a rank-1 approximation to inverse-FIM converges faster than policy gradients. We benchmark our method on a diverse set of environments and show that it achieves superior performance to standard actor-critic and trust-region baselines.
- Score: 17.531852538779372
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural gradients have long been studied in deep reinforcement learning due to their fast convergence properties and covariant weight updates. However, computing natural gradients requires inverting the Fisher Information Matrix (FIM) at each iteration, which is computationally prohibitive. In this paper, we present an efficient and scalable natural policy optimization technique that leverages a rank-1 approximation to the full inverse FIM. We theoretically show that, under certain conditions, a rank-1 approximation to the inverse FIM converges faster than policy gradients and, under some conditions, enjoys the same sample complexity as stochastic policy gradient methods. We benchmark our method on a diverse set of environments and show that it achieves superior performance to standard actor-critic and trust-region baselines.
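The abstract does not spell out how the rank-1 factor is constructed. A minimal sketch of one plausible instantiation, assuming the Fisher's top eigenpair is estimated by power iteration on Fisher-vector products; the `fisher_vec_prod` callback, the damping constant, and the identity-plus-rank-1 form are all illustrative guesses, not the paper's method:

```python
import numpy as np

def rank1_natural_gradient(grad, fisher_vec_prod, iters=20, damping=1e-3):
    """Precondition grad with a rank-1 approximation of the inverse Fisher.

    Hypothetical construction: estimate the Fisher's top eigenpair (lam, v)
    by power iteration on Fisher-vector products, then act as 1/lam along v
    and as the identity elsewhere. An illustrative guess, not the paper's
    exact method.
    """
    rng = np.random.default_rng(0)
    v = rng.standard_normal(grad.shape)
    v /= np.linalg.norm(v)
    for _ in range(iters):                  # power iteration on F
        v = fisher_vec_prod(v)
        v /= np.linalg.norm(v)
    lam = float(v @ fisher_vec_prod(v))     # Rayleigh quotient ~ top eigenvalue
    coef = (v @ grad) * (1.0 / (lam + damping) - 1.0)
    return grad + coef * v                  # identity plus a rank-1 correction

# Toy check against an explicit Fisher F = diag(5, 1, 1):
F = np.diag([5.0, 1.0, 1.0])
g = np.array([1.0, 2.0, 0.5])
print(rank1_natural_gradient(g, lambda u: F @ u))  # close to [0.2, 2.0, 0.5]
```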
Related papers
- ISOPO: Proximal policy gradients without pi-old [0.0]
ISOPO is an efficient method to approximate the natural policy gradient in a single step. It can be implemented with negligible computational overhead compared to vanilla REINFORCE.
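The snippet gives no algorithmic detail beyond the comparison point; for reference, the vanilla REINFORCE update that ISOPO is benchmarked against looks like this textbook sketch (the trajectory format is a placeholder):

```python
import numpy as np

def reinforce_update(theta, trajectories, lr=0.01):
    """One vanilla REINFORCE step: theta += lr * mean_i[R_i * grad log pi(tau_i)].

    trajectories: list of (grad_log_prob, total_return) pairs, where
    grad_log_prob is sum_t grad_theta log pi(a_t | s_t) over the trajectory.
    """
    grad = np.zeros_like(theta)
    for grad_log_prob, ret in trajectories:
        grad += ret * grad_log_prob
    return theta + lr * grad / len(trajectories)
```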
arXiv Detail & Related papers (2025-12-29T10:30:29Z)
- Achieve Performatively Optimal Policy for Performative Reinforcement Learning [55.983627302691424]
This work proposes a zeroth-order Frank-Wolfe (0FW) algorithm that uses a zeroth-order estimate of the performative policy gradient. Experimental results demonstrate that 0FW is more effective than existing approximations at finding the desired performatively optimal (PO) policy.
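A hedged sketch of the generic template such a method builds on: a two-point zeroth-order gradient estimate plugged into a Frank-Wolfe step. The oracle names and the toy objective are assumptions, not the paper's setup:

```python
import numpy as np

def zeroth_order_fw_step(x, f, lmo, num_dirs=32, mu=1e-3, gamma=0.1, seed=0):
    """One Frank-Wolfe step using a two-point zeroth-order gradient estimate.

    f:   scalar objective to maximize, queried by function value only
    lmo: linear maximization oracle over the feasible set C,
         lmo(g) = argmax_{s in C} <g, s>
    """
    rng = np.random.default_rng(seed)
    g = np.zeros_like(x)
    for _ in range(num_dirs):              # average random-direction estimates
        u = rng.standard_normal(x.shape)
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    g /= num_dirs
    s = lmo(g)                             # best feasible point along estimate
    return x + gamma * (s - x)             # convex-combination FW update

# Example: maximize -||x - 1||^2 over the box [-2, 2]^d.
f = lambda x: -np.sum((x - 1.0) ** 2)
lmo = lambda g: 2.0 * np.sign(g)
x = np.zeros(4)
for t in range(50):
    x = zeroth_order_fw_step(x, f, lmo, gamma=2 / (t + 2), seed=t)
print(x)  # approaches the optimum at [1, 1, 1, 1]
```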
arXiv Detail & Related papers (2025-10-06T01:56:31Z)
- Generalized Gradient Norm Clipping & Non-Euclidean $(L_0,L_1)$-Smoothness [51.302674884611335]
This work introduces a hybrid non-Euclidean optimization method which generalizes norm clipping by combining steepest descent and conditional gradient approaches. We discuss how to instantiate the algorithms for deep learning and demonstrate their properties on image classification and language modeling.
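A minimal sketch of the baseline being generalized, plain gradient norm clipping, which takes a steepest-descent step while the gradient is small and a fixed-length normalized step once it is large; the norm and thresholds are illustrative:

```python
import numpy as np

def clipped_step(x, grad, lr=0.1, clip=1.0, norm=np.linalg.norm):
    """Gradient norm clipping: a plain (steepest-descent) step while
    ||grad|| <= clip, and a fixed-length normalized step -- the
    conditional-gradient-like regime -- once the norm exceeds the threshold."""
    scale = lr * min(1.0, clip / (norm(grad) + 1e-12))
    return x - scale * grad

# Usage: a gradient of norm 50 is shrunk to a step of length lr * clip = 0.1.
print(clipped_step(np.zeros(2), np.array([30.0, 40.0])))
```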
arXiv Detail & Related papers (2025-06-02T17:34:29Z)
- Elementary Analysis of Policy Gradient Methods [3.468656086349638]
This paper focuses on the discounted MDP setting and conducts a systematic study of several policy optimization methods.
Several novel results are presented, including: 1) global linear convergence of projected policy gradient for any constant step size; 2) sublinear convergence of softmax policy gradient for any constant step size; 3) global linear convergence of softmax natural policy gradient for any constant step size; 4) global linear convergence of entropy-regularized softmax policy gradient for a wider range of constant step sizes than existing results; 5) a tight local linear convergence rate of entropy-regularized natural policy gradient; and 6) a new and concise local quadratic convergence rate of soft…
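To make result 1 concrete, a minimal sketch of projected policy gradient for a single state, treating the Q-values as fixed for illustration; the projection is the standard sorting-based simplex projection:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (standard
    sorting-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u > css / np.arange(1, len(v) + 1))[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def projected_pg_step(pi, q_values, eta=1.0):
    """One projected policy gradient step for a single state of a tabular MDP:
    move the action distribution along its Q-values, then project back."""
    return project_simplex(pi + eta * q_values)

pi = np.ones(3) / 3
for _ in range(10):
    pi = projected_pg_step(pi, np.array([1.0, 0.5, 0.2]))
print(pi)  # mass concentrates on the best action (index 0)
```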
arXiv Detail & Related papers (2024-04-04T11:16:16Z)
- Linear Convergence of Natural Policy Gradient Methods with Log-Linear Policies [115.86431674214282]
We consider infinite-horizon discounted Markov decision processes and study the convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log-linear policy class.
We show that both methods attain linear convergence rates and $\mathcal{O}(1/\epsilon^2)$ sample complexities using a simple, non-adaptive geometrically increasing step size.
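A sketch of one standard form of Q-NPG with a log-linear policy: fit the Q-values with the policy features by least squares (compatible function approximation), then step along the fitted weights with a geometrically increasing step size. The batch, targets, and constants below are placeholders:

```python
import numpy as np

def q_npg_step(theta, phi, q_targets, eta):
    """One Q-NPG step for a log-linear policy pi_theta(a|s) ∝ exp(phi(s,a)·theta):
    fit the Q-value targets with the same features by least squares, then
    move the parameters along the fitted weights."""
    w, *_ = np.linalg.lstsq(phi, q_targets, rcond=None)
    return theta + eta * w

# Geometrically increasing step size, as in the linear-rate analysis:
rng = np.random.default_rng(0)
phi = rng.standard_normal((200, 5))           # features of sampled (s, a) pairs
theta = np.zeros(5)
for t in range(10):
    q_targets = phi @ rng.standard_normal(5)  # placeholder for estimated Q-values
    theta = q_npg_step(theta, phi, q_targets, eta=0.1 * 2.0 ** t)
```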
arXiv Detail & Related papers (2022-10-04T06:17:52Z)
- Bag of Tricks for Natural Policy Gradient Reinforcement Learning [87.54231228860495]
We have implemented and compared strategies that impact performance in natural policy gradient reinforcement learning.
The proposed collection of strategies for performance optimization can improve results by 86% to 181% across the MuJoCo control benchmark.
arXiv Detail & Related papers (2022-01-22T17:44:19Z)
- On the Linear convergence of Natural Policy Gradient Algorithm [5.027714423258537]
Recent interest in Reinforcement Learning has motivated the study of methods inspired by optimization.
Among these is the Natural Policy Gradient, which is a mirror descent variant for MDPs.
We present improved finite-time convergence bounds and show that this algorithm has a geometric convergence rate.
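In the tabular case, the mirror-descent view mentioned above gives the familiar multiplicative-weights form of NPG; a minimal sketch, with step size and discount chosen for illustration:

```python
import numpy as np

def npg_step(pi, q, eta=0.5, gamma=0.9):
    """Tabular NPG = mirror descent with a KL divergence: the update
    pi'(a|s) ∝ pi(a|s) * exp(eta * Q(s,a) / (1 - gamma)), row-normalized.
    pi: (S, A) policy, q: (S, A) action values of the current policy."""
    logits = np.log(pi) + eta * q / (1.0 - gamma)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum(axis=1, keepdims=True)

pi = np.full((2, 3), 1.0 / 3.0)
q = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(npg_step(pi, q))  # mass shifts toward the argmax action in each state
```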
arXiv Detail & Related papers (2021-05-04T11:26:12Z)
- Softmax Policy Gradient Methods Can Take Exponential Time to Converge [60.98700344526674]
The softmax policy gradient (PG) method is arguably one of the de facto implementations of policy optimization in modern reinforcement learning.
We demonstrate that softmax PG methods can take exponential time -- in terms of $|\mathcal{S}|$ and $\frac{1}{1-\gamma}$ -- to converge.
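For reference, the method the lower bound concerns is the exact softmax policy gradient update; its tabular form is a textbook sketch, with the visitation distribution and advantages supplied by the caller:

```python
import numpy as np

def softmax_pg_step(theta, d_visit, pi, adv, eta=0.1, gamma=0.9):
    """One exact softmax policy gradient step on tabular logits theta (S, A),
    using grad V = d^pi(s) * pi(a|s) * A^pi(s,a) / (1 - gamma).
    d_visit: (S,) discounted state visitation, adv: (S, A) advantages."""
    grad = d_visit[:, None] * pi * adv / (1.0 - gamma)
    return theta + eta * grad
```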
arXiv Detail & Related papers (2021-02-22T18:56:26Z)
- On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method [38.34416337932712]
Policy gradient gives rise to a rich class of reinforcement learning (RL) methods, for example REINFORCE.
Yet the best known sample complexity result for such methods to find an $\epsilon$-optimal policy is $\mathcal{O}(\epsilon^{-3})$, which is suboptimal.
We study the fundamental convergence properties and sample efficiency of first-order policy optimization methods.
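A sketch of the SVRPG-style estimator template that variance-reduced policy gradient methods of this kind build on; the exact estimator in the paper may differ, and all names here are placeholders:

```python
import numpy as np

def vr_policy_gradient(mu_snapshot, grads_curr, grads_snap, weights):
    """SVRPG-style variance-reduced gradient estimate:

        g = mu_snapshot + mean_i[ g_i(theta) - w_i * g_i(theta_snapshot) ]

    mu_snapshot: large-batch gradient at the snapshot parameters, shape (d,)
    grads_curr:  per-trajectory gradients at the current parameters, (n, d)
    grads_snap:  per-trajectory gradients at the snapshot parameters, (n, d)
    weights:     importance weights correcting the distribution shift, (n,)
    """
    correction = np.mean(grads_curr - weights[:, None] * grads_snap, axis=0)
    return mu_snapshot + correction
```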
arXiv Detail & Related papers (2021-02-17T07:06:19Z)
- Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization [44.24881971917951]
Natural policy gradient (NPG) methods are among the most widely used policy optimization algorithms.
We develop convergence guarantees for entropy-regularized NPG methods under softmax parameterization.
Our results accommodate a wide range of learning rates, and shed light upon the role of entropy regularization in enabling fast convergence.
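In the tabular softmax case, the entropy-regularized NPG update is commonly written, up to normalization, as pi'(a|s) ∝ pi(a|s)^(1 - eta*tau/(1-gamma)) * exp(eta * Q_tau(s,a) / (1-gamma)); a minimal sketch of that standard form, with illustrative constants:

```python
import numpy as np

def entropy_reg_npg_step(pi, q_tau, eta=0.1, tau=0.01, gamma=0.9):
    """One tabular entropy-regularized NPG step under softmax parameterization.
    pi: (S, A) policy with positive entries, q_tau: (S, A) soft Q-values."""
    alpha = 1.0 - eta * tau / (1.0 - gamma)   # geometric mixing of old policy
    logits = alpha * np.log(pi) + eta * q_tau / (1.0 - gamma)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum(axis=1, keepdims=True)
```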
arXiv Detail & Related papers (2020-07-13T17:58:41Z)
- Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad algorithm for nonconcave min-max problems.
Our experiments empirically compare adaptive and non-adaptive gradient algorithms in GAN training.
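The non-adaptive core that optimistic adaptive methods build on is the optimistic gradient descent-ascent step; a minimal sketch on a toy bilinear game where plain simultaneous gradient descent-ascent spirals outward. The learning rate and game are illustrative:

```python
def ogda_step(x, y, grad_x, grad_y, prev_gx, prev_gy, lr=0.2):
    """One optimistic gradient descent-ascent step for min_x max_y f(x, y):
    extrapolate with the previous gradient, g_eff = 2*g_t - g_{t-1}."""
    x_new = x - lr * (2 * grad_x - prev_gx)
    y_new = y + lr * (2 * grad_y - prev_gy)
    return x_new, y_new

# Toy bilinear game f(x, y) = x * y: grad_x f = y, grad_y f = x.
x, y = 1.0, 1.0
gx_prev, gy_prev = y, x
for _ in range(200):
    gx, gy = y, x
    x, y = ogda_step(x, y, gx, gy, gx_prev, gy_prev)
    gx_prev, gy_prev = gx, gy
print(x, y)  # both shrink toward the equilibrium at (0, 0)
```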
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.