Policy Gradient with Second Order Momentum
- URL: http://arxiv.org/abs/2505.11561v1
- Date: Fri, 16 May 2025 06:23:53 GMT
- Title: Policy Gradient with Second Order Momentum
- Authors: Tianyu Sun,
- Abstract summary: Policy Gradient with Second-Order Momentum (PG-SOM) is a lightweight second-order optimisation scheme for reinforcement-learning policies.<n>PG-SOM augments the classical REINFORCE update with two exponentially weighted statistics: a first-order gradient average and a diagonal approximation of the Hessian.<n>Experiments on standard control benchmarks show up to a 2.1x increase in sample efficiency and a substantial reduction in variance compared to first-order and Fisher-matrix baselines.
- Score: 2.44755919161855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop Policy Gradient with Second-Order Momentum (PG-SOM), a lightweight second-order optimisation scheme for reinforcement-learning policies. PG-SOM augments the classical REINFORCE update with two exponentially weighted statistics: a first-order gradient average and a diagonal approximation of the Hessian. By preconditioning the gradient with this curvature estimate, the method adaptively rescales each parameter, yielding faster and more stable ascent of the expected return. We provide a concise derivation, establish that the diagonal Hessian estimator is unbiased and positive-definite under mild regularity assumptions, and prove that the resulting update is a descent direction in expectation. Numerical experiments on standard control benchmarks show up to a 2.1x increase in sample efficiency and a substantial reduction in variance compared to first-order and Fisher-matrix baselines. These results indicate that even coarse second-order information can deliver significant practical gains while incurring only D memory overhead for a D-parameter policy. All code and reproducibility scripts will be made publicly available.
Related papers
- Reusing Trajectories in Policy Gradients Enables Fast Convergence [59.27926064817273]
Policy gradient (PG) methods are a class of effective reinforcement learning algorithms.<n>We propose RPG (Retrospective Policy Gradient), a PG algorithm that combines old and new trajectories for policy updates.<n>Under established assumptions, RPG achieves a sample complexity of $widetildeO(epsilon-1)$, the best known rate in the literature.
arXiv Detail & Related papers (2025-06-06T15:42:15Z) - More Optimal Fractional-Order Stochastic Gradient Descent for Non-Convex Optimization Problems [2.5971517743176915]
We propose 2SED Fractional-Order Gradient Descent (2FOSGD), which integrates the Two-Scale Effective Dimension (2SED) with FOSGD.<n>By tracking sensitivity and effective dimensionality, 2SEDFOSGD dynamically modulates the exponent to mitigate sluggish oscillations and hasten convergence.
arXiv Detail & Related papers (2025-05-05T19:27:36Z) - Effective Dimension Aware Fractional-Order Stochastic Gradient Descent for Convex Optimization Problems [2.5971517743176915]
We propose 2SED Fractional-Order Gradient Descent (2SEDFOSGD) to adapt the fractional exponent in a data-driven manner.<n>Theoretically, this approach preserves the advantages of fractional memory without the sluggish or unstable behavior observed in na"ive fractional SGD.
arXiv Detail & Related papers (2025-03-17T22:57:37Z) - Refining Adaptive Zeroth-Order Optimization at Ease [24.327161891577727]
This paper introduces Refined Adaptive Zeroth-Order Optimization (R-AdaZO)<n>We first show the untapped variance reduction effect of first moment estimate on ZO gradient estimation.<n>We then refine the second moment estimate based on these variance-reduced gradient estimates to better capture the geometry of the optimization landscape.
arXiv Detail & Related papers (2025-02-03T03:10:44Z) - GP-FL: Model-Based Hessian Estimation for Second-Order Over-the-Air Federated Learning [52.295563400314094]
Second-order methods are widely adopted to improve the convergence rate of learning algorithms.<n>This paper introduces a novel second-order FL framework tailored for wireless channels.
arXiv Detail & Related papers (2024-12-05T04:27:41Z) - Estimating the Hessian Matrix of Ranking Objectives for Stochastic Learning to Rank with Gradient Boosted Trees [63.18324983384337]
We introduce the first learning to rank method for Gradient Boosted Decision Trees (GBDTs)
Our main contribution is a novel estimator for the second-order derivatives, i.e., the Hessian matrix.
We incorporate our estimator into the existing PL-Rank framework, which was originally designed for first-order derivatives only.
arXiv Detail & Related papers (2024-04-18T13:53:32Z) - Information-Theoretic Trust Regions for Stochastic Gradient-Based
Optimization [17.79206971486723]
We show that arTuRO combines the fast convergence of adaptive moment-based optimization with the capabilities of SGD.
We approximate the diagonal elements of the Hessian from gradients, constructing a model of the expected Hessian over time using only first-order information.
We show that arTuRO combines the fast convergence of adaptive moment-based optimization with the capabilities of SGD.
arXiv Detail & Related papers (2023-10-31T16:08:38Z) - The Role of Baselines in Policy Gradient Optimization [83.42050606055822]
We show that the emphstate value baseline allows on-policy.
emphnatural policy gradient (NPG) to converge to a globally optimal.
policy at an $O (1/t) rate gradient.
We find that the primary effect of the value baseline is to textbfreduce the aggressiveness of the updates rather than their variance.
arXiv Detail & Related papers (2023-01-16T06:28:00Z) - Online Statistical Inference for Contextual Bandits via Stochastic
Gradient Descent [10.108468796986074]
We study the online statistical inference of model parameters in a contextual bandit framework of decision-making.
We propose a general framework for online and adaptive data collection environment that can update decision rules via weighted gradient descent.
arXiv Detail & Related papers (2022-12-30T18:57:08Z) - NAG-GS: Semi-Implicit, Accelerated and Robust Stochastic Optimizer [45.47667026025716]
We propose a novel, robust and accelerated iteration that relies on two key elements.
The convergence and stability of the obtained method, referred to as NAG-GS, are first studied extensively.
We show that NAG-arity is competitive with state-the-art methods such as momentum SGD with weight decay and AdamW for the training of machine learning models.
arXiv Detail & Related papers (2022-09-29T16:54:53Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the ExtraGradient (SEG) method with constant step size, and present variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - Sparse Representations of Positive Functions via First and Second-Order
Pseudo-Mirror Descent [15.340540198612823]
We consider expected risk problems when the range of the estimator is required to be nonnegative.
We develop first and second-order variants of approximation mirror descent employing emphpseudo-gradients.
Experiments demonstrate favorable performance on ingeneous Process intensity estimation in practice.
arXiv Detail & Related papers (2020-11-13T21:54:28Z) - ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm [71.13558000599839]
We study the problem of solving strongly convex and smooth unconstrained optimization problems using first-order algorithms.
We devise a novel, referred to as Recursive One-Over-T SGD, based on an easily implementable, averaging of past gradients.
We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an sense.
arXiv Detail & Related papers (2020-08-28T14:46:56Z) - Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and or binary weights the training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.