Momentum-Based Policy Gradient with Second-Order Information
- URL: http://arxiv.org/abs/2205.08253v3
- Date: Sun, 26 Nov 2023 19:59:40 GMT
- Title: Momentum-Based Policy Gradient with Second-Order Information
- Authors: Saber Salehkaleybar, Sadegh Khorasani, Negar Kiyavash, Niao He,
Patrick Thiran
- Abstract summary: We propose a variance-reduced policy-gradient method, called SHARP, which incorporates second-order information into gradient descent.
Unlike most previous work, our proposed algorithm does not require importance sampling, which can compromise the advantage of the variance-reduction process.
Our extensive experimental evaluations show the effectiveness of the proposed algorithm on various control tasks and its advantage over the state of the art in practice.
- Score: 40.51117836892182
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Variance-reduced gradient estimators for policy gradient methods have been one of the main focuses of research in reinforcement learning in recent years, as they allow acceleration of the estimation process. We propose a variance-reduced policy-gradient method, called SHARP, which incorporates second-order information into stochastic gradient descent (SGD) using momentum with a time-varying learning rate. The SHARP algorithm is parameter-free, achieving an $\epsilon$-approximate first-order stationary point with $O(\epsilon^{-3})$ trajectories, while using a batch size of $O(1)$ at each iteration. Unlike most previous work, our proposed algorithm does not require importance sampling, which can compromise the advantage of the variance-reduction process. Moreover, the variance of the estimation error decays at the fast rate of $O(1/t^{2/3})$, where $t$ is the number of iterations. Our extensive experimental evaluations show the effectiveness of the proposed algorithm on various control tasks and its advantage over the state of the art in practice.
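As a rough illustration of the kind of update the abstract describes, the PyTorch sketch below forms the momentum correction between consecutive iterates with a Hessian-vector product instead of importance weights, uses a single trajectory per iteration, and decays the step size over time. The helper names `sample_trajectory` and `pg_surrogate`, as well as the step-size and momentum schedules, are illustrative assumptions, not the exact construction used in SHARP.

```python
import torch

def _hvp(grads, params, vec):
    # Hessian-vector product via double backprop; `grads` must be computed
    # with create_graph=True so the first-derivative graph is available.
    flat = torch.cat([g.reshape(-1) for g in grads])
    return torch.autograd.grad(flat @ vec, params)

def momentum_pg_with_hvp(policy, sample_trajectory, pg_surrogate, T=1000):
    # `sample_trajectory(policy)` draws ONE trajectory (O(1) batch size);
    # `pg_surrogate(policy, tau)` returns a differentiable surrogate whose
    # gradient is the usual score-function estimate of the return J(theta).
    params = list(policy.parameters())
    theta_prev = [p.detach().clone() for p in params]
    d = [torch.zeros_like(p) for p in params]          # momentum gradient estimate
    for t in range(1, T + 1):
        alpha = (t + 1) ** (-2.0 / 3.0)                # momentum weight (illustrative schedule)
        eta = 0.1 * (t + 1) ** (-1.0 / 3.0)            # time-varying step size (illustrative)
        tau = sample_trajectory(policy)
        surrogate = pg_surrogate(policy, tau)
        grads = torch.autograd.grad(surrogate, params, create_graph=True)
        # Second-order correction along the displacement theta_t - theta_{t-1};
        # this is what replaces importance-sampling reweighting of old trajectories.
        delta = torch.cat([(p.detach() - q).reshape(-1)
                           for p, q in zip(params, theta_prev)])
        corr = _hvp(grads, params, delta)
        with torch.no_grad():
            theta_prev = [p.detach().clone() for p in params]
            for i, p in enumerate(params):
                d[i] = (1 - alpha) * (d[i] + corr[i]) + alpha * grads[i].detach()
                p.add_(eta * d[i])                     # ascent step on the estimated return
    return policy
```

Because the correction is a Hessian-vector product along the parameter displacement, trajectories collected under the previous policy never need to be re-weighted, which is the property the abstract attributes to SHARP.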
Related papers
- Policy Gradient with Active Importance Sampling [55.112959067035916]
Policy gradient (PG) methods significantly benefit from importance sampling (IS), enabling the effective reuse of previously collected samples.
However, IS is employed in RL as a passive tool for re-weighting historical samples.
We look for the best behavioral policy from which to collect samples to reduce the policy gradient variance.
arXiv Detail & Related papers (2024-05-09T09:08:09Z)
- Efficiently Escaping Saddle Points for Non-Convex Policy Optimization [40.0986936439803]
Policy gradient (PG) is widely used in reinforcement learning due to its scalability and good performance.
We propose a variance-reduced second-order method that uses second-order information in the form of Hessian vector products (HVP) and converges to an approximate second-order stationary point (SOSP) with sample complexity of $\tilde{O}(\epsilon^{-3})$.
arXiv Detail & Related papers (2023-11-15T12:36:45Z)
- Stochastic Dimension-reduced Second-order Methods for Policy Optimization [11.19708535159457]
We propose several new second-order algorithms for policy optimization that only require gradient and Hessian-vector product in each iteration.
Specifically, we propose a dimension-reduced second-order method (DR-SOPO) which repeatedly solves a projected two-dimensional trust region subproblem.
We show that DR-SOPO obtains an $\mathcal{O}(\epsilon^{-3.5})$ complexity for reaching an approximate first-order stationary condition.
In addition, we present an enhanced algorithm (DVR-SOPO) which further improves the complexity.
arXiv Detail & Related papers (2023-01-28T12:09:58Z)
- A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning [13.908826484332282]
We study a new two-time-scale gradient method for solving optimization problems.
Our first contribution is to characterize the finite-time complexity of the proposed two-time-scale gradient algorithm.
We apply our framework to gradient-based policy evaluation algorithms in reinforcement learning.
arXiv Detail & Related papers (2021-09-29T23:15:23Z)
- Bregman Gradient Policy Optimization [97.73041344738117]
We design a Bregman gradient policy optimization framework for reinforcement learning based on Bregman divergences and momentum techniques.
Its variance-reduced variant, VR-BGPO, reaches the best-known complexity $\tilde{O}(\epsilon^{-3})$ for finding an $\epsilon$-stationary point, requiring only one trajectory at each iteration.
arXiv Detail & Related papers (2021-06-23T01:08:54Z)
- Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement-learning-based zeroth-order (ZO) algorithm (ZO-RL) that learns the sampling policy for generating the perturbations in ZO optimization instead of using random sampling.
Our results show that ZO-RL can effectively reduce the variance of the ZO gradient estimate by learning a sampling policy, and converges faster than existing ZO algorithms in different scenarios (a basic two-point ZO gradient estimator is sketched after this list).
arXiv Detail & Related papers (2021-04-09T14:50:59Z)
- Average-Reward Off-Policy Policy Evaluation with Function Approximation [66.67075551933438]
We consider off-policy policy evaluation with function approximation in average-reward MDPs.
Bootstrapping is necessary and, together with off-policy learning and function approximation, results in the deadly triad.
We propose two novel algorithms, reproducing the celebrated success of Gradient TD algorithms in the average-reward setting.
arXiv Detail & Related papers (2021-01-08T00:43:04Z)
- Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in a form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
- Stochastic Recursive Momentum for Policy Gradient Methods [28.277961340108313]
We propose a novel algorithm named STOchastic Recursive Momentum for Policy Gradient (STORM-PG).
STORM-PG enjoys a provably sharp $O(1/\epsilon^3)$ sample bound, matching the best-known convergence rate for policy gradient algorithms.
Numerical experiments depict the superiority of our algorithm over competing policy gradient algorithms (a generic recursive-momentum estimator is sketched after this list).
arXiv Detail & Related papers (2020-03-09T17:59:03Z)
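Referenced from the STORM-PG entry above: a minimal sketch of a generic STORM-style recursive-momentum gradient estimator. `grad_fn` and `sample_fn` are hypothetical stand-ins for a stochastic gradient oracle and a sampler, not the API of any of the papers listed here.

```python
def recursive_momentum_estimate(grad_fn, sample_fn, theta, theta_prev, d_prev, alpha):
    # STORM-style update:
    #   d_t = alpha * g(theta_t) + (1 - alpha) * (d_{t-1} + g(theta_t) - g(theta_{t-1})),
    # with both gradients evaluated on the SAME freshly drawn sample.
    sample = sample_fn(theta)                  # O(1) batch: a single sample/trajectory
    g_curr = grad_fn(theta, sample)
    g_prev = grad_fn(theta_prev, sample)       # same sample reused at the previous iterate
    return alpha * g_curr + (1 - alpha) * (d_prev + g_curr - g_prev)
```

In the policy-gradient setting, evaluating `g_prev` on trajectories drawn from the current policy is what requires importance weights in STORM-PG and related methods, and it is the term the SHARP abstract replaces with a Hessian-vector product.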
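Referenced from the ZO-RL entry above: a minimal NumPy sketch of a standard two-point zeroth-order gradient estimator. ZO-RL's contribution is, roughly, to learn the distribution behind `sample_direction` instead of drawing it at random; all names here are illustrative assumptions.

```python
import numpy as np

def zo_gradient_estimate(f, x, sample_direction, mu=1e-3, k=10):
    # Averaged two-point estimator: g ~ (d / (2*mu)) * (f(x + mu*u) - f(x - mu*u)) * u
    # with unit directions u; plain ZO methods draw u at random (e.g. from a sphere).
    d = x.size
    g = np.zeros_like(x)
    for _ in range(k):
        u = sample_direction(x)
        u = u / (np.linalg.norm(u) + 1e-12)    # normalize to a unit direction
        g += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return (d / k) * g                         # d-scaling for sphere-sampled directions
```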