Efficiently Escaping Saddle Points for Non-Convex Policy Optimization
- URL: http://arxiv.org/abs/2311.08914v1
- Date: Wed, 15 Nov 2023 12:36:45 GMT
- Title: Efficiently Escaping Saddle Points for Non-Convex Policy Optimization
- Authors: Sadegh Khorasani, Saber Salehkaleybar, Negar Kiyavash, Niao He,
Matthias Grossglauser
- Abstract summary: Policy gradient (PG) is widely used in reinforcement learning due to its scalability and good performance.
We propose a variance-reduced second-order method that uses second-order information in the form of Hessian vector products (HVP) and converges to an approximate second-order stationary point (SOSP) with sample complexity of $\tilde{O}(\epsilon^{-3})$.
- Score: 40.0986936439803
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Policy gradient (PG) is widely used in reinforcement learning due to its
scalability and good performance. In recent years, several variance-reduced PG
methods have been proposed with a theoretical guarantee of converging to an
approximate first-order stationary point (FOSP) with the sample complexity of
$O(\epsilon^{-3})$. However, FOSPs could be bad local optima or saddle points.
Moreover, these algorithms often use importance sampling (IS) weights which
could impair the statistical effectiveness of variance reduction. In this
paper, we propose a variance-reduced second-order method that uses second-order
information in the form of Hessian vector products (HVP) and converges to an
approximate second-order stationary point (SOSP) with sample complexity of
$\tilde{O}(\epsilon^{-3})$. This rate improves the best-known sample complexity
for achieving approximate SOSPs by a factor of $O(\epsilon^{-0.5})$. Moreover,
the proposed variance reduction technique bypasses IS weights by using HVP
terms. Our experimental results show that the proposed algorithm outperforms
the state of the art and is more robust to changes in random seeds.
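Below is a minimal sketch, in JAX, of the general idea the abstract describes: a momentum-style (STORM-like) variance-reduced gradient estimator whose correction term is a Hessian-vector product along the most recent parameter step rather than an importance-sampling reweighting. This is an illustrative assumption of how such an estimator can be wired up, not the authors' algorithm; the names `vr_pg_estimate`, `loss_fn`, `batch`, and the mixing coefficient `eta`, as well as evaluating the HVP at the midpoint of the step, are choices made here for the sketch.

```python
import jax
import jax.numpy as jnp

def hvp(loss_fn, params, vec, batch):
    """Hessian-vector product via forward-over-reverse autodiff (no explicit Hessian)."""
    grad_fn = lambda p: jax.grad(loss_fn)(p, batch)
    _, hv = jax.jvp(grad_fn, (params,), (vec,))
    return hv

def vr_pg_estimate(loss_fn, params, prev_params, prev_estimate, batch, eta):
    """One step of a momentum-style variance-reduced estimator that uses an
    HVP correction in place of importance-sampling weights."""
    # Fresh stochastic gradient at the current parameters.
    g = jax.grad(loss_fn)(params, batch)
    # Curvature correction along the step just taken, evaluated at the
    # midpoint between the old and new parameters (an illustrative choice).
    mid = 0.5 * (params + prev_params)
    correction = hvp(loss_fn, mid, params - prev_params, batch)
    # Convex combination of the corrected old estimate and the new sample.
    return (1.0 - eta) * (prev_estimate + correction) + eta * g

# Toy usage with a quadratic surrogate standing in for the policy objective.
loss = lambda p, b: jnp.sum((p - b) ** 2)
theta, theta_prev = jnp.array([1.0, 2.0]), jnp.array([0.5, 1.5])
d = vr_pg_estimate(loss, theta, theta_prev, jnp.zeros(2), jnp.zeros(2), eta=0.1)
```

Under these assumptions, the HVP term plays the role that importance-sampling weights play in earlier variance-reduced PG methods: it corrects the stale estimate for the change in parameters between iterations, without reweighting trajectories sampled under the old policy.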
Related papers
- Double Variance Reduction: A Smoothing Trick for Composite Optimization Problems without First-Order Gradient [40.22217106270146]
Variance reduction techniques are designed to decrease the sampling variance, thereby accelerating convergence rates of first-order (FO) and zeroth-order (ZO) optimization methods.
In composite optimization problems, ZO methods encounter an additional variance, the coordinate-wise variance, which stems from random gradient estimation.
This paper proposes the Zeroth-order Proximal Double Variance Reduction (ZPDVR) method, which utilizes the averaging trick to reduce both sampling and coordinate-wise variances.
arXiv Detail & Related papers (2024-05-28T02:27:53Z) - Semi-Discrete Optimal Transport: Nearly Minimax Estimation With Stochastic Gradient Descent and Adaptive Entropic Regularization [38.67914746910537]
We prove an $\mathcal{O}(t^{-1})$ lower bound rate for the OT map, using the similarity between Laguerre cells estimation and density support estimation.
To nearly achieve the desired fast rate, we design an entropic regularization scheme decreasing with the number of samples.
arXiv Detail & Related papers (2024-05-23T11:46:03Z) - Stochastic Dimension-reduced Second-order Methods for Policy
Optimization [11.19708535159457]
We propose several new second-order algorithms for policy optimization that only require gradient and Hessian-vector product in each iteration.
Specifically, we propose a dimension-reduced second-order method (DR-SOPO) which repeatedly solves a projected two-dimensional trust region subproblem.
We show that DR-SOPO obtains an $\mathcal{O}(\epsilon^{-3.5})$ complexity for reaching an approximate first-order stationary condition.
In addition, we present an enhanced algorithm (DVR-SOPO) which further improves the complexity to $\mathcal{O}(\epsilon^{-3})$.
arXiv Detail & Related papers (2023-01-28T12:09:58Z) - Momentum-Based Policy Gradient with Second-Order Information [40.51117836892182]
We propose a variance-reduced policy-gradient method, called SHARP, which incorporates second-order information into gradient descent.
Unlike most previous work, our proposed algorithm does not require importance sampling, which can compromise the advantage of the variance reduction process.
Our extensive experimental evaluations show the effectiveness of the proposed algorithm on various control tasks and its advantage over the state of the art in practice.
arXiv Detail & Related papers (2022-05-17T11:56:50Z) - Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution.
We also design a novel family of adaptive methods with the same complexity of $O(1/\epsilon^4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z) - Cautiously Optimistic Policy Optimization and Exploration with Linear
Function Approximation [48.744735294559824]
Policy optimization methods are popular reinforcement learning algorithms, because their incremental and on-policy nature makes them more stable than the value-based counterparts.
In this paper, we propose a new algorithm, COPOE, that overcomes the sample complexity issue of PCPG while retaining its robustness to model misspecification.
The result is an improvement in sample complexity from $\widetilde{O}(1/\epsilon^{11})$ for PCPG to $\widetilde{O}(1/\epsilon^{3})$ for COPOE, nearly bridging the gap with value-based techniques.
arXiv Detail & Related papers (2021-03-24T01:42:59Z) - Correcting Momentum with Second-order Information [50.992629498861724]
We develop a new algorithm for non-convex stochastic optimization that finds an $\epsilon$-critical point in the optimal $O(\epsilon^{-3})$ stochastic gradient and Hessian-vector product computations.
We validate our results on a variety of large-scale deep learning benchmarks and architectures.
arXiv Detail & Related papers (2021-03-04T19:01:20Z) - A Variance Controlled Stochastic Method with Biased Estimation for
Faster Non-convex Optimization [0.0]
We propose a new technique, variance controlled stochastic gradient (VCSG), to improve the performance of the stochastic variance-reduced gradient (SVRG) method.
A parameter $\lambda$ is introduced in VCSG to avoid over-reducing the variance by SVRG.
VCSG requires $\mathcal{O}(\min\{1/\epsilon^{3/2}, n^{1/4}/\epsilon\})$ gradient evaluations, which improves the leading gradient complexity.
arXiv Detail & Related papers (2021-02-19T12:22:56Z) - Large-Scale Methods for Distributionally Robust Optimization [53.98643772533416]
We prove that our algorithms require a number of gradient evaluations independent of training set size and number of parameters.
Experiments on MNIST and ImageNet confirm the theoretical scaling of our algorithms, which are 9--36 times more efficient than full-batch methods.
arXiv Detail & Related papers (2020-10-12T17:41:44Z) - Private Stochastic Non-Convex Optimization: Adaptive Algorithms and
Tighter Generalization Bounds [72.63031036770425]
We propose differentially private (DP) algorithms for stochastic non-convex optimization.
We demonstrate, on two popular deep learning tasks, the empirical advantages over standard gradient methods.
arXiv Detail & Related papers (2020-06-24T06:01:24Z)