Restarted Nonconvex Accelerated Gradient Descent: No More
Polylogarithmic Factor in the $O(\epsilon^{-7/4})$ Complexity
- URL: http://arxiv.org/abs/2201.11411v4
- Date: Wed, 26 Apr 2023 02:17:13 GMT
- Title: Restarted Nonconvex Accelerated Gradient Descent: No More
Polylogarithmic Factor in the $O(\epsilon^{-7/4})$ Complexity
- Authors: Huan Li and Zhouchen Lin
- Abstract summary: We propose two simple accelerated gradient methods, restarted accelerated gradient descent (AGD) and the restarted heavy ball (HB) method.
We establish that our methods find an $\epsilon$-approximate first-order stationary point within $O(\epsilon^{-7/4})$ gradient evaluations, improving over the best known complexity by an $O(\log\frac{1}{\epsilon})$ factor.
Our algorithms are simple in the sense that they only consist of Nesterov's classical AGD or Polyak's HB iterations, as well as a restart mechanism.
- Score: 70.65867695317633
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper studies accelerated gradient methods for nonconvex optimization
with Lipschitz continuous gradient and Hessian. We propose two simple
accelerated gradient methods, restarted accelerated gradient descent (AGD) and
restarted heavy ball (HB) method, and establish that our methods achieve an
$\epsilon$-approximate first-order stationary point within $O(\epsilon^{-7/4})$
number of gradient evaluations by elementary proofs. Theoretically, our
complexity does not hide any polylogarithmic factors, and thus it improves over
the best known one by the $O(\log\frac{1}{\epsilon})$ factor. Our algorithms
are simple in the sense that they only consist of Nesterov's classical AGD or
Polyak's HB iterations, as well as a restart mechanism. They do not invoke
negative curvature exploitation or minimization of regularized surrogate
functions as the subroutines. In contrast with existing analysis, our
elementary proofs use less advanced techniques and do not invoke the analysis
of strongly convex AGD or HB.
Code is available at https://github.com/lihuanML/RestartAGD.
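As a rough illustration of the method described above, here is a minimal Python sketch of restarted AGD: plain Nesterov iterations inside an epoch, with a restart once the iterates have moved too far. The step size eta, momentum parameter theta, and movement budget B below are illustrative assumptions, not the paper's exact parameter choices; see the linked repository for the authors' implementation.

    import numpy as np

    def restarted_agd(grad_f, x0, eta=1e-3, theta=0.1, B=1.0, max_iters=10000):
        # Sketch of Nesterov's AGD plus a restart mechanism (illustrative parameters).
        x_prev = np.asarray(x0, dtype=float)
        x_cur = x_prev.copy()
        movement = 0.0  # accumulated squared movement within the current epoch
        for _ in range(max_iters):
            # Nesterov-style extrapolation followed by a gradient step.
            y = x_cur + (1.0 - theta) * (x_cur - x_prev)
            x_next = y - eta * grad_f(y)
            movement += float(np.linalg.norm(x_next - x_cur) ** 2)
            x_prev, x_cur = x_cur, x_next
            # Restart: kill the momentum and reset the movement counter once the
            # epoch's total squared movement exceeds B**2 (assumed trigger).
            if movement > B ** 2:
                x_prev = x_cur.copy()
                movement = 0.0
        return x_cur

    # Toy usage on f(x) = sum(x**2) + sum(sin(3*x)), whose gradient is 2*x + 3*cos(3*x).
    if __name__ == "__main__":
        grad = lambda x: 2 * x + 3 * np.cos(3 * x)
        x_out = restarted_agd(grad, x0=np.ones(5))
        print("gradient norm at output:", np.linalg.norm(grad(x_out)))

The restarted HB variant has the same shape, with the extrapolation-plus-gradient step replaced by Polyak's heavy-ball update.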
Related papers
- Generalization Bounds for Gradient Methods via Discrete and Continuous
Prior [8.76346911214414]
We show a new high probability generalization bound of order $O\big(\frac{1}{n} + \frac{L^2}{n^2}\sum_{t=1}^{T}(\gamma_t/\varepsilon_t)^2\big)$ for Gradient Langevin Dynamics (GLD).
We can also obtain new bounds for certain variants of SGD.
arXiv Detail & Related papers (2022-05-27T07:23:01Z) - ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication
Acceleration! Finally! [6.170898159041277]
We introduce ProxSkip, a surprisingly simple and provably efficient method for minimizing the sum of a smooth ($f$) and an expensive nonsmooth ($\psi$) function.
Our main motivation comes from federated learning, where evaluation of the gradient operator corresponds to taking a local GD step independently on all devices.
arXiv Detail & Related papers (2022-02-18T18:56:05Z) - Escape saddle points by a simple gradient-descent based algorithm [6.7298812735467095]
Escaping saddle points is a central research topic in nonconvex optimization.
We propose a simple gradient-based algorithm such that, for a smooth function $f\colon\mathbb{R}^n\to\mathbb{R}$, it outputs an $\epsilon$-approximate second-order stationary point within $\tilde{O}((\log n)^2/\epsilon^4)$ iterations.
Technically, our main contribution is the idea of implementing a robust Hessian power method using only gradients (a generic sketch of this idea appears after this list).
arXiv Detail & Related papers (2021-11-28T07:32:54Z) - Faster Perturbed Stochastic Gradient Methods for Finding Local Minima [92.99933928528797]
We propose Pullback, a faster perturbed stochastic gradient framework for finding local minima.
We show that Pullback with stochastic gradient estimators such as SARAH/SPIDER and STORM can find $(\epsilon, \epsilon_H)$-approximate local minima within $\tilde{O}(\epsilon^{-3} + \epsilon_H^{-6})$ stochastic gradient evaluations.
The core idea of our framework is a step-size "pullback" scheme to control the average movement of the iterates.
arXiv Detail & Related papers (2021-10-25T07:20:05Z) - Correcting Momentum with Second-order Information [50.992629498861724]
We develop a new algorithm for nonconvex stochastic optimization that finds an $\epsilon$-critical point using an optimal number of stochastic gradient and Hessian-vector product computations.
We validate our results on a variety of large-scale deep learning benchmarks and architectures.
arXiv Detail & Related papers (2021-03-04T19:01:20Z) - Gradient Free Minimax Optimization: Variance Reduction and Faster
Convergence [120.9336529957224]
In this paper, we study the nonconvex strongly-concave setting of the gradient-free (zeroth-order) minimax optimization problem.
We show that a novel zeroth-order variance-reduced descent algorithm achieves the best known query complexity (a sketch of the basic zeroth-order gradient estimator appears after this list).
arXiv Detail & Related papers (2020-06-16T17:55:46Z) - Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions [84.49087114959872]
We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions.
In particular, we study Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions.
arXiv Detail & Related papers (2020-02-10T23:23:04Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the optimistic Adagrad (OAdagrad) algorithm for nonconvex-nonconcave min-max problems.
Our experiments show that the speedup of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z) - Near-Optimal Methods for Minimizing Star-Convex Functions and Beyond [25.845034951660544]
We develop a variant of accelerated gradient descent that computes an $\epsilon$-approximate minimizer of a smooth quasar-convex function, with complexity that is optimal up to a logarithmic factor.
We show that no first-order method can improve upon ours.
arXiv Detail & Related papers (2019-06-27T22:39:35Z)