Anytime Acceleration of Gradient Descent
- URL: http://arxiv.org/abs/2411.17668v2
- Date: Sun, 08 Dec 2024 11:22:26 GMT
- Title: Anytime Acceleration of Gradient Descent
- Authors: Zihan Zhang, Jason D. Lee, Simon S. Du, Yuxin Chen
- Abstract summary: We investigate stepsize-based acceleration of gradient descent with {\em anytime} convergence guarantees. For smooth (non-strongly) convex optimization, we propose a stepsize schedule that allows gradient descent to achieve convergence guarantees of $O(T^{-1.119})$ for any stopping time.
- Score: 92.02082223856479
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work investigates stepsize-based acceleration of gradient descent with {\em anytime} convergence guarantees. For smooth (non-strongly) convex optimization, we propose a stepsize schedule that allows gradient descent to achieve convergence guarantees of $O(T^{-1.119})$ for any stopping time $T$, where the stepsize schedule is predetermined without prior knowledge of the stopping time. This result provides an affirmative answer to a COLT open problem \citep{kornowski2024open} regarding whether stepsize-based acceleration can yield anytime convergence rates of $o(T^{-1})$. We further extend our theory to yield anytime convergence guarantees of $\exp(-\Omega(T/\kappa^{0.893}))$ for smooth and strongly convex optimization, with $\kappa$ being the condition number.
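As a rough, non-authoritative illustration of the anytime setting, the sketch below runs gradient descent with a stepsize schedule that is fixed in advance (independent of the stopping time) and keeps every iterate, so the suboptimality can be inspected at any stopping time $T$. The schedule `stepsize(t)`, the quadratic test problem, and the helper `gradient_descent_anytime` are placeholder assumptions for illustration only; the paper's actual schedule achieving the $O(T^{-1.119})$ rate is not reproduced here.

```python
# Illustrative sketch only (assumed placeholder schedule, NOT the paper's):
# gradient descent with a predetermined stepsize sequence, keeping the iterate
# history so suboptimality can be checked at *any* stopping time T.
import numpy as np

def stepsize(t, L=1.0):
    # Hypothetical predetermined schedule: mostly 1/L steps with an occasional
    # longer step; it only illustrates a schedule fixed in advance of T.
    return (1.5 if (t + 1) % 8 == 0 else 1.0) / L

def gradient_descent_anytime(grad, x0, T, L=1.0):
    """Run T steps of GD and return every iterate x_0, ..., x_T."""
    xs = [np.asarray(x0, dtype=float)]
    for t in range(T):
        xs.append(xs[-1] - stepsize(t, L) * grad(xs[-1]))
    return xs

if __name__ == "__main__":
    # Smooth convex test problem: f(x) = 0.5 * x^T A x, with L = lambda_max(A) = 1.
    A = np.diag([1.0, 0.1, 0.01])
    grad = lambda x: A @ x
    history = gradient_descent_anytime(grad, x0=np.ones(3), T=200)
    for t in (10, 50, 200):  # inspect the suboptimality at several stopping times
        fval = 0.5 * history[t] @ A @ history[t]
        print(f"T={t:4d}  f(x_T) - f* = {fval:.3e}")
```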
Related papers
- Large Stepsizes Accelerate Gradient Descent for Regularized Logistic Regression [9.15420898792103]
We study gradient descent (GD) with a constant stepsize for $\ell$-regularized logistic regression with linearly separable data. We show that this can be accelerated to $\widetilde{\mathcal{O}}(\sqrt{\kappa})$ by simply using a large stepsize.
arXiv Detail & Related papers (2025-06-03T00:21:22Z) - Open Problem: Anytime Convergence Rate of Gradient Descent [37.300102993926046]
We show that vanilla gradient descent can be accelerated for smooth convex objectives, merely by changing the stepsize sequence.
We ask: Is there any stepsize schedule for gradient descent that accelerates the classic $\mathcal{O}(1/T)$ convergence rate, at \emph{any} stopping time $T$?
arXiv Detail & Related papers (2024-06-19T23:34:47Z) - Stochastic Newton Proximal Extragradient Method [18.47705532817026]
We propose a novel stochastic Newton proximal extragradient method that improves on existing convergence bounds.
We accomplish this by extending the Hybrid Proximal Extragradient (HPE) framework.
arXiv Detail & Related papers (2024-06-03T16:06:23Z) - Accelerated Quasi-Newton Proximal Extragradient: Faster Rate for Smooth
Convex Optimization [26.328847475942894]
We prove that our method can achieve a convergence rate of $O\bigl(\min\bigl\{\frac{1}{k^2}, \frac{\sqrt{d\log k}}{k^{2.5}}\bigr\}\bigr)$, where $d$ is the problem dimension and $k$ is the number of iterations.
To the best of our knowledge, this result is the first to demonstrate a provable gain of a quasi-Newton-type method over Nesterov's accelerated gradient.
arXiv Detail & Related papers (2023-06-03T23:31:27Z) - Breaking the Lower Bound with (Little) Structure: Acceleration in
Non-Convex Stochastic Optimization with Heavy-Tailed Noise [28.780192812703948]
We consider the optimization problem with smooth but not necessarily convex objectives in the heavy-tailed noise regime.
We show that one can achieve a faster rate than that dictated by the lower bound $\Omega(T^{\frac{1-p}{3p-2}})$ with only a tiny bit of structure.
We also establish that it guarantees a high-probability convergence rate of $O(\log(T/\delta)\,T^{\frac{1-p}{3p-2}})$ under a mild condition.
arXiv Detail & Related papers (2023-02-14T00:23:42Z) - ReSQueing Parallel and Private Stochastic Convex Optimization [59.53297063174519]
We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density.
We develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings.
arXiv Detail & Related papers (2023-01-01T18:51:29Z) - Sharper Convergence Guarantees for Asynchronous SGD for Distributed and
Federated Learning [77.22019100456595]
We study a training algorithm for distributed computation where workers communicate at varying frequencies.
In this work, we obtain a tighter convergence rate of $\mathcal{O}(\sigma^2\epsilon^{-2} + \tau_{\mathrm{avg}}\epsilon^{-1})$, where $\tau_{\mathrm{avg}}$ denotes the average delay.
We also show that the heterogeneity term in the rate is affected by the average delay within each worker.
arXiv Detail & Related papers (2022-06-16T17:10:57Z) - Asynchronous Stochastic Optimization Robust to Arbitrary Delays [54.61797739710608]
We consider stochastic optimization with delayed gradients where, at each time step $t$, the algorithm makes an update using a stale stochastic gradient from step $t - d_t$ for an arbitrary delay $d_t$ (see the sketch after this list).
Our experiments demonstrate the efficacy and robustness of our algorithm in cases where the delay distribution is skewed or heavy-tailed.
arXiv Detail & Related papers (2021-06-22T15:50:45Z) - On the Convergence of Step Decay Step-Size for Stochastic Optimization [27.02857082612736]
The convergence of stochastic gradient descent is highly dependent on the step-size, especially on non-convex problems such as neural network training.
We provide convergence results for the step decay step-size in the non-convex regime, ensuring that the gradient norm vanishes.
In the strongly convex case, we establish an $\mathcal{O}(\ln T/T)$ rate, which we also prove to be near-optimal.
arXiv Detail & Related papers (2021-02-18T14:37:25Z) - On the Almost Sure Convergence of Stochastic Gradient Descent in
Non-Convex Problems [75.58134963501094]
This paper analyzes the trajectories of stochastic gradient descent (SGD).
We show that SGD avoids strict saddle points/manifolds with probability $1$ for the step-size policies considered.
arXiv Detail & Related papers (2020-06-19T14:11:26Z) - A Simple Convergence Proof of Adam and Adagrad [74.24716715922759]
We give a simple proof of convergence for the Adam and Adagrad algorithms with a rate of $O(d\ln(N)/\sqrt{N})$.
Adam converges at the same $O(d\ln(N)/\sqrt{N})$ rate when used with the default parameters.
arXiv Detail & Related papers (2020-03-05T01:56:17Z)
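For intuition on the delayed-gradient update described in the "Asynchronous Stochastic Optimization Robust to Arbitrary Delays" entry above, the sketch below runs SGD where the update at step $t$ uses a gradient computed at the stale iterate from step $t - d_t$. The quadratic objective, the random delay sequence, the stepsize `eta`, and the helper names are assumptions for illustration only; the paper's delay-adaptive algorithm is not reproduced here.

```python
# Minimal sketch (not the paper's delay-adaptive algorithm): SGD where the
# update at step t uses a stochastic gradient evaluated at the stale iterate
# from step t - d_t, for an arbitrary delay d_t. Objective, delay pattern,
# and stepsize are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def stochastic_grad(x):
    # Noisy gradient of the toy objective f(x) = 0.5 * ||x||^2.
    return x + 0.1 * rng.standard_normal(x.shape)

def delayed_sgd(x0, T, eta=0.05, max_delay=10):
    xs = [np.asarray(x0, dtype=float)]
    for t in range(T):
        d_t = int(rng.integers(0, min(t, max_delay) + 1))  # delay with 0 <= d_t <= t
        x_stale = xs[t - d_t]                               # iterate from step t - d_t
        xs.append(xs[t] - eta * stochastic_grad(x_stale))
    return xs[-1]

if __name__ == "__main__":
    x_T = delayed_sgd(x0=np.ones(5), T=2000)
    print("||x_T|| after delayed SGD:", np.linalg.norm(x_T))
```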
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.