Anytime Acceleration of Gradient Descent
- URL: http://arxiv.org/abs/2411.17668v2
- Date: Sun, 08 Dec 2024 11:22:26 GMT
- Title: Anytime Acceleration of Gradient Descent
- Authors: Zihan Zhang, Jason D. Lee, Simon S. Du, Yuxin Chen
- Abstract summary: We investigate stepsize-based acceleration of gradient descent with {\em anytime} convergence guarantees. For smooth (non-strongly) convex optimization, we propose a stepsize schedule that allows gradient descent to achieve convergence guarantees of $O(T^{-1.119})$ for any stopping time.
- Score: 92.02082223856479
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work investigates stepsize-based acceleration of gradient descent with {\em anytime} convergence guarantees. For smooth (non-strongly) convex optimization, we propose a stepsize schedule that allows gradient descent to achieve convergence guarantees of $O(T^{-1.119})$ for any stopping time $T$, where the stepsize schedule is predetermined without prior knowledge of the stopping time. This result provides an affirmative answer to a COLT open problem \citep{kornowski2024open} regarding whether stepsize-based acceleration can yield anytime convergence rates of $o(T^{-1})$. We further extend our theory to yield anytime convergence guarantees of $\exp(-\Omega(T/\kappa^{0.893}))$ for smooth and strongly convex optimization, with $\kappa$ being the condition number.
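As a rough, non-authoritative illustration of the anytime setting, the sketch below runs gradient descent with a stepsize schedule that is fixed in advance (independent of the stopping time) and keeps every iterate, so the suboptimality can be inspected at any stopping time $T$. The schedule `stepsize(t)`, the quadratic test problem, and the helper `gradient_descent_anytime` are placeholder assumptions for illustration only; the paper's actual schedule achieving the $O(T^{-1.119})$ rate is not reproduced here.

```python
# Illustrative sketch only (assumed placeholder schedule, NOT the paper's):
# gradient descent with a predetermined stepsize sequence, keeping the iterate
# history so suboptimality can be checked at *any* stopping time T.
import numpy as np

def stepsize(t, L=1.0):
    # Hypothetical predetermined schedule: mostly 1/L steps with an occasional
    # longer step; it only illustrates a schedule fixed in advance of T.
    return (1.5 if (t + 1) % 8 == 0 else 1.0) / L

def gradient_descent_anytime(grad, x0, T, L=1.0):
    """Run T steps of GD and return every iterate x_0, ..., x_T."""
    xs = [np.asarray(x0, dtype=float)]
    for t in range(T):
        xs.append(xs[-1] - stepsize(t, L) * grad(xs[-1]))
    return xs

if __name__ == "__main__":
    # Smooth convex test problem: f(x) = 0.5 * x^T A x, with L = lambda_max(A) = 1.
    A = np.diag([1.0, 0.1, 0.01])
    grad = lambda x: A @ x
    history = gradient_descent_anytime(grad, x0=np.ones(3), T=200)
    for t in (10, 50, 200):  # inspect the suboptimality at several stopping times
        fval = 0.5 * history[t] @ A @ history[t]
        print(f"T={t:4d}  f(x_T) - f* = {fval:.3e}")
```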
Related papers
- Large Stepsizes Accelerate Gradient Descent for Regularized Logistic Regression [9.15420898792103]
We study gradient descent (GD) with a constant stepsize for $\ell$-regularized logistic regression with linearly separable data. We show that this can be accelerated to $\widetilde{\mathcal{O}}(\sqrt{\kappa})$ by simply using a large stepsize.
arXiv Detail & Related papers (2025-06-03T00:21:22Z) - Open Problem: Anytime Convergence Rate of Gradient Descent [37.300102993926046]
We show that vanilla gradient descent can be accelerated for smooth convex objectives, merely by changing the stepsize sequence.
We ask: Is there any stepsize schedule for gradient descent that accelerates the classic $\mathcal{O}(1/T)$ convergence rate, at \emph{any} stopping time $T$?
arXiv Detail & Related papers (2024-06-19T23:34:47Z) - Stochastic Newton Proximal Extragradient Method [18.47705532817026]
We propose a novel stochastic Newton proximal extragradient method that improves on existing convergence bounds.
We accomplish this by extending the Hybrid Proximal Extragradient (HPE) framework.
arXiv Detail & Related papers (2024-06-03T16:06:23Z) - Accelerated Quasi-Newton Proximal Extragradient: Faster Rate for Smooth
Convex Optimization [26.328847475942894]
We prove that our method can achieve a convergence rate of $O\bigl(\min\bigl\{\frac{1}{k^2}, \frac{\sqrt{d\log k}}{k^{2.5}}\bigr\}\bigr)$, where $d$ is the problem dimension and $k$ is the number of iterations.
To the best of our knowledge, this result is the first to demonstrate a provable gain of a quasi-Newton-type method over Nesterov's accelerated gradient.
arXiv Detail & Related papers (2023-06-03T23:31:27Z) - Breaking the Lower Bound with (Little) Structure: Acceleration in
Non-Convex Stochastic Optimization with Heavy-Tailed Noise [28.780192812703948]
We consider the optimization problem with smooth but not necessarily convex objectives in the heavy-tailed noise regime.
We show that one can achieve a faster rate than that dictated by the lower bound $\Omega(T^{\frac{1-p}{3p-2}})$ with only a tiny bit of structure.
We also establish that it guarantees a high-probability convergence rate of $O(\log(T/\delta)\,T^{\frac{1-p}{3p-2}})$ under a mild condition.
arXiv Detail & Related papers (2023-02-14T00:23:42Z) - ReSQueing Parallel and Private Stochastic Convex Optimization [59.53297063174519]
We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density.
We develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings.
arXiv Detail & Related papers (2023-01-01T18:51:29Z) - Sharper Convergence Guarantees for Asynchronous SGD for Distributed and
Federated Learning [77.22019100456595]
We study a training algorithm for distributed computation where workers communicate at varying frequencies.
In this work, we obtain a tighter convergence rate of $\mathcal{O}(\sigma^2\epsilon^{-2} + \tau_{\mathrm{avg}}\epsilon^{-1})$, where $\tau_{\mathrm{avg}}$ denotes the average delay.
We also show that the heterogeneity term in the rate is affected by the average delay within each worker.
arXiv Detail & Related papers (2022-06-16T17:10:57Z) - Asynchronous Stochastic Optimization Robust to Arbitrary Delays [54.61797739710608]
We consider stochastic optimization with delayed gradients where, at each time step $t$, the algorithm makes an update using a stale stochastic gradient from step $t - d_t$ for an arbitrary delay $d_t$ (see the sketch after this list).
Our experiments demonstrate the efficacy and robustness of our algorithm in cases where the delay distribution is skewed or heavy-tailed.
arXiv Detail & Related papers (2021-06-22T15:50:45Z) - On the Convergence of Step Decay Step-Size for Stochastic Optimization [27.02857082612736]
The convergence of stochastic gradient descent is highly dependent on the step-size, especially on non-convex problems such as neural network training.
We provide convergence results for the step decay step-size in the non-convex regime, ensuring that the gradient norm vanishes.
In the strongly convex case, we establish an $\mathcal{O}(\ln T/T)$ rate, which we also prove to be near-optimal.
arXiv Detail & Related papers (2021-02-18T14:37:25Z) - On the Almost Sure Convergence of Stochastic Gradient Descent in
Non-Convex Problems [75.58134963501094]
This paper analyzes the trajectories of stochastic gradient descent (SGD).
We show that SGD avoids strict saddle points/manifolds with probability $1$ for the step-size policies considered.
arXiv Detail & Related papers (2020-06-19T14:11:26Z) - A Simple Convergence Proof of Adam and Adagrad [74.24716715922759]
We give a simple proof of convergence for the Adam and Adagrad algorithms with a rate of $O(d\ln(N)/\sqrt{N})$.
Adam converges at the same $O(d\ln(N)/\sqrt{N})$ rate when used with the default parameters.
arXiv Detail & Related papers (2020-03-05T01:56:17Z)
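For intuition on the delayed-gradient update described in the "Asynchronous Stochastic Optimization Robust to Arbitrary Delays" entry above, the sketch below runs SGD where the update at step $t$ uses a gradient computed at the stale iterate from step $t - d_t$. The quadratic objective, the random delay sequence, the stepsize `eta`, and the helper names are assumptions for illustration only; the paper's delay-adaptive algorithm is not reproduced here.

```python
# Minimal sketch (not the paper's delay-adaptive algorithm): SGD where the
# update at step t uses a stochastic gradient evaluated at the stale iterate
# from step t - d_t, for an arbitrary delay d_t. Objective, delay pattern,
# and stepsize are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def stochastic_grad(x):
    # Noisy gradient of the toy objective f(x) = 0.5 * ||x||^2.
    return x + 0.1 * rng.standard_normal(x.shape)

def delayed_sgd(x0, T, eta=0.05, max_delay=10):
    xs = [np.asarray(x0, dtype=float)]
    for t in range(T):
        d_t = int(rng.integers(0, min(t, max_delay) + 1))  # delay with 0 <= d_t <= t
        x_stale = xs[t - d_t]                               # iterate from step t - d_t
        xs.append(xs[t] - eta * stochastic_grad(x_stale))
    return xs[-1]

if __name__ == "__main__":
    x_T = delayed_sgd(x0=np.ones(5), T=2000)
    print("||x_T|| after delayed SGD:", np.linalg.norm(x_T))
```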
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.