Nesterov Finds GRAAL: Optimal and Adaptive Gradient Method for Convex Optimization
- URL: http://arxiv.org/abs/2507.09823v1
- Date: Sun, 13 Jul 2025 23:07:45 GMT
- Title: Nesterov Finds GRAAL: Optimal and Adaptive Gradient Method for Convex Optimization
- Authors: Ekaterina Borodich, Dmitry Kovalev
- Abstract summary: Adaptive gradient methods, including GRAAL (Malitsky, 2020), have been developed. We prove that our algorithm achieves the desired optimal convergence rate for Lipschitz-smooth functions. In contrast to existing methods, it does so with an arbitrary, even excessively small, initial stepsize at the cost of a logarithmic additive term in the complexity.
- Score: 17.227158587717934
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we focus on the problem of minimizing a continuously differentiable convex objective function $\min_x f(x)$. Recently, several adaptive gradient methods, including GRAAL (Malitsky, 2020), have been developed. These methods estimate the local curvature of the objective function to compute stepsizes, attain the standard convergence rate $\mathcal{O}(1/k)$ of fixed-stepsize gradient descent for Lipschitz-smooth functions, and do not require any line search procedures or hyperparameter tuning. However, a natural question arises: is it possible to accelerate the convergence of these algorithms to match the optimal rate $\mathcal{O}(1/k^2)$ of the accelerated gradient descent of Nesterov (1983)? Although some attempts have been made (Li and Lan, 2023), the capabilities of the existing accelerated algorithms to adapt to the curvature of the objective function are highly limited. Consequently, we provide a positive answer to this question and develop GRAAL with Nesterov acceleration. We prove that our algorithm achieves the desired optimal convergence rate for Lipschitz-smooth functions. Moreover, in contrast to existing methods, it does so with an arbitrary, even excessively small, initial stepsize at the cost of a logarithmic additive term in the iteration complexity.
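The abstract contrasts curvature-estimating adaptive stepsizes in the spirit of GRAAL with Nesterov's fixed-stepsize accelerated method and their $\mathcal{O}(1/k)$ versus $\mathcal{O}(1/k^2)$ rates. The sketch below is only a rough illustration of that contrast, not the accelerated algorithm proposed in the paper: the adaptive branch uses a Malitsky-Mishchenko-style "adaptive gradient descent without descent" stepsize as a stand-in for GRAAL's curvature-estimating rule, started from a deliberately tiny stepsize, and is run next to Nesterov's accelerated gradient with a fixed stepsize $1/L$. The quadratic test problem and all constants are illustrative assumptions.

```python
# Illustrative sketch only: a curvature-adaptive gradient method next to
# Nesterov's accelerated gradient with a fixed stepsize 1/L. This is NOT the
# accelerated GRAAL variant proposed in the paper; the stepsize rule, the
# quadratic test problem, and all constants below are assumptions.
import numpy as np


def make_problem(d=20, n=50, seed=0):
    """Assumed test problem: a smooth convex quadratic f(x) = 0.5 x'Ax - b'x."""
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((n, d))
    A = M.T @ M / n + 1e-3 * np.eye(d)
    b = rng.standard_normal(d)
    return A, b


def f(x, A, b):
    return 0.5 * x @ (A @ x) - b @ x


def grad(x, A, b):
    return A @ x - b


def adaptive_gd(A, b, x0, lam0=1e-8, iters=500):
    """Adaptive gradient descent with a Malitsky-Mishchenko-style stepsize,
    used here as a stand-in for GRAAL's curvature-estimating rule:
        lam_k = min( sqrt(1 + theta_{k-1}) * lam_{k-1},
                     ||x_k - x_{k-1}|| / (2 ||grad_k - grad_{k-1}||) ).
    The deliberately tiny lam0 mimics an 'excessively small' initial stepsize."""
    x_prev, g_prev = x0, grad(x0, A, b)
    lam, theta = lam0, 1.0
    x = x_prev - lam * g_prev
    for _ in range(iters):
        g = grad(x, A, b)
        dx = np.linalg.norm(x - x_prev)
        dg = np.linalg.norm(g - g_prev)
        lam_new = np.sqrt(1.0 + theta) * lam          # capped growth factor
        if dg > 0:
            lam_new = min(lam_new, 0.5 * dx / dg)     # inverse local curvature estimate
        theta, lam = lam_new / lam, lam_new
        x_prev, g_prev = x, g
        x = x - lam * g
    return x


def nesterov_agd(A, b, x0, L, iters=500):
    """Nesterov's accelerated gradient with fixed stepsize 1/L (O(1/k^2) rate)."""
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(iters):
        x_next = y - grad(y, A, b) / L
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_next + (t - 1.0) / t_next * (x_next - x)
        x, t = x_next, t_next
    return x


if __name__ == "__main__":
    A, b = make_problem()
    L = np.linalg.eigvalsh(A).max()       # global Lipschitz constant of grad f
    x0 = np.zeros(b.shape)
    f_star = f(np.linalg.solve(A, b), A, b)
    print("adaptive GD  gap:", f(adaptive_gd(A, b, x0), A, b) - f_star)
    print("Nesterov AGD gap:", f(nesterov_agd(A, b, x0, L), A, b) - f_star)
```

On such a well-conditioned quadratic both runs reach a small optimality gap; the point of the sketch is only to show how a curvature-based stepsize recovers from a tiny initial value, which is the regime the abstract highlights.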
Related papers
- Gradient-free stochastic optimization for additive models [56.42455605591779]
We address the problem of zero-order optimization from noisy observations for an objective function satisfying the Polyak-Lojasiewicz or the strong convexity condition. We assume that the objective function has an additive structure and satisfies a higher-order smoothness property, characterized by the Hölder family of functions.
arXiv Detail & Related papers (2025-03-03T23:39:08Z) - An Accelerated Algorithm for Stochastic Bilevel Optimization under Unbounded Smoothness [15.656614304616006]
This paper investigates a class of bilevel optimization problems where the upper-level function is nonconvex and the lower-level problem is strongly convex. These problems have significant applications in data learning, such as text classification using unbounded networks. We propose a new Accelerated Bilevel Optimization algorithm named AccBO.
arXiv Detail & Related papers (2024-09-28T02:30:44Z) - Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity [50.25258834153574]
We focus on the class of (strongly) convex $(L_0,L_1)$-smooth functions and derive new convergence guarantees for several existing methods. In particular, we derive improved convergence rates for Gradient Descent with Smoothed Gradient Clipping and for Gradient Descent with Polyak Stepsizes (a toy sketch of these two stepsize rules appears after this list).
arXiv Detail & Related papers (2024-09-23T13:11:37Z) - Fast Unconstrained Optimization via Hessian Averaging and Adaptive Gradient Sampling Methods [0.3222802562733786]
We consider minimizing finite-sum expectation objective functions via Hessian-averaging based subsampled Newton methods.
These methods allow for inexactness and have fixed per-iteration Hessian approximation costs.
We present novel analysis techniques and discuss challenges for their practical implementation.
arXiv Detail & Related papers (2024-08-14T03:27:48Z) - An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsizes [17.804065824245402]
In machine learning applications, each loss function is non-negative and can be expressed as the composition of a square and its real-valued square root.
We show how to apply the Gauss-Newton method or the Levenberg-Marquardt method to minimize the average of smooth but possibly nonconvex functions.
arXiv Detail & Related papers (2024-07-05T08:53:06Z) - Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm only has access to noisy evaluations of the objective function it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
arXiv Detail & Related papers (2024-06-28T02:56:22Z) - Adaptive extra-gradient methods for min-max optimization and games [35.02879452114223]
We present a new family of min-max optimization algorithms that automatically exploit the geometry of the gradient data observed at earlier iterations.
Thanks to this adaptation mechanism, the proposed method automatically detects whether the problem is smooth or not.
It converges to an $\varepsilon$-optimal solution within $\mathcal{O}(1/\varepsilon)$ iterations in smooth problems, and within $\mathcal{O}(1/\varepsilon^2)$ iterations in non-smooth ones.
arXiv Detail & Related papers (2020-10-22T22:54:54Z) - Exploiting Higher Order Smoothness in Derivative-free Optimization and
Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
arXiv Detail & Related papers (2020-06-14T10:42:23Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly unbounded optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad algorithm for nonconcave min-max problems.
Our experiments on GAN training show that the advantage of adaptive gradient algorithms over their non-adaptive counterparts can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
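The $(L_0,L_1)$-smooth entry above names two classical stepsize rules: gradient descent with (smoothed) gradient clipping and gradient descent with Polyak stepsizes. The toy sketch below illustrates the two rules on an assumed least-squares problem; the clipping-style stepsize $1/(L_0 + L_1\|\nabla f(x)\|)$, the constants $L_0$ and $L_1$, and the test problem are illustrative assumptions, not taken from that paper.

```python
# Toy sketch (not from the cited paper): two stepsize rules named in the
# (L_0,L_1)-smooth entry -- a clipping-style stepsize 1/(L0 + L1*||grad f(x)||)
# and the Polyak stepsize (f(x) - f*)/||grad f(x)||^2. The least-squares test
# problem and the constants L0, L1 are assumptions made purely for illustration.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((40, 15))
b = rng.standard_normal(40)
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)   # least-squares minimizer


def f(x):
    r = A @ x - b
    return 0.5 * r @ r


def grad_f(x):
    return A.T @ (A @ x - b)


f_star = f(x_star)                    # optimal value, needed by the Polyak rule
L0 = np.linalg.norm(A.T @ A, 2)       # Lipschitz constant of grad f
L1 = 0.1                              # assumed (L0,L1)-smoothness slope


def clipped_gd(x0, iters=300):
    x = x0.copy()
    for _ in range(iters):
        g = grad_f(x)
        x = x - g / (L0 + L1 * np.linalg.norm(g))   # clipping-style stepsize
    return x


def polyak_gd(x0, iters=300):
    x = x0.copy()
    for _ in range(iters):
        g = grad_f(x)
        if g @ g < 1e-16:
            break
        x = x - (f(x) - f_star) / (g @ g) * g        # Polyak stepsize
    return x


x0 = np.zeros(15)
print("clipped GD gap:", f(clipped_gd(x0)) - f_star)
print("Polyak GD gap :", f(polyak_gd(x0)) - f_star)
```

The Polyak rule assumes the optimal value $f^*$ is known, which is why the sketch uses a least-squares problem where it can be computed exactly; the clipping-style rule only assumes the constants $L_0$ and $L_1$.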