Relative Lipschitzness in Extragradient Methods and a Direct Recipe for
Acceleration
- URL: http://arxiv.org/abs/2011.06572v2
- Date: Thu, 15 Jul 2021 03:01:08 GMT
- Title: Relative Lipschitzness in Extragradient Methods and a Direct Recipe for
Acceleration
- Authors: Michael B. Cohen, Aaron Sidford, Kevin Tian
- Abstract summary: We show that standard extragradient methods (i.e. mirror prox and dual extrapolation) recover optimal accelerated rates for first-order minimization of smooth convex functions.
We generalize this framework to handle local and randomized notions of relative Lipschitzness.
- Score: 30.369542816161378
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We show that standard extragradient methods (i.e. mirror prox and dual
extrapolation) recover optimal accelerated rates for first-order minimization
of smooth convex functions. To obtain this result we provide a fine-grained
characterization of the convergence rates of extragradient methods for solving
monotone variational inequalities in terms of a natural condition we call
relative Lipschitzness. We further generalize this framework to handle local
and randomized notions of relative Lipschitzness and thereby recover rates for
box-constrained $\ell_\infty$ regression based on area convexity and complexity
bounds achieved by accelerated (randomized) coordinate descent for smooth
convex function minimization.
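The extragradient template the abstract refers to (mirror prox / dual extrapolation) evaluates the operator twice per iteration: once to extrapolate to a half-iterate and once to take the actual step, with guarantees stated for the averaged half-iterates. The sketch below is a minimal Euclidean mirror-prox loop on an unconstrained bilinear saddle point, intended only to illustrate that template; the problem instance, step size, and averaging scheme are illustrative assumptions, not the paper's accelerated recipe or its relative-Lipschitzness analysis.

```python
import numpy as np

# Minimal Euclidean mirror-prox (extragradient) sketch for the bilinear
# saddle point f(x, y) = x^T A y, whose monotone operator is
# g(x, y) = (A y, -A^T x).  Problem instance and step size are assumptions.

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
x, y = rng.standard_normal(5), rng.standard_normal(5)

def g(x, y):
    """Monotone operator of the bilinear saddle-point problem."""
    return A @ y, -A.T @ x

eta = 0.1 / np.linalg.norm(A, 2)       # step size ~ 1 / Lipschitz constant of g
avg_x, avg_y, steps = np.zeros(5), np.zeros(5), 2000

for _ in range(steps):
    gx, gy = g(x, y)
    xh, yh = x - eta * gx, y - eta * gy      # extrapolate to the half-iterate
    gxh, gyh = g(xh, yh)
    x, y = x - eta * gxh, y - eta * gyh      # step with the operator at the half-iterate
    avg_x += xh / steps
    avg_y += yh / steps

# For this instance the saddle point is (0, 0); the averaged half-iterates
# approach it, whereas plain simultaneous gradient descent-ascent would not.
print(np.linalg.norm(avg_x), np.linalg.norm(avg_y))
```

The paper's contribution is that this same two-evaluation template, analyzed through relative Lipschitzness, already recovers accelerated rates for smooth convex minimization; the loop above fixes only the mechanics of the method.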
Related papers
- Revisiting Convergence: Shuffling Complexity Beyond Lipschitz Smoothness [50.78508362183774]
Shuffling-type gradient methods are favored in practice for their simplicity and rapid empirical performance. Most analyses, however, require the Lipschitz smoothness condition, which is often not met in common machine learning settings.
arXiv Detail & Related papers (2025-07-11T15:36:48Z) - Generalized Gradient Norm Clipping & Non-Euclidean $(L_0,L_1)$-Smoothness [51.302674884611335]
This work introduces a hybrid non-Euclidean optimization method which generalizes norm clipping by combining steepest descent and conditional gradient approaches. We discuss how to instantiate the algorithms for deep learning and demonstrate their properties on image classification and language modeling.
arXiv Detail & Related papers (2025-06-02T17:34:29Z) - Mirror Descent Under Generalized Smoothness [23.5387392871236]
We introduce a new $\ell^*$-smoothness concept that measures the norm of the Hessian in terms of a general norm and its dual norm.
We establish convergence for mirror-descent-type algorithms, matching the rates under the classic smoothness.
arXiv Detail & Related papers (2025-02-02T11:23:10Z) - Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity [50.25258834153574]
We focus on the class of (strongly) convex $(L_0,L_1)$-smooth functions and derive new convergence guarantees for several existing methods.
In particular, we derive improved convergence rates for Gradient Descent with Smoothed Gradient Clipping and for Gradient Descent with Polyak Stepsizes.
arXiv Detail & Related papers (2024-09-23T13:11:37Z) - Gradient-Variation Online Learning under Generalized Smoothness [56.38427425920781]
Gradient-variation online learning aims to achieve regret guarantees that scale with the variation in the gradients of online functions.
Recent efforts in neural network optimization suggest a generalized smoothness condition, allowing smoothness to correlate with gradient norms.
We provide applications to fast-rate convergence in games and to extended adversarial optimization.
arXiv Detail & Related papers (2024-08-17T02:22:08Z) - Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm only has access to noisy evaluations of the objective function it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
arXiv Detail & Related papers (2024-06-28T02:56:22Z) - Functional Partial Least-Squares: Optimal Rates and Adaptation [0.0]
We propose a new formulation of the functional partial least-squares (PLS) estimator related to the conjugate gradient method.
We show that the estimator achieves the (nearly) optimal convergence rate on a class of ellipsoids.
arXiv Detail & Related papers (2024-02-16T23:47:47Z) - Incremental Quasi-Newton Methods with Faster Superlinear Convergence
Rates [50.36933471975506]
We consider the finite-sum optimization problem, where each component function is strongly convex and has Lipschitz continuous gradient and Hessian.
The recently proposed incremental quasi-Newton method is based on the BFGS update and achieves a local superlinear convergence rate.
This paper proposes a more efficient quasi-Newton method by incorporating the symmetric rank-1 update into the incremental framework.
arXiv Detail & Related papers (2024-02-04T05:54:51Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Convex and Non-convex Optimization Under Generalized Smoothness [69.69521650503431]
The analysis of convex and non-convex optimization methods often requires Lipschitz continuity of the gradient, which limits the analysis to a restricted class of functions.
Recent work generalizes this setting via a non-uniform smoothness condition.
arXiv Detail & Related papers (2023-06-02T04:21:59Z) - Cyclic Block Coordinate Descent With Variance Reduction for Composite
Nonconvex Optimization [26.218670461973705]
We propose cyclic block coordinate methods with non-asymptotic gradient norm guarantees for composite nonconvex optimization problems.
Our results demonstrate the efficacy of the proposed cyclic variance-reduced scheme in training deep neural nets.
arXiv Detail & Related papers (2022-12-09T19:17:39Z) - Proximal Subgradient Norm Minimization of ISTA and FISTA [8.261388753972234]
We show that the squared proximal subgradient norm for the class of iterative shrinkage-thresholding algorithms converges at an inverse square rate.
We also show that the squared proximal subgradient norm for the class of faster iterative shrinkage-thresholding algorithms (FISTA) converges at an accelerated inverse cubic rate.
arXiv Detail & Related papers (2022-11-03T06:50:19Z)
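As context for the last entry, the following is a minimal FISTA loop for the Lasso; the objective, data, step size, and the proximal-gradient-mapping norm printed at the end are illustrative assumptions standing in for the proximal subgradient norm analyzed in that paper.

```python
import numpy as np

# Standard FISTA sketch for the Lasso: min_x 0.5*||A x - b||^2 + lam*||x||_1.
# Data, regularization, and the printed diagnostic are assumptions.

rng = np.random.default_rng(0)
A = rng.standard_normal((60, 100))
b = rng.standard_normal(60)
lam = 0.1
L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the smooth part

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

x_prev = np.zeros(100)
y = x_prev.copy()
t = 1.0

for _ in range(500):
    grad = A.T @ (A @ y - b)
    x = soft_threshold(y - grad / L, lam / L)       # proximal gradient (ISTA) step
    t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2       # momentum parameter
    y = x + ((t - 1) / t_next) * (x - x_prev)       # Nesterov-style extrapolation
    x_prev, t = x, t_next

# Norm of the proximal gradient mapping at the final iterate, a common
# stationarity diagnostic (illustrative stand-in, not the paper's exact quantity).
grad_x = A.T @ (A @ x - b)
print(L * np.linalg.norm(x - soft_threshold(x - grad_x / L, lam / L)))
```

Dropping the extrapolation step (setting `y = x` each iteration) recovers plain ISTA, the slower of the two schemes compared in that entry.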
This list is automatically generated from the titles and abstracts of the papers on this site.