Relative Lipschitzness in Extragradient Methods and a Direct Recipe for Acceleration
- URL: http://arxiv.org/abs/2011.06572v2
- Date: Thu, 15 Jul 2021 03:01:08 GMT
- Title: Relative Lipschitzness in Extragradient Methods and a Direct Recipe for Acceleration
- Authors: Michael B. Cohen, Aaron Sidford, Kevin Tian
- Abstract summary: We show that standard extragradient methods (i.e. mirror prox and dual extrapolation) recover optimal accelerated rates for first-order minimization of smooth convex functions.
We generalize this framework to handle local and randomized notions of relative Lipschitzness.
- Score: 30.369542816161378
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We show that standard extragradient methods (i.e. mirror prox and dual
extrapolation) recover optimal accelerated rates for first-order minimization
of smooth convex functions. To obtain this result we provide a fine-grained
characterization of the convergence rates of extragradient methods for solving
monotone variational inequalities in terms of a natural condition we call
relative Lipschitzness. We further generalize this framework to handle local
and randomized notions of relative Lipschitzness and thereby recover rates for
box-constrained $\ell_\infty$ regression based on area convexity and complexity
bounds achieved by accelerated (randomized) coordinate descent for smooth
convex function minimization.
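For reference, the key condition and method can be sketched as follows. This is a paraphrase based on the abstract (with $g$ the monotone operator of the variational inequality, $V_z(w)$ the Bregman divergence of a chosen regularizer, and step size $1/\lambda$); consult the paper for the precise statements.

    \[
    \text{($\lambda$-relative Lipschitzness)}\quad
    \langle g(w) - g(z),\, w - u \rangle \;\le\; \lambda \bigl( V_z(w) + V_w(u) \bigr)
    \quad \text{for all } z, w, u,
    \]
    \[
    \text{(mirror prox step)}\quad
    w_t = \operatorname*{arg\,min}_{w} \Bigl\{ \bigl\langle \tfrac{1}{\lambda} g(z_t), w \bigr\rangle + V_{z_t}(w) \Bigr\},
    \qquad
    z_{t+1} = \operatorname*{arg\,min}_{z} \Bigl\{ \bigl\langle \tfrac{1}{\lambda} g(w_t), z \bigr\rangle + V_{z_t}(z) \Bigr\}.
    \]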
Related papers
- Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm only has access to noisy evaluations of the objective function it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
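To make the query model concrete, here is a minimal Python sketch of a two-point zeroth-order gradient estimator (illustrative only; the function name and estimator are generic, not the paper's algorithm):

    import numpy as np

    def two_point_grad_estimate(f, x, h=1e-4, rng=None):
        """Estimate grad f(x) using only (possibly noisy) function values.

        Central differences along a random unit direction u; scaling by the
        dimension d makes the estimate unbiased in expectation for smooth f,
        up to O(h^2) discretization bias.
        """
        rng = np.random.default_rng() if rng is None else rng
        u = rng.standard_normal(x.shape)
        u /= np.linalg.norm(u)          # uniform direction on the sphere
        d = x.size
        return d * (f(x + h * u) - f(x - h * u)) / (2 * h) * u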
arXiv Detail & Related papers (2024-06-28T02:56:22Z)
- Functional Partial Least-Squares: Optimal Rates and Adaptation [0.0]
We propose a new formulation of the functional partial least-squares (PLS) estimator related to the conjugate gradient method.
We show that the estimator achieves the (nearly) optimal convergence rate on a class of ellipsoids.
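The PLS-CG connection can be illustrated in the finite-dimensional case: after $k$ steps, conjugate gradients applied to the normal equations produces an iterate in the same Krylov subspace spanned by the first $k$ PLS components. A minimal Python sketch of that baseline (illustrative only, not the paper's functional estimator):

    import numpy as np

    def cg_normal_equations(X, y, n_components):
        """Conjugate gradients on X'X b = X'y, stopped after n_components steps.

        The iterate lies in the Krylov subspace span{X'y, (X'X)X'y, ...},
        matching the span of the first n_components PLS directions.
        """
        A, rhs = X.T @ X, X.T @ y
        b = np.zeros(X.shape[1])
        r = rhs - A @ b
        p = r.copy()
        for _ in range(n_components):
            Ap = A @ p
            alpha = (r @ r) / (p @ Ap)
            b += alpha * p
            r_new = r - alpha * Ap
            p = r_new + ((r_new @ r_new) / (r @ r)) * p
            r = r_new
        return b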
arXiv Detail & Related papers (2024-02-16T23:47:47Z)
- Incremental Quasi-Newton Methods with Faster Superlinear Convergence Rates [50.36933471975506]
We consider the finite-sum optimization problem, where each component function is strongly convex and has Lipschitz continuous gradient and Hessian.
The recently proposed incremental quasi-Newton method is based on the BFGS update and achieves a local superlinear convergence rate.
This paper proposes a more efficient quasi-Newton method by incorporating the symmetric rank-1 update into the incremental framework.
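For context, the symmetric rank-1 (SR1) update referenced here takes the standard form below, with $s_k = x_{k+1} - x_k$ and $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$; this is the generic update, not necessarily the paper's exact incremental variant.

    \[
    B_{k+1} \;=\; B_k \;+\; \frac{(y_k - B_k s_k)(y_k - B_k s_k)^\top}{(y_k - B_k s_k)^\top s_k}.
    \]

Unlike BFGS, SR1 does not guarantee positive definiteness of $B_{k+1}$, but it often approximates the Hessian more accurately, which is what faster superlinear rates typically exploit.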
arXiv Detail & Related papers (2024-02-04T05:54:51Z)
- Optimal Guarantees for Algorithmic Reproducibility and Gradient Complexity in Convex Optimization [55.115992622028685]
Previous work suggests that first-order methods would need to trade off convergence rate (gradient complexity) for better reproducibility.
We demonstrate that both optimal complexity and near-optimal convergence guarantees can be achieved for smooth convex minimization and smooth convex-concave minimax problems.
arXiv Detail & Related papers (2023-10-26T19:56:52Z)
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
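One common instantiation of iterate interpolation is the lookahead-style update sketched below in Python (an assumption for illustration; the paper analyzes linear interpolation in general, not this specific code):

    def lookahead(step, theta, k=5, alpha=0.5, n_rounds=100):
        """Linear interpolation of iterates, lookahead style (illustrative).

        `step` performs one inner optimizer update. After k fast steps, the
        slow iterate phi moves a fraction alpha toward the fast iterate:
        phi <- phi + alpha * (theta - phi), damping the oscillations that a
        nonmonotone (rotational) loss landscape induces.
        """
        phi = theta.copy()
        for _ in range(n_rounds):
            theta = phi.copy()
            for _ in range(k):
                theta = step(theta)            # fast inner updates
            phi = phi + alpha * (theta - phi)  # interpolation step
        return phi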
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
- Convex and Non-convex Optimization Under Generalized Smoothness [69.69521650503431]
Analyses of convex and non-convex optimization methods often require Lipschitz continuity of the gradient, which limits the analysis to functions bounded by quadratics.
Recent work generalizes the Lipschitz gradient setting via a non-uniform smoothness condition.
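For context, the non-uniform smoothness condition in question is commonly stated, for twice-differentiable $f$, as the $(L_0, L_1)$ bound

    \[
    \|\nabla^2 f(x)\| \;\le\; L_0 + L_1 \,\|\nabla f(x)\|,
    \]

which recovers standard $L$-smoothness when $L_1 = 0$; the paper's exact assumption may be stated differently.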
arXiv Detail & Related papers (2023-06-02T04:21:59Z)
- Cyclic Block Coordinate Descent With Variance Reduction for Composite Nonconvex Optimization [26.218670461973705]
We propose cyclic block coordinate methods for composite nonconvex optimization problems with non-asymptotic gradient norm guarantees.
Our results demonstrate the efficacy of the proposed cyclic variance-reduced scheme in training deep neural nets.
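A plain cyclic block coordinate descent loop, without the paper's variance reduction, looks like the following Python sketch (names and signatures are illustrative):

    import numpy as np

    def cyclic_bcd(grad_block, x, blocks, step=0.1, n_epochs=100):
        """Cyclic block coordinate descent (no variance reduction).

        Sweeps the blocks in a fixed order each epoch and takes a partial
        gradient step on one block at a time; grad_block(x, idx) returns the
        gradient restricted to the coordinates in idx.
        """
        for _ in range(n_epochs):
            for idx in blocks:                       # fixed cyclic order
                x[idx] -= step * grad_block(x, idx)  # partial gradient step
        return x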
arXiv Detail & Related papers (2022-12-09T19:17:39Z)
- Proximal Subgradient Norm Minimization of ISTA and FISTA [8.261388753972234]
We show that the squared proximal subgradient norm for the class of iterative shrinkage-thresholding algorithms converges at an inverse square rate.
We also show that the squared proximal subgradient norm for the class of faster iterative shrinkage-thresholding algorithms (FISTA) is accelerated to convergence at an inverse cubic rate.
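For concreteness, ISTA applied to the lasso objective $\tfrac{1}{2}\|Ax - b\|^2 + \lambda\|x\|_1$ is sketched below in Python; the quantity the paper tracks is how fast the (squared) proximal subgradient norm of such iterates decays.

    import numpy as np

    def ista(A, b, lam, step, n_iters=500):
        """ISTA: gradient step on the smooth part, then soft-thresholding.

        For convergence, step should be at most 1 / ||A||_2^2 (the inverse
        smoothness constant of the quadratic term).
        """
        x = np.zeros(A.shape[1])
        for _ in range(n_iters):
            z = x - step * (A.T @ (A @ x - b))                        # gradient step
            x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # l1 prox
        return x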
arXiv Detail & Related papers (2022-11-03T06:50:19Z)
- Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
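The without-replacement sampling that distinguishes these variants is easy to state in code; a minimal Python sketch of Random Reshuffling (illustrative; grad_i(x, i) is assumed to return the i-th component gradient):

    import numpy as np

    def random_reshuffling(grad_i, x, n, step=0.01, n_epochs=10, rng=None):
        """Random Reshuffling: one pass over a fresh permutation per epoch.

        Every component gradient is used exactly once per epoch, in contrast
        to with-replacement SGD, which samples indices i.i.d. at every step.
        """
        rng = np.random.default_rng() if rng is None else rng
        for _ in range(n_epochs):
            for i in rng.permutation(n):    # without-replacement pass
                x = x - step * grad_i(x, i)
        return x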
arXiv Detail & Related papers (2021-10-20T02:25:25Z)