Cyclic Coordinate Dual Averaging with Extrapolation
- URL: http://arxiv.org/abs/2102.13244v4
- Date: Thu, 8 Jun 2023 16:24:22 GMT
- Title: Cyclic Coordinate Dual Averaging with Extrapolation
- Authors: Chaobing Song and Jelena Diakonikolas
- Abstract summary: We introduce a new block coordinate method that applies to the general class of variational inequality (VI) problems with monotone operators.
The resulting convergence bounds match the optimal convergence bounds of full gradient methods.
For $m$ coordinate blocks, the resulting gradient Lipschitz constant in our bounds is never larger than a factor $\sqrt{m}$ compared to the traditional Euclidean Lipschitz constant.
- Score: 22.234715500748074
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cyclic block coordinate methods are a fundamental class of optimization
methods widely used in practice and implemented as part of standard software
packages for statistical learning. Nevertheless, their convergence is generally
not well understood and so far their good practical performance has not been
explained by existing convergence analyses. In this work, we introduce a new
block coordinate method that applies to the general class of variational
inequality (VI) problems with monotone operators. This class includes composite
convex optimization problems and convex-concave min-max optimization problems
as special cases and has not been addressed by the existing work. The resulting
convergence bounds match the optimal convergence bounds of full gradient
methods, but are provided in terms of a novel gradient Lipschitz condition
w.r.t.~a Mahalanobis norm. For $m$ coordinate blocks, the resulting gradient
Lipschitz constant in our bounds is never larger than a factor $\sqrt{m}$
compared to the traditional Euclidean Lipschitz constant, while it is possible
for it to be much smaller. Further, for the case when the operator in the VI
has finite-sum structure, we propose a variance reduced variant of our method
which further decreases the per-iteration cost and has better convergence rates
in certain regimes. To obtain these results, we use a gradient extrapolation
strategy that allows us to view a cyclic collection of block coordinate-wise
gradients as one implicit gradient.
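To make the cyclic-update template concrete, here is a minimal illustrative sketch of a generic cyclic block coordinate gradient method on a convex quadratic. This is not the paper's algorithm: the dual averaging, the extrapolation step, and the Mahalanobis-norm Lipschitz analysis are all omitted, and the block sizes and step size are illustrative assumptions.

```python
import numpy as np

def cyclic_block_cd(A, b, n_blocks=2, step=None, n_epochs=300):
    """Minimize f(x) = 0.5 x^T A x - b^T x by cyclic block coordinate steps."""
    n = len(b)
    x = np.zeros(n)
    blocks = np.array_split(np.arange(n), n_blocks)
    if step is None:
        # conservative 1/L step using the full (Euclidean) Lipschitz constant
        step = 1.0 / np.linalg.norm(A, 2)
    for _ in range(n_epochs):
        for blk in blocks:
            # each block gradient uses the freshest x: the hallmark of cyclic methods
            g = A[blk] @ x - b[blk]
            x[blk] -= step * g
    return x

# usage: the minimizer of the quadratic solves A x = b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star = cyclic_block_cd(A, b)
```

The paper's contribution is precisely to replace the conservative Euclidean step size used here with a sharper Mahalanobis-norm Lipschitz condition, and to chain the stale block gradients together via extrapolation so the cycle behaves like one full gradient step.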
Related papers
- An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsizes [17.804065824245402]
In machine learning applications, each loss function is non-negative and can be expressed as the composition of a square and its real-valued square root.
We show how to apply the Gauss-Newton method, or the Levenberg-Marquardt method, to minimize the average of smooth but possibly non-convex functions.
arXiv Detail & Related papers (2024-07-05T08:53:06Z) - Convex and Non-convex Optimization Under Generalized Smoothness [69.69521650503431]
Classical analysis of convex and non-convex optimization methods often requires the Lipschitz continuity of the gradient, which limits the analysis to functions bounded by quadratics.
Recent work generalizes this setting via a non-uniform smoothness condition.
arXiv Detail & Related papers (2023-06-02T04:21:59Z) - High-Probability Bounds for Stochastic Optimization and Variational
Inequalities: the Case of Unbounded Variance [59.211456992422136]
We propose algorithms with high-probability convergence results under less restrictive assumptions.
These results justify the usage of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv Detail & Related papers (2023-02-02T10:37:23Z) - Cyclic Block Coordinate Descent With Variance Reduction for Composite
Nonconvex Optimization [26.218670461973705]
We propose cyclic block coordinate methods for nonconvex optimization problems with non-asymptotic gradient norm guarantees.
Our results demonstrate the efficacy of the proposed cyclic variance-reduced scheme in training deep neural nets.
arXiv Detail & Related papers (2022-12-09T19:17:39Z) - Faster One-Sample Stochastic Conditional Gradient Method for Composite
Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
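For context on the projection-free template that such conditional gradient methods build on, here is a hedged sketch of the classical deterministic Frank-Wolfe method over the probability simplex. It uses exact gradients rather than the paper's one-sample SAG estimator, and the objective and step-size schedule are standard textbook choices, not taken from the paper.

```python
import numpy as np

def frank_wolfe_simplex(grad, dim, n_iters=500):
    """Classical Frank-Wolfe over the probability simplex (exact gradients)."""
    x = np.ones(dim) / dim            # start at the simplex barycenter
    for t in range(n_iters):
        g = grad(x)
        # linear minimization oracle: the best simplex vertex is the
        # standard basis vector at the smallest gradient coordinate
        s = np.zeros(dim)
        s[np.argmin(g)] = 1.0
        gamma = 2.0 / (t + 2)         # standard open-loop step size
        x = (1.0 - gamma) * x + gamma * s
    return x

# usage: minimize ||x - c||^2 over the simplex; c already lies in the simplex
c = np.array([0.1, 0.6, 0.3])
x = frank_wolfe_simplex(lambda x: 2.0 * (x - c), dim=3)
```

Each iteration calls only a linear minimization oracle instead of a projection, which is what makes conditional gradient methods attractive for structured feasible sets.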
arXiv Detail & Related papers (2022-02-26T19:10:48Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work is on zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity as well as function query cost.
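The basic ingredient of such ZO methods can be sketched with a finite-difference coordinate estimator. This is a generic illustration, not the hybrid scheme of the paper above: the coordinate sampling here is uniform rather than importance-weighted, and the step size and smoothing radius are illustrative assumptions.

```python
import numpy as np

def zo_coordinate_descent(f, x0, step=0.1, mu=1e-4, n_iters=3000, seed=0):
    """Derivative-free descent: estimate one partial derivative per iteration
    by a central finite difference on a randomly sampled coordinate."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    n = x.size
    e = np.zeros(n)
    for _ in range(n_iters):
        i = rng.integers(n)            # uniform coordinate sampling
        e[:] = 0.0
        e[i] = 1.0
        # two function queries estimate one partial derivative
        g_i = (f(x + mu * e) - f(x - mu * e)) / (2.0 * mu)
        x[i] -= step * g_i
    return x

# usage: minimize a quadratic using only function evaluations
c = np.array([1.0, -2.0, 0.5])
x = zo_coordinate_descent(lambda x: np.sum((x - c) ** 2), np.zeros(3))
```

The query cost is two evaluations per iteration; non-uniform coordinate sampling, as studied in the paper, reduces the number of iterations when coordinates differ in importance.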
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Regret Bounds without Lipschitz Continuity: Online Learning with
Relative-Lipschitz Losses [19.554559253046225]
In online convex optimization (OCO), Lipschitz continuity of the functions is commonly assumed in order to obtain sublinear regret.
In this work, we consider OCO for relative Lipschitz and relative strongly convex functions.
We show regret bounds for follow-the-regularized-leader algorithms and a variant of online mirror descent.
arXiv Detail & Related papers (2020-10-22T20:22:19Z) - Conditional gradient methods for stochastically constrained convex
minimization [54.53786593679331]
We propose two novel conditional gradient-based methods for solving structured convex optimization problems.
The most important feature of our framework is that only a subset of the constraints is processed at each iteration.
Our algorithms rely on variance reduction and smoothing used in conjunction with conditional gradient steps, and are accompanied by rigorous convergence guarantees.
arXiv Detail & Related papers (2020-07-07T21:26:35Z) - Conditional Gradient Methods for Convex Optimization with General Affine
and Nonlinear Constraints [8.643249539674612]
This paper presents new conditional gradient methods for solving convex optimization problems with general affine and nonlinear constraints.
We first present a new constraint-extrapolated conditional gradient (CoexCG) method that can achieve an $\mathcal{O}(1/\epsilon^2)$ iteration complexity for both smooth and structured nonsmooth function constrained convex optimization.
We further develop novel variants of CoexCG, namely constraint extrapolated and dual regularized conditional gradient (CoexDurCG) methods, that can achieve similar iteration complexity to CoexCG but allow adaptive selection for algorithmic parameters.
arXiv Detail & Related papers (2020-06-30T23:49:38Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z) - Explicit Regularization of Stochastic Gradient Methods through Duality [9.131027490864938]
We propose randomized Dykstra-style algorithms based on randomized dual coordinate ascent.
For accelerated coordinate descent, we obtain a new algorithm that has better convergence properties than existing gradient methods in the interpolating regime.
arXiv Detail & Related papers (2020-03-30T20:44:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.