Cyclic Block Coordinate Descent With Variance Reduction for Composite
Nonconvex Optimization
- URL: http://arxiv.org/abs/2212.05088v1
- Date: Fri, 9 Dec 2022 19:17:39 GMT
- Title: Cyclic Block Coordinate Descent With Variance Reduction for Composite
Nonconvex Optimization
- Authors: Xufeng Cai, Chaobing Song, Stephen J. Wright, Jelena Diakonikolas
- Abstract summary: We propose cyclic block coordinate methods for nonconvex optimization problems with non-asymptotic gradient norm guarantees.
Our experimental results demonstrate the efficacy of the proposed variance-reduced cyclic scheme in training deep neural nets.
- Score: 26.218670461973705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nonconvex optimization is central in solving many machine learning problems,
in which block-wise structure is commonly encountered. In this work, we propose
cyclic block coordinate methods for nonconvex optimization problems with
non-asymptotic gradient norm guarantees. Our convergence analysis is based on a
gradient Lipschitz condition with respect to a Mahalanobis norm, inspired by a
recent progress on cyclic block coordinate methods. In deterministic settings,
our convergence guarantee matches the guarantee of (full-gradient) gradient
descent, but with the gradient Lipschitz constant being defined w.r.t.~the
Mahalanobis norm. In stochastic settings, we use recursive variance reduction
to decrease the per-iteration cost and match the arithmetic operation
complexity of current optimal stochastic full-gradient methods, with a unified
analysis for both finite-sum and infinite-sum cases. We further prove the
faster, linear convergence of our methods when a Polyak-Łojasiewicz (PŁ)
condition holds for the objective function. To the best of our knowledge, our
work is the first to provide variance-reduced convergence guarantees for a
cyclic block coordinate method. Our experimental results demonstrate the
efficacy of the proposed variance-reduced cyclic scheme in training deep neural
nets.
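For context, the Mahalanobis-norm gradient Lipschitz condition and the PŁ condition referenced above are commonly stated as follows (shown here only for orientation; the paper's precise assumptions may differ): for a positive definite matrix $\Lambda$ and some $\mu > 0$,
$\|\nabla f(x) - \nabla f(y)\|_{\Lambda^{-1}} \le \|x - y\|_{\Lambda}$ and $\tfrac{1}{2}\|\nabla f(x)\|^2 \ge \mu\,(f(x) - f^\star)$.

The sketch below illustrates the finite-sum flavor of such a scheme: cyclic block proximal-gradient updates driven by a SARAH-style recursively variance-reduced gradient estimator. It is a minimal sketch under stated assumptions, not the authors' exact algorithm; the names grad_fi and prox_blocks, the batch sizes, and the epoch structure are all hypothetical.

```python
import numpy as np

def cyclic_block_vr_prox_gradient(grad_fi, prox_blocks, x0, blocks, n_samples,
                                  step_size, n_epochs, big_batch=64, minibatch=8,
                                  rng=None):
    """Illustrative sketch only: cyclic block proximal-gradient steps driven by a
    SARAH-style recursive variance-reduced estimator (finite-sum case).
    grad_fi(x, i) returns the gradient of the i-th component at x;
    prox_blocks[j](v, step) applies the proximal operator of block j's regularizer."""
    rng = np.random.default_rng() if rng is None else rng
    x = x0.copy()
    for _ in range(n_epochs):
        # Anchor the estimator with a large minibatch gradient at the epoch start.
        batch = rng.choice(n_samples, size=min(big_batch, n_samples), replace=False)
        v = np.mean([grad_fi(x, i) for i in batch], axis=0)
        x_prev = x.copy()
        for j, blk in enumerate(blocks):
            # Update one coordinate block while the others stay fixed.
            x_new = x.copy()
            x_new[blk] = prox_blocks[j](x[blk] - step_size * v[blk], step_size)
            # Recursive (SARAH-style) correction of the estimator with a small
            # minibatch of per-sample gradient differences.
            mb = rng.choice(n_samples, size=min(minibatch, n_samples), replace=False)
            v = v + np.mean([grad_fi(x_new, i) - grad_fi(x_prev, i) for i in mb], axis=0)
            x_prev, x = x_new, x_new
    return x
```

The design choice mirrored here is that the estimator is refreshed recursively after every block update, so a cyclic pass touches only minibatches of per-sample gradients rather than recomputing the full gradient.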
Related papers
- An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsizes [17.804065824245402]
In machine learning applications, each loss function is non-negative and can be expressed as the composition of a square and its real-valued square root.
We show how to apply the Gauss-Newton method or the Levenberg-Marquardt method to minimize the average of smooth but possibly nonconvex functions.
arXiv Detail & Related papers (2024-07-05T08:53:06Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - An Accelerated Block Proximal Framework with Adaptive Momentum for
Nonconvex and Nonsmooth Optimization [2.323238724742687]
We propose an accelerated block proximal linear framework with adaptive momentum (ABPL+) for nonconvex and nonsmooth optimization.
We analyze the potential causes of the extrapolation step failing in some algorithms, and resolve this issue by enhancing the comparison process.
We extend our algorithm to any scenario involving updating the gradient step and the linear extrapolation step.
arXiv Detail & Related papers (2023-08-23T13:32:31Z) - Universal Online Learning with Gradient Variations: A Multi-layer Online Ensemble Approach [57.92727189589498]
We propose an online convex optimization approach with two different levels of adaptivity.
We obtain $\mathcal{O}(\log V_T)$, $\mathcal{O}(d \log V_T)$ and $\hat{\mathcal{O}}(\sqrt{V_T})$ regret bounds for strongly convex, exp-concave and convex loss functions.
arXiv Detail & Related papers (2023-07-17T09:55:35Z) - A conditional gradient homotopy method with applications to Semidefinite
Programming [1.6369790794838281]
We present a homotopy-based conditional gradient method for solving convex optimization problems with a large number of simple conic constraints.
Our theoretical complexity is competitive with state-of-the-art SDP solvers, with the decisive advantage of cheap projection-free subroutines.
arXiv Detail & Related papers (2022-07-07T05:48:27Z) - Faster One-Sample Stochastic Conditional Gradient Method for Composite
Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the stochastic ExtraGradient (SEG) method with constant step size, along with variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - Optimal Rates for Random Order Online Optimization [60.011653053877126]
We study the random order online optimization setting of Garber et al. (2020), where the loss functions may be chosen by an adversary, but are then presented online in a uniformly random order.
We show that the algorithms of Garber et al. (2020) achieve the optimal bounds and significantly improve their stability.
arXiv Detail & Related papers (2021-06-29T09:48:46Z) - Cyclic Coordinate Dual Averaging with Extrapolation [22.234715500748074]
We introduce a new block coordinate method that applies to the general class of variational inequality (VI) problems with monotone operators.
The resulting convergence bounds match the optimal convergence bounds of full gradient methods.
For $m$ coordinate blocks, the gradient Lipschitz constant in our bounds is never larger than $\sqrt{m}$ times the traditional Euclidean Lipschitz constant.
arXiv Detail & Related papers (2021-02-26T00:28:58Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work focuses on zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Provably Convergent Working Set Algorithm for Non-Convex Regularized
Regression [0.0]
This paper proposes a working set algorithm for non-convex regularizers with convergence guarantees.
Our results demonstrate substantial gains compared to the full problem solver, for both block-coordinate and gradient solvers.
arXiv Detail & Related papers (2020-06-24T07:40:31Z)