A Unified Analysis of Stochastic Gradient Methods for Nonconvex
Federated Optimization
- URL: http://arxiv.org/abs/2006.07013v1
- Date: Fri, 12 Jun 2020 08:58:03 GMT
- Title: A Unified Analysis of Stochastic Gradient Methods for Nonconvex
Federated Optimization
- Authors: Zhize Li, Peter Richtárik
- Abstract summary: We provide a single convergence analysis for all methods that satisfy the proposed unified assumption in the smooth nonconvex regime.
We also provide a unified analysis for obtaining faster linear convergence rates in the nonconvex regime under the PL condition.
- Score: 16.714109768541785
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study the performance of a large family of SGD variants in
the smooth nonconvex regime. To this end, we propose a generic and flexible
assumption capable of accurate modeling of the second moment of the stochastic
gradient. Our assumption is satisfied by a large number of specific variants of
SGD in the literature, including SGD with arbitrary sampling, SGD with
compressed gradients, and a wide variety of variance-reduced SGD methods such
as SVRG and SAGA. We provide a single convergence analysis for all methods that
satisfy the proposed unified assumption, thereby offering a unified
understanding of SGD variants in the nonconvex regime instead of relying on
dedicated analyses of each variant. Moreover, our unified analysis is accurate
enough to recover or improve upon the best-known convergence results of several
classical methods, and also gives new convergence results for many new methods
which arise as special cases. In the more general distributed/federated
nonconvex optimization setup, we propose two new general algorithmic frameworks
differing in whether direct gradient compression (DC) or compression of
gradient differences (DIANA) is used. We show that all methods captured by
these two frameworks also satisfy our unified assumption. Thus, our unified
convergence analysis also captures a large variety of distributed methods
utilizing compressed communication. Finally, we also provide a unified analysis
for obtaining faster linear convergence rates in this nonconvex regime under
the PL condition.
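To make the unified assumption concrete: the bounds in this line of work are "ABC-style" second-moment bounds. A schematic rendering follows, where A, B, C are placeholder method-dependent constants (the paper's full assumption additionally tracks a recursively decaying variance term for variance-reduced and compressed methods):

```latex
% Schematic ABC-style bound on an unbiased estimator g^k of \nabla f(x^k),
% with f^* = \inf_x f(x) and method-dependent constants A, B, C \ge 0:
\mathbb{E}\big[\|g^k\|^2\big] \;\le\; 2A\big(f(x^k) - f^*\big) + B\,\|\nabla f(x^k)\|^2 + C

% Polyak-Lojasiewicz (PL) condition with \mu > 0, under which the
% faster linear rates are obtained:
\tfrac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \mu\big(f(x) - f^*\big)
```

The DC/DIANA distinction can likewise be sketched in a few lines: under direct compression (DC) every node transmits a compressed gradient, while DIANA compresses gradient differences against locally maintained shifts, so the transmitted messages shrink as the shifts learn the local gradients. A minimal sketch, assuming an unbiased rand-k compressor and per-node gradient oracles (all names and parameters below are illustrative, not the paper's code):

```python
import numpy as np

def rand_k(v, k, rng):
    """Unbiased rand-k sparsifier: keep k random coordinates of v and
    rescale by d/k so that E[rand_k(v)] = v."""
    out = np.zeros_like(v)
    idx = rng.choice(v.size, size=k, replace=False)
    out[idx] = v[idx] * (v.size / k)
    return out

def dc_step(x, grads, gamma, k, rng):
    """DC framework: each node compresses its gradient directly;
    the server averages the compressed messages and takes a step."""
    g_hat = np.mean([rand_k(g(x), k, rng) for g in grads], axis=0)
    return x - gamma * g_hat

def diana_step(x, h, grads, gamma, alpha, k, rng):
    """DIANA framework: node i compresses the difference g_i - h_i against
    its shift h_i; the shifts track the local gradients over time, which
    shrinks the compression error."""
    deltas = [rand_k(g(x) - hi, k, rng) for g, hi in zip(grads, h)]
    g_hat = np.mean([hi + d for hi, d in zip(h, deltas)], axis=0)
    h_new = [hi + alpha * d for hi, d in zip(h, deltas)]
    return x - gamma * g_hat, h_new

# Toy usage: two heterogeneous quadratic "clients" f_i(x) = 0.5*||x - c_i||^2,
# whose average is minimized at the midpoint of the centers.
rng = np.random.default_rng(0)
grads = [lambda x, c=c: x - c for c in (np.ones(10), -np.ones(10))]
x, h = np.zeros(10), [np.zeros(10) for _ in grads]
for _ in range(300):
    x, h = diana_step(x, h, grads, gamma=0.2, alpha=0.5, k=3, rng=rng)
```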
Related papers
- A Unified Theory of Stochastic Proximal Point Methods without Smoothness [52.30944052987393]
Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning.
This paper presents a comprehensive analysis of a broad range of variants of the stochastic proximal point method (SPPM).
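For reference, the basic stochastic proximal point step replaces the explicit gradient step with an implicit one (standard textbook form, not notation specific to this paper):

```latex
% One SPPM step with step size \gamma > 0 and sampled loss f_{\xi_k}:
x^{k+1} = \operatorname{prox}_{\gamma f_{\xi_k}}\!\big(x^k\big)
        = \arg\min_{x}\Big\{ f_{\xi_k}(x) + \tfrac{1}{2\gamma}\,\|x - x^k\|^2 \Big\}
```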
arXiv Detail & Related papers (2024-05-24T21:09:19Z)
- A Unified Convergence Theorem for Stochastic Optimization Methods [4.94128206910124]
We provide a fundamental unified convergence theorem used for deriving convergence results for a series of stochastic optimization methods.
As a direct application, we recover almost sure convergence results under general settings.
arXiv Detail & Related papers (2022-06-08T14:01:42Z)
- Stochastic Gradient Descent-Ascent: Unified Theory and New Efficient Methods [73.35353358543507]
Stochastic Gradient Descent-Ascent (SGDA) is one of the most prominent algorithms for solving min-max optimization and variational inequality problems (VIPs).
In this paper, we propose a unified convergence analysis that covers a large variety of descent-ascent methods.
We develop several new variants of SGDA, such as a new variance-reduced method (L-SVRGDA), new distributed methods with compression (QSGDA, DIANA-SGDA, VR-DIANA-SGDA), and a new method with coordinate randomization (SEGA-SGDA).
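For orientation, a simultaneous SGDA step for min_x max_y f(x, y) descends in x and ascends in y; the sketch below is a generic illustration (the gradient oracles, noise, and step sizes are assumptions, not the paper's specific variants):

```python
import numpy as np

def sgda_step(x, y, grad_x, grad_y, gamma_x, gamma_y):
    """One simultaneous SGDA step for min_x max_y f(x, y):
    gradient descent in x, gradient ascent in y."""
    return x - gamma_x * grad_x(x, y), y + gamma_y * grad_y(x, y)

# Toy usage: f(x, y) = x^2/2 + x*y - y^2/2 with small gradient noise;
# SGDA spirals into the saddle point (0, 0).
rng = np.random.default_rng(1)
gx = lambda x, y: x + y + 0.01 * rng.standard_normal()  # noisy df/dx
gy = lambda x, y: x - y + 0.01 * rng.standard_normal()  # noisy df/dy
x, y = 1.0, 1.0
for _ in range(2000):
    x, y = sgda_step(x, y, gx, gy, gamma_x=0.05, gamma_y=0.05)
```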
arXiv Detail & Related papers (2022-02-15T09:17:39Z)
- Last-Iterate Convergence of Saddle-Point Optimizers via High-Resolution Differential Equations [83.3201889218775]
Several widely-used first-order saddle-point optimization methods yield an identical continuous-time ordinary differential equation (ODE) when derived naively.
However, the convergence properties of these methods are qualitatively different, even on simple bilinear games.
We adopt a framework studied in fluid dynamics to design differential equation models for several saddle-point optimization methods.
arXiv Detail & Related papers (2021-12-27T18:31:34Z)
- Stochastic Extragradient: General Analysis and Improved Rates [23.29653334470774]
The Stochastic Extragradient (SEG) method is one of the most popular algorithms for solving min-max optimization and variational inequality problems.
We develop a novel theoretical framework that allows us to analyze several variants of SEG in a unified manner.
Our rates for the new variants of SEG outperform the current state-of-the-art convergence guarantees and rely on less restrictive assumptions.
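The textbook form of one SEG iteration first extrapolates and then updates from the original point (generic notation with step sizes \gamma_1, \gamma_2 and a sampled operator g(\cdot;\xi); the paper's variants differ in how samples and step sizes are chosen):

```latex
% Extrapolation step, then update step:
\tilde{x}^k = x^k - \gamma_1\, g(x^k;\, \xi^k)
x^{k+1}     = x^k - \gamma_2\, g(\tilde{x}^k;\, \xi^k)
```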
arXiv Detail & Related papers (2021-11-16T16:49:31Z)
- Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond [63.59034509960994]
We study shuffling-based variants, Minibatch and Local Random Reshuffling, which draw stochastic gradients without replacement.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
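To make the with- vs. without-replacement distinction concrete, here is a minimal sketch of one Random Reshuffling epoch next to one epoch of plain SGD sampling (the component-gradient oracle and step size are illustrative assumptions):

```python
import numpy as np

def rr_epoch(x, grad_i, n, gamma, rng):
    """Random Reshuffling: one pass through a fresh permutation, so every
    component gradient is used exactly once per epoch."""
    for i in rng.permutation(n):
        x = x - gamma * grad_i(x, i)
    return x

def sgd_epoch(x, grad_i, n, gamma, rng):
    """With-replacement SGD: n independent uniform draws; some components
    may be visited several times, others not at all."""
    for i in rng.integers(0, n, size=n):
        x = x - gamma * grad_i(x, i)
    return x
```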
arXiv Detail & Related papers (2021-10-20T02:25:25Z)
- A general sample complexity analysis of vanilla policy gradient [101.16957584135767]
Policy gradient (PG) is one of the most popular methods for solving reinforcement learning (RL) problems.
This paper provides a general sample complexity analysis of the "vanilla" PG method.
arXiv Detail & Related papers (2021-07-23T19:38:17Z)
- Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity [49.66890309455787]
We introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO.
We prove linear convergence of both methods to a neighborhood of the solution when they use constant step-size.
Our convergence guarantees hold under the arbitrary sampling paradigm, and we give insights into the complexity of minibatching.
arXiv Detail & Related papers (2021-06-30T18:32:46Z)
- Linearly Converging Error Compensated SGD [11.436753102510647]
We propose a unified analysis of variants of distributed SGD with arbitrary compressions and delayed updates.
Our framework is general enough to cover different variants of quantized SGD, Error Compensated SGD, and SGD with delayed updates.
We develop new variants of SGD that combine variance reduction or arbitrary sampling with error feedback and quantization.
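The error-feedback mechanism at the heart of such methods fits in a few lines; the sketch below uses a biased top-k compressor (the compressor choice, gradient oracle, and step size are illustrative assumptions, not the paper's exact algorithm):

```python
import numpy as np

def top_k(v, k):
    """Biased top-k compressor: keep the k largest-magnitude coordinates."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def ef_step(x, e, grad, gamma, k):
    """Error feedback: compress the step plus the accumulated residual e,
    apply only the compressed part, and carry the remainder forward."""
    corrected = gamma * grad(x) + e  # add back what was dropped earlier
    p = top_k(corrected, k)          # the part that is actually applied/sent
    return x - p, corrected - p      # new iterate, new residual
```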
arXiv Detail & Related papers (2020-10-23T10:46:31Z)
- Sampling and Update Frequencies in Proximal Variance-Reduced Stochastic Gradient Methods [0.0]
We present a general proximal variance-reduced gradient method and analyze it under strong convexity assumptions.
Special cases of the algorithm include SAGA, L-SVRG and their proximal variants.
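As one concrete special case, the loopless SVRG (L-SVRG) estimator replaces SVRG's inner loop with a coin flip that occasionally refreshes the reference point; a minimal sketch under generic oracles (function names and parameters here are illustrative):

```python
import numpy as np

def l_svrg_step(x, w, mu, grad_i, full_grad, n, gamma, p, rng):
    """One L-SVRG step: variance-reduced gradient at x built from reference
    point w and its full gradient mu; refresh (w, mu) with probability p."""
    i = rng.integers(n)
    g = grad_i(x, i) - grad_i(w, i) + mu  # unbiased, variance-reduced
    x_new = x - gamma * g
    if rng.random() < p:                  # loopless reference update
        w, mu = x, full_grad(x)
    return x_new, w, mu
```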
arXiv Detail & Related papers (2020-02-13T14:56:05Z)