Linearly Converging Error Compensated SGD
- URL: http://arxiv.org/abs/2010.12292v1
- Date: Fri, 23 Oct 2020 10:46:31 GMT
- Title: Linearly Converging Error Compensated SGD
- Authors: Eduard Gorbunov, Dmitry Kovalev, Dmitry Makarenko, Peter Richtárik
- Abstract summary: We propose a unified analysis of variants of distributed SGD with arbitrary compressions and delayed updates.
Our framework is general enough to cover different variants of quantized SGD, Error-Compensated SGD, and SGD with delayed updates.
We develop new variants of SGD that combine variance reduction or arbitrary sampling with error feedback and quantization.
- Score: 11.436753102510647
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a unified analysis of variants of distributed SGD
with arbitrary compressions and delayed updates. Our framework is general
enough to cover different variants of quantized SGD, Error-Compensated SGD
(EC-SGD) and SGD with delayed updates (D-SGD). Via a single theorem, we derive
the complexity results for all the methods that fit our framework. For the
existing methods, this theorem gives the best-known complexity results.
Moreover, using our general scheme, we develop new variants of SGD that combine
variance reduction or arbitrary sampling with error feedback and quantization,
and we derive convergence rates for these methods that beat the state-of-the-art
results. To illustrate the strength of our framework, we develop 16 new methods
that fit it. In particular, we propose the first method, called EC-SGD-DIANA,
that is based on error feedback for a biased compression operator and
quantization of gradient differences, and we prove convergence guarantees
showing that EC-SGD-DIANA converges to the exact optimum asymptotically in
expectation with a constant learning rate for both convex and strongly convex
objectives when workers compute full gradients of their loss functions.
Moreover, for the case when the loss function of a worker has the form of a
finite sum, we modify the method and obtain a new one, called EC-LSVRG-DIANA,
which is the first distributed stochastic method with error feedback and
variance reduction that converges to the exact optimum asymptotically in
expectation with a constant learning rate.
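To make the error-feedback mechanism underlying these methods concrete, here is a minimal single-worker sketch with a Top-K compressor. The `top_k` operator, step size, and toy quadratic objective are illustrative assumptions, not the paper's EC-SGD-DIANA, which additionally quantizes gradient differences in a DIANA-style fashion to reach the exact optimum.

```python
import numpy as np

def top_k(v, k):
    """Keep only the k largest-magnitude entries of v (a biased compressor)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ec_sgd(grad, x0, gamma, k, iters):
    """Error-compensated SGD: compress (step + residual), apply the compressed
    part, and carry the dropped part forward as the new residual."""
    x, e = x0.copy(), np.zeros_like(x0)
    for _ in range(iters):
        v = e + gamma * grad(x)   # re-inject what was dropped previously
        c = top_k(v, k)           # the only part a worker would transmit
        x = x - c                 # apply the compressed update
        e = v - c                 # remember the compression error
    return x

# Toy problem: f(x) = 0.5 * ||A x - b||^2 with a full (deterministic) gradient.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
print(ec_sgd(lambda x: A.T @ (A @ x - b), np.zeros(5), gamma=0.01, k=2, iters=2000))
```

The residual `e` re-injects whatever the biased compressor dropped at the next step; quantizing gradient differences on top of this is the ingredient the paper uses to obtain convergence to the exact optimum with a constant learning rate.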
Related papers
- Effect of Random Learning Rate: Theoretical Analysis of SGD Dynamics in Non-Convex Optimization via Stationary Distribution [6.144680854063938]
We consider a variant of stochastic gradient descent (SGD) with a random learning rate to reveal its convergence properties.
We demonstrate that the distribution of a parameter updated by Poisson SGD converges to a stationary distribution under weak assumptions.
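A minimal sketch of the kind of scheme studied here, with the step size redrawn independently at every iteration; the exponential distribution and toy objective are assumptions for illustration, not the paper's exact Poisson SGD construction.

```python
import numpy as np

def random_lr_sgd(grad, x0, mean_lr=0.01, iters=1000, seed=0):
    """SGD whose learning rate is redrawn at every step; the iterates form a
    Markov chain whose stationary distribution is the object of study."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(iters):
        eta = rng.exponential(mean_lr)  # random step size for this iteration
        x = x - eta * grad(x)
    return x

# Toy objective f(x) = 0.5 * ||x||^2, so grad(x) = x.
print(random_lr_sgd(lambda x: x, np.ones(3)))
```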
arXiv Detail & Related papers (2024-06-23T06:52:33Z)
- Benign Underfitting of Stochastic Gradient Descent [72.38051710389732]
We study to what extent stochastic gradient descent (SGD) may be understood as a "conventional" learning rule that achieves generalization performance by obtaining a good fit to the training data.
We analyze the closely related with-replacement SGD, for which an analogous phenomenon does not occur, and prove that its population risk does in fact converge at the optimal rate.
arXiv Detail & Related papers (2022-02-27T13:25:01Z)
- Stochastic Gradient Descent-Ascent: Unified Theory and New Efficient Methods [73.35353358543507]
Stochastic Gradient Descent-Ascent (SGDA) is one of the most prominent algorithms for solving min-max optimization and variational inequality problems (VIPs).
In this paper, we propose a unified convergence analysis that covers a large variety of descent-ascent methods.
We develop several new variants of SGDA, such as a new variance-reduced method (L-SVRGDA), new distributed methods with compression (QSGDA, DIANA-SGDA, VR-DIANA-SGDA), and a new method with coordinate randomization (SEGA-SGDA).
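For reference, a minimal sketch of plain simultaneous SGDA on a toy strongly-convex-strongly-concave saddle; the problem, step size, and noise model are illustrative assumptions, and the listed variants (L-SVRGDA, QSGDA, etc.) change the gradient estimators used inside this loop.

```python
import numpy as np

def sgda(grad_x, grad_y, x0, y0, eta=0.1, iters=1000, noise=0.01, seed=0):
    """Simultaneous stochastic gradient descent-ascent: descend in x, ascend
    in y, using noisy gradient estimates."""
    rng = np.random.default_rng(seed)
    x, y = x0.copy(), y0.copy()
    for _ in range(iters):
        gx = grad_x(x, y) + noise * rng.standard_normal(x.shape)
        gy = grad_y(x, y) + noise * rng.standard_normal(y.shape)
        x, y = x - eta * gx, y + eta * gy
    return x, y

# Toy saddle: f(x, y) = 0.5*||x||^2 + x.y - 0.5*||y||^2, solution x = y = 0.
print(sgda(lambda x, y: x + y, lambda x, y: x - y, np.ones(2), -np.ones(2)))
```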
arXiv Detail & Related papers (2022-02-15T09:17:39Z)
- On the Convergence of mSGD and AdaGrad for Stochastic Optimization [0.696125353550498]
Stochastic gradient descent (SGD) has been intensively developed and extensively applied in machine learning in the past decade.
Modified SGD-type algorithms such as momentum-based SGD (mSGD) and adaptive gradient optimization (AdaGrad) outperform vanilla SGD in many competitions and applications in terms of convergence rate and accuracy.
We focus on the convergence analysis of mSGD and AdaGrad for any smooth (possibly non-convex) loss functions in machine learning.
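For concreteness, the two update rules in question, in a hedged sketch; the step sizes and the toy quadratic are assumptions, not the paper's setting.

```python
import numpy as np

def msgd_step(x, v, g, lr=0.01, beta=0.9):
    """Momentum SGD (heavy-ball): accumulate a velocity, then move along it."""
    v = beta * v + g
    return x - lr * v, v

def adagrad_step(x, s, g, lr=0.1, eps=1e-8):
    """AdaGrad: per-coordinate steps shrink with the accumulated squared gradients."""
    s = s + g ** 2
    return x - lr * g / (np.sqrt(s) + eps), s

# Exact gradients of f(x) = 0.5 * ||x||^2, just to exercise the two updates.
x_m, v = np.ones(3), np.zeros(3)
x_a, s = np.ones(3), np.zeros(3)
for _ in range(100):
    x_m, v = msgd_step(x_m, v, x_m)
    x_a, s = adagrad_step(x_a, s, x_a)
print(x_m, x_a)
```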
arXiv Detail & Related papers (2022-01-26T22:02:21Z)
- Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond [63.59034509960994]
We study shuffling-based variants of SGD: minibatch and local Random Reshuffling, which draw stochastic gradients without replacement.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
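A minimal sketch of the without-replacement (Random Reshuffling) sampling pattern these variants build on; the toy finite-sum objective and step size are assumptions.

```python
import numpy as np

def random_reshuffling(grads, x0, lr=0.1, epochs=200, seed=0):
    """One epoch = one fresh random permutation of the n component gradients,
    each visited exactly once (sampling without replacement)."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(epochs):
        for i in rng.permutation(len(grads)):
            x = x - lr * grads[i](x)
    return x

# Toy finite sum: f(x) = (1/n) * sum_i 0.5 * ||x - a_i||^2, minimizer = mean(a_i).
points = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([2.0, 2.0])]
grads = [(lambda a: (lambda x: x - a))(a) for a in points]
print(random_reshuffling(grads, np.zeros(2)))
```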
arXiv Detail & Related papers (2021-10-20T02:25:25Z)
- On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD settings.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z)
- Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity [49.66890309455787]
We introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO.
We prove linear convergence of both methods to a neighborhood of the solution when they use a constant step size.
Our convergence guarantees hold under the arbitrary sampling paradigm, and we give insights into the complexity of minibatching.
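For context, the standard (deterministic) co-coercivity condition on an operator F is shown below; the paper's expected co-coercivity requires an analogous bound to hold only in expectation over the stochastic gradient oracle, and the precise formulation and constants are given in the paper.

```latex
% Co-coercivity of an operator F with parameter \ell > 0:
\[
  \langle F(x) - F(y),\, x - y \rangle \;\ge\; \frac{1}{\ell}\, \lVert F(x) - F(y) \rVert^2
  \qquad \text{for all } x, y .
\]
% Expected co-coercivity (as used in the paper) imposes a bound of this type
% only in expectation over the stochastic estimator of F.
```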
arXiv Detail & Related papers (2021-06-30T18:32:46Z)
- Unified Analysis of Stochastic Gradient Methods for Composite Convex and Smooth Optimization [15.82816385434718]
We present a unified theorem for the convergence analysis of stochastic gradient algorithms for minimizing a smooth and convex loss plus a convex regularizer.
We do this by extending the unified analysis of Gorbunov, Hanzely & Richtárik (2020) and dropping the requirement that the loss function be strongly convex.
Our unified analysis applies to a host of existing algorithms such as proximal SGD, variance reduced methods, quantization and some coordinate descent type methods.
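A minimal sketch of the proximal (composite) template this analysis covers, here with an L1 regularizer and exact gradients of the smooth part for simplicity; the objective, step size, and regularization weight are illustrative assumptions, and the paper's methods replace the exact gradient with stochastic estimators.

```python
import numpy as np

def prox_l1(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_grad(grad, prox, x0, lr=0.01, iters=2000):
    """Composite template: a gradient step on the smooth loss f, followed by
    the prox of the regularizer R."""
    x = x0.copy()
    for _ in range(iters):
        x = prox(x - lr * grad(x), lr)
    return x

# Toy composite problem: 0.5 * ||A x - b||^2 + 0.1 * ||x||_1.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((30, 5)), rng.standard_normal(30)
grad = lambda x: A.T @ (A @ x - b)
print(prox_grad(grad, lambda v, t: prox_l1(v, 0.1 * t), np.zeros(5)))
```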
arXiv Detail & Related papers (2020-06-20T13:40:27Z)
- A Unified Analysis of Stochastic Gradient Methods for Nonconvex Federated Optimization [16.714109768541785]
We provide a single analysis for all SGD variants that satisfy our assumptions in the nonconvex regime.
We also provide a unified analysis for obtaining faster linear convergence in the nonconvex regime under the PL condition.
arXiv Detail & Related papers (2020-06-12T08:58:03Z)
- Detached Error Feedback for Distributed SGD with Random Sparsification [98.98236187442258]
The communication bottleneck has been a critical problem in large-scale deep learning.
We propose a new detached error feedback (DEF) algorithm, which shows better convergence than error feedback for non-convex distributed problems.
We also propose DEFA to accelerate the generalization of DEF, which shows better bounds than DEF.
arXiv Detail & Related papers (2020-04-11T03:50:59Z)