Local SGD: Unified Theory and New Efficient Methods
- URL: http://arxiv.org/abs/2011.02828v1
- Date: Tue, 3 Nov 2020 13:02:50 GMT
- Title: Local SGD: Unified Theory and New Efficient Methods
- Authors: Eduard Gorbunov, Filip Hanzely, Peter Richtárik
- Abstract summary: We present a unified framework for analyzing local SGD methods in the convex and strongly convex regimes.
We develop the first linearly converging local SGD method which does not require any data homogeneity or other strong assumptions.
- Score: 8.701566919381223
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a unified framework for analyzing local SGD methods in the convex
and strongly convex regimes for distributed/federated training of supervised
machine learning models. We recover several known methods as a special case of
our general framework, including Local-SGD/FedAvg, SCAFFOLD, and several
variants of SGD not originally designed for federated learning. Our framework
covers both the identical and heterogeneous data settings, supports both random
and deterministic number of local steps, and can work with a wide array of
local stochastic gradient estimators, including shifted estimators which are
able to adjust the fixed points of local iterations for faster convergence. As
an application of our framework, we develop multiple novel FL optimizers which
are superior to existing methods. In particular, we develop the first linearly
converging local SGD method which does not require any data homogeneity or
other strong assumptions.
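To make the class of methods concrete, below is a minimal Local-SGD/FedAvg sketch on a toy heterogeneous quadratic problem, with an optional per-client shift of the local gradient estimator (the mechanism by which shifted estimators adjust the fixed points of local iterations). The toy objective, step sizes, and the oracle choice of shift $\nabla f_i(x^\star)$ are illustrative assumptions, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy heterogeneous problem: client i holds f_i(x) = 0.5 * a_i * ||x - b_i||^2.
n_clients, dim = 4, 3
a = rng.uniform(0.5, 2.0, size=n_clients)        # per-client curvature
b = rng.normal(size=(n_clients, dim))            # per-client optimum (data heterogeneity)
x_star = (a[:, None] * b).sum(axis=0) / a.sum()  # minimizer of (1/n) * sum_i f_i

def stoch_grad(i, x):
    """Unbiased stochastic gradient of f_i; the noise stands in for minibatch sampling."""
    return a[i] * (x - b[i]) + 0.01 * rng.normal(size=dim)

def local_sgd(rounds=200, local_steps=20, lr=0.05, shifts=None):
    """Local-SGD/FedAvg; shifts[i] is subtracted from client i's gradient estimator."""
    if shifts is None:
        shifts = np.zeros((n_clients, dim))
    x = np.zeros(dim)
    for _ in range(rounds):
        local_models = []
        for i in range(n_clients):
            y = x.copy()
            for _ in range(local_steps):
                y -= lr * (stoch_grad(i, y) - shifts[i])  # (possibly shifted) local step
            local_models.append(y)
        x = np.mean(local_models, axis=0)                 # communication round: average
    return x

plain = local_sgd()                                # vanilla Local-SGD/FedAvg
oracle_shifts = a[:, None] * (x_star - b)          # grad f_i(x*): re-centers each client's
shifted = local_sgd(shifts=oracle_shifts)          # local fixed point onto x*
print("error, plain  :", np.linalg.norm(plain - x_star))
print("error, shifted:", np.linalg.norm(shifted - x_star))
```

With zero shifts this reduces to plain Local-SGD/FedAvg; with the oracle shifts each client's local iteration is re-centered on the global solution, which is the effect that control-variate methods such as SCAFFOLD approximate online.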
Related papers
- FedHB: Hierarchical Bayesian Federated Learning [11.936836827864095]
We propose a novel hierarchical Bayesian approach to Federated Learning (FL).
Our model reasonably describes the generative process of clients' local data via hierarchical Bayesian modeling.
We show that our block-coordinate FL algorithm converges to an optimum of the objective at the rate of $O(1/\sqrt{t})$.
arXiv Detail & Related papers (2023-05-08T18:21:41Z)
- SLowcal-SGD: Slow Query Points Improve Local-SGD for Stochastic Convex Optimization [12.709177728330399]
We design the first local update method that provably benefits over the two most prominent distributed baselines: Minibatch-SGD and Local-SGD.
Key to our approach is a slow querying technique that we customize to the distributed setting, which in turn enables a better mitigation of the bias caused by local updates.
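To clarify the distinction between the two baselines named above, the sketch below shows where each queries stochastic gradients in one communication round: Minibatch-SGD only at the shared iterate, Local-SGD along each worker's drifting local trajectory (the source of the bias that local updates introduce in general). The toy quadratic losses and constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, workers, K, lr = 3, 4, 10, 0.05
targets = rng.normal(size=(workers, dim))        # worker i's data encoded as a target point

def g(i, x):
    """Stochastic gradient of worker i's loss 0.5 * ||x - targets[i]||^2."""
    return (x - targets[i]) + 0.05 * rng.normal(size=dim)

def minibatch_sgd_round(x):
    # Minibatch-SGD: all K gradients per worker are queried at the SAME shared iterate x.
    grad = np.mean([g(i, x) for i in range(workers) for _ in range(K)], axis=0)
    return x - lr * K * grad                     # one aggregated step per round

def local_sgd_round(x):
    # Local-SGD: each worker queries gradients at its OWN local iterates, then models are averaged.
    local_models = []
    for i in range(workers):
        y = x.copy()
        for _ in range(K):
            y -= lr * g(i, y)
        local_models.append(y)
    return np.mean(local_models, axis=0)

x_mb = x_loc = np.zeros(dim)
for _ in range(30):
    x_mb, x_loc = minibatch_sgd_round(x_mb), local_sgd_round(x_loc)
print("Minibatch-SGD iterate:", x_mb)
print("Local-SGD iterate:    ", x_loc)
```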
arXiv Detail & Related papers (2023-04-09T06:10:49Z)
- Stochastic Gradient Descent-Ascent: Unified Theory and New Efficient Methods [73.35353358543507]
Stochastic Gradient Descent-Ascent (SGDA) is one of the most prominent algorithms for solving min-max optimization and variational inequality problems (VIPs).
In this paper, we propose a unified convergence analysis that covers a large variety of descent-ascent methods.
We develop several new variants of SGDA such as a new variance-reduced method (L-SVRGDA), new distributed methods with compression (QSGDA, DIANA-SGDA, VR-DIANA-SGDA), and a new method with coordinate randomization (SEGA-SGDA).
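For reference, the base SGDA step that these variants build on can be sketched as follows on a toy strongly-convex-strongly-concave saddle-point problem; the problem, noise model, and step size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy min-max problem: min_x max_y f(x, y) = 0.5*||x||^2 + x^T A y - 0.5*||y||^2.
dim = 4
A = rng.normal(size=(dim, dim)) / np.sqrt(dim)

def grad_x(x, y):
    """Stochastic gradient in x; the noise models sampling."""
    return x + A @ y + 0.01 * rng.normal(size=dim)

def grad_y(x, y):
    """Stochastic gradient in y."""
    return A.T @ x - y + 0.01 * rng.normal(size=dim)

x, y = rng.normal(size=dim), rng.normal(size=dim)
lr = 0.1
for _ in range(500):
    gx, gy = grad_x(x, y), grad_y(x, y)   # query both gradients at the current point
    x, y = x - lr * gx, y + lr * gy       # simultaneous descent step in x, ascent step in y
print("||x||, ||y|| near the saddle point (0, 0):", np.linalg.norm(x), np.linalg.norm(y))
```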
arXiv Detail & Related papers (2022-02-15T09:17:39Z)
- Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning [58.79085525115987]
Local methods are one of the promising approaches to reducing communication time.
We show that the communication complexity is better than that of non-local methods when the heterogeneity of the local datasets is smaller than the smoothness of the local loss.
arXiv Detail & Related papers (2022-02-12T15:12:17Z)
- Distributed and Stochastic Optimization Methods with Gradient Compression and Local Steps [0.0]
We propose theoretical frameworks for the analysis of distributed methods with error compensation and local updates.
We develop more than 20 new optimization methods, including the first linearly converging Error-Compensated SGD and the first linearly converging Local-SGD methods.
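As background on the error-compensation ingredient, the sketch below shows the standard error-feedback loop with a Top-K compressor on a single node; it illustrates the general mechanism rather than the specific methods developed in that work, and all names and constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def top_k(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest (a biased compressor)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

# Toy objective: f(x) = 0.5 * ||x - b||^2, so grad f(x) = x - b.
dim, k, lr = 20, 3, 0.1
b = rng.normal(size=dim)

x = np.zeros(dim)
e = np.zeros(dim)                                # error-compensation memory
for _ in range(300):
    g = (x - b) + 0.01 * rng.normal(size=dim)    # stochastic gradient
    c = top_k(e + lr * g, k)                     # compress the error-corrected update
    e = e + lr * g - c                           # remember what the compressor dropped
    x = x - c                                    # apply only the transmitted, compressed part
print("distance to optimum:", np.linalg.norm(x - b))
```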
arXiv Detail & Related papers (2021-12-20T16:12:54Z)
- Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of SSL and DA.
arXiv Detail & Related papers (2021-12-12T06:11:16Z)
- Clustered Federated Learning via Generalized Total Variation Minimization [83.26141667853057]
We study optimization methods to train local (or personalized) models for local datasets with a decentralized network structure.
Our main conceptual contribution is to formulate federated learning as generalized total variation (GTV) minimization.
Our main algorithmic contribution is a fully decentralized federated learning algorithm.
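For orientation, a common way to write a GTV-regularized federated learning objective over a graph with edge weights $A_{ij}$ is the following; this particular form is an assumption based on standard GTV/network-Lasso formulations, and the paper's exact objective may differ:

$\min_{w_1,\dots,w_n} \ \sum_{i=1}^{n} L_i(w_i) \;+\; \lambda \sum_{\{i,j\} \in \mathcal{E}} A_{ij}\, \| w_i - w_j \|_2$

Here $L_i$ is client $i$'s local loss and the coupling term encourages neighboring clients to learn (near-)identical models, which yields clustered personalized solutions.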
arXiv Detail & Related papers (2021-05-26T18:07:19Z)
- FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data at different devices without moving them to the cloud, algorithms such as the Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
arXiv Detail & Related papers (2020-05-22T23:07:42Z)
- From Local SGD to Local Fixed-Point Methods for Federated Learning [17.04886864943171]
We consider the generic problem of finding a fixed point of an average of operators, or an approximation thereof, in a distributed setting.
We investigate two strategies to achieve such a consensus: one based on a fixed number of local steps, and the other based on randomized computations.
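A bare-bones version of the first strategy (a fixed number of local applications of each node's operator, followed by averaging) is sketched below for contractive affine operators; the operators and constants are illustrative assumptions, and with more than one local step the scheme converges to an approximation of the fixed point of the average operator, as the abstract indicates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Each node i has a contractive affine operator T_i(x) = M_i x + c_i.
n, dim, K = 5, 3, 4
M = [0.5 * np.eye(dim) + 0.1 * rng.normal(size=(dim, dim)) for _ in range(n)]
c = [rng.normal(size=dim) for _ in range(n)]

def T(i, x):
    return M[i] @ x + c[i]

# Exact fixed point of the AVERAGE operator (1/n) * sum_i T_i, for reference.
M_avg, c_avg = np.mean(M, axis=0), np.mean(c, axis=0)
x_star = np.linalg.solve(np.eye(dim) - M_avg, c_avg)

x = np.zeros(dim)
for _ in range(100):                 # communication rounds
    local_points = []
    for i in range(n):
        y = x.copy()
        for _ in range(K):           # K local applications of the node's own operator
            y = T(i, y)
        local_points.append(y)
    x = np.mean(local_points, axis=0)  # consensus step: average the local iterates
print("distance to fixed point of the average operator:", np.linalg.norm(x - x_star))
```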
arXiv Detail & Related papers (2020-04-03T09:24:10Z)
- A Unified Theory of Decentralized SGD with Changing Topology and Local Updates [70.9701218475002]
We introduce a unified convergence analysis of decentralized communication methods.
We derive universal convergence rates for several applications.
Our proofs rely on weak assumptions.
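One of the simplest methods covered by such an analysis is decentralized SGD with a doubly stochastic mixing (gossip) matrix, sketched below on a ring topology; the losses, topology, and step size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# n nodes on a ring; node i holds f_i(x) = 0.5 * ||x - b_i||^2.
n, dim, lr = 6, 3, 0.1
b = rng.normal(size=(n, dim))

# Doubly stochastic mixing matrix for a ring: average with the two neighbors.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

X = np.zeros((n, dim))                               # one model copy per node
for _ in range(300):
    G = (X - b) + 0.01 * rng.normal(size=(n, dim))   # local stochastic gradients
    X = W @ X - lr * G                               # gossip with neighbors, then local SGD step
print("consensus spread:", np.linalg.norm(X - X.mean(axis=0)))
print("distance of average iterate to optimum:", np.linalg.norm(X.mean(axis=0) - b.mean(axis=0)))
```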
arXiv Detail & Related papers (2020-03-23T17:49:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.