A Unified Theory of Decentralized SGD with Changing Topology and Local
Updates
- URL: http://arxiv.org/abs/2003.10422v3
- Date: Tue, 2 Mar 2021 14:07:36 GMT
- Title: A Unified Theory of Decentralized SGD with Changing Topology and Local
Updates
- Authors: Anastasia Koloskova, Nicolas Loizou, Sadra Boreiri, Martin Jaggi,
Sebastian U. Stich
- Abstract summary: We introduce a unified convergence analysis of decentralized SGD methods.
We derive universal convergence rates for smooth (convex and non-convex) problems.
Our proofs rely on weak assumptions.
- Score: 70.9701218475002
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decentralized stochastic optimization methods have gained a lot of attention
recently, mainly because of their cheap per iteration cost, data locality, and
their communication-efficiency. In this paper we introduce a unified
convergence analysis that covers a large variety of decentralized SGD methods
which so far have required different intuitions, have different applications,
and which have been developed separately in various communities.
Our algorithmic framework covers local SGD updates and synchronous and
pairwise gossip updates on adaptive network topology. We derive universal
convergence rates for smooth (convex and non-convex) problems and the rates
interpolate between the heterogeneous (non-identically distributed data) and
iid-data settings, recovering linear convergence rates in many special cases,
for instance for over-parametrized models. Our proofs rely on weak assumptions
(typically improving over prior work in several aspects) and recover (and
improve) the best known complexity results for a host of important scenarios,
such as for instance cooperative SGD and federated averaging (local SGD).
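To make the algorithmic template above concrete, the following is a minimal sketch of decentralized SGD that interleaves local SGD steps with gossip averaging over a time-varying mixing matrix. The quadratic local objectives, rotating ring topology, step size, and noise level are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of the decentralized SGD template from the abstract:
# each worker takes local SGD steps and periodically averages with its
# neighbours via a (possibly time-varying) doubly-stochastic mixing matrix.
# The quadratic objectives, ring topology, and step size are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_workers, dim = 8, 5
T, local_steps, lr = 200, 4, 0.1

# Heterogeneous (non-iid) local objectives f_i(x) = 0.5 * ||x - b_i||^2.
b = rng.normal(size=(n_workers, dim))
x = np.zeros((n_workers, dim))          # one iterate per worker


def ring_mixing_matrix(n, shift):
    """Doubly-stochastic gossip matrix for a ring whose neighbours rotate
    with `shift`, mimicking a changing topology."""
    W = np.zeros((n, n))
    for i in range(n):
        left, right = (i - 1 - shift) % n, (i + 1 + shift) % n
        W[i, i] = 1 / 2
        W[i, left] += 1 / 4
        W[i, right] += 1 / 4
    return W


for t in range(T):
    # Local SGD step on each worker (noisy gradient of its own quadratic).
    grad = (x - b) + 0.01 * rng.normal(size=x.shape)
    x = x - lr * grad
    # Gossip averaging every `local_steps` iterations; topology changes with t.
    if (t + 1) % local_steps == 0:
        W = ring_mixing_matrix(n_workers, shift=t % 2)
        x = W @ x

# All workers should approach the minimizer of the average objective, mean(b_i).
print("consensus error:", np.linalg.norm(x - x.mean(axis=0)))
print("distance to optimum:", np.linalg.norm(x.mean(axis=0) - b.mean(axis=0)))
```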
Related papers
- Stability and Generalization for Distributed SGDA [70.97400503482353]
We propose a stability-based generalization analysis framework for Distributed-SGDA.
We conduct a comprehensive analysis of stability error, generalization gap, and population risk across different metrics.
Our theoretical results reveal the trade-off between the generalization gap and optimization error.
arXiv Detail & Related papers (2024-11-14T11:16:32Z)
- Stability and Generalization of the Decentralized Stochastic Gradient Descent Ascent Algorithm [80.94861441583275]
We investigate the generalization bound of the decentralized stochastic gradient descent ascent (D-SGDA) algorithm.
Our results analyze the impact of different topologies on the generalization of D-SGDA.
We also balance the optimization error with the generalization gap to obtain the optimal trade-off in the convex-concave setting.
arXiv Detail & Related papers (2023-10-31T11:27:01Z)
- Stochastic Gradient Descent-Ascent: Unified Theory and New Efficient Methods [73.35353358543507]
Stochastic Gradient Descent-Ascent (SGDA) is one of the most prominent algorithms for solving min-max optimization and variational inequality problems (VIPs).
In this paper, we propose a unified convergence analysis that covers a large variety of descent-ascent methods.
We develop several new variants of SGDA, such as a new variance-reduced method (L-SVRGDA), new distributed methods with compression (QSGDA, DIANA-SGDA, VR-DIANA-SGDA), and a new method with coordinate randomization (SEGA-SGDA); a minimal sketch of the basic SGDA update appears after this list.
arXiv Detail & Related papers (2022-02-15T09:17:39Z)
- Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning [58.79085525115987]
Local methods are one of the promising approaches to reduce communication time.
We show that the communication complexity is better than that of non-local methods when the heterogeneity of the local datasets is smaller than the smoothness of the local loss.
arXiv Detail & Related papers (2022-02-12T15:12:17Z)
- On the Convergence of Heterogeneous Federated Learning with Arbitrary Adaptive Online Model Pruning [15.300983585090794]
We present a unifying framework for heterogeneous FL algorithms with arbitrary adaptive online model pruning.
In particular, we prove that under certain sufficient conditions, these algorithms converge to a stationary point of standard FL for general smooth cost functions.
We illuminate two key factors impacting convergence: pruning-induced noise and minimum coverage index.
arXiv Detail & Related papers (2022-01-27T20:43:38Z)
- Decentralized Local Stochastic Extra-Gradient for Variational Inequalities [125.62877849447729]
We consider distributed variational inequalities (VIs) with problem data that is heterogeneous (non-IID) and distributed across many devices.
We make a very general assumption on the computational network that covers the settings of fully decentralized calculations.
We theoretically analyze its convergence rate in the strongly-monotone, monotone, and non-monotone settings.
arXiv Detail & Related papers (2021-06-15T17:45:51Z)
- Local Stochastic Gradient Descent Ascent: Convergence Analysis and Communication Efficiency [15.04034188283642]
Local SGD is a promising approach to overcome the communication overhead in distributed learning.
We show that local SGDA can provably optimize distributed minimax problems in both the homogeneous and heterogeneous data settings.
arXiv Detail & Related papers (2021-02-25T20:15:18Z)
- Local SGD: Unified Theory and New Efficient Methods [8.701566919381223]
We present a unified framework for analyzing local SGD methods in the convex and strongly convex regimes.
We develop the first linearly converging local SGD method which does not require any data homogeneity or other strong assumptions.
arXiv Detail & Related papers (2020-11-03T13:02:50Z)
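Returning to the SGDA entry above: the sketch below shows the plain descent-ascent update on a toy strongly-convex-strongly-concave saddle problem. The bilinear coupling, step size, and gradient-noise model are illustrative assumptions; the paper's variants (L-SVRGDA, QSGDA, DIANA-SGDA, VR-DIANA-SGDA, SEGA-SGDA) layer variance reduction, compression, or coordinate randomization on top of this basic update.

```python
# Minimal sketch of plain Stochastic Gradient Descent-Ascent (SGDA) on
#   min_x max_y  f(x, y) = 0.5*||x||^2 + x^T A y - 0.5*||y||^2
# with additive gradient noise. Problem, step size, and noise are illustrative.
import numpy as np

rng = np.random.default_rng(1)
dim = 4
A = rng.normal(size=(dim, dim)) / dim
x, y = rng.normal(size=dim), rng.normal(size=dim)
lr, T, noise = 0.1, 2000, 0.05

for _ in range(T):
    gx = x + A @ y + noise * rng.normal(size=dim)    # stochastic gradient in x
    gy = A.T @ x - y + noise * rng.normal(size=dim)  # stochastic gradient in y
    x = x - lr * gx   # descent step on the min variable
    y = y + lr * gy   # ascent step on the max variable

# The unique saddle point of this problem is (x, y) = (0, 0).
print("||x||, ||y|| near the saddle point:", np.linalg.norm(x), np.linalg.norm(y))
```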