Asynchronous SGD on Graphs: a Unified Framework for Asynchronous
Decentralized and Federated Optimization
- URL: http://arxiv.org/abs/2311.00465v1
- Date: Wed, 1 Nov 2023 11:58:16 GMT
- Title: Asynchronous SGD on Graphs: a Unified Framework for Asynchronous
Decentralized and Federated Optimization
- Authors: Mathieu Even, Anastasia Koloskova, Laurent Massoulié
- Abstract summary: We introduce Asynchronous SGD on Graphs (AGRAF SGD) -- a general algorithmic framework that covers asynchronous versions of many popular algorithms.
We provide rates of convergence under much milder assumptions than previous works on decentralized asynchronous computation.
- Score: 13.119144971868632
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Decentralized and asynchronous communication are two popular techniques
for reducing the communication cost of distributed machine learning: the former
removes the dependency on a central orchestrator, the latter the need for
synchronization. Yet combining the two remains a challenge. In this paper, we
take a step in this direction and introduce Asynchronous SGD on Graphs (AGRAF
SGD) -- a general algorithmic framework that, thanks to its relaxed
communication and computation assumptions, covers asynchronous versions of many
popular algorithms, including SGD, Decentralized SGD, Local SGD, and FedBuff.
We provide rates of convergence under much milder assumptions than previous
works on decentralized asynchronous computation, while still recovering or even
improving over the best known results for all the algorithms covered.
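To make the setting concrete, here is a minimal toy sketch in Python, not the
paper's AGRAF SGD itself (whose scheduling assumptions are far more general):
asynchronous optimization on a graph interleaves two primitive events, a local
stochastic gradient step at a single node and a pairwise averaging along a
single edge.

```python
import random
import numpy as np

def async_sgd_on_graph(grads, edges, n_nodes, dim, steps=10_000, lr=0.01):
    """Toy asynchronous SGD on a graph (illustrative sketch only).

    At each tick, either one node takes a local stochastic gradient step
    (computation event) or one edge averages its two endpoints
    (communication event). AGRAF SGD in the paper allows far more general,
    possibly delayed, schedules of these events.

    grads: grads[i](x) returns a stochastic gradient of node i's objective.
    edges: list of (i, j) pairs forming the communication graph.
    """
    x = [np.zeros(dim) for _ in range(n_nodes)]  # one local iterate per node
    for _ in range(steps):
        if random.random() < 0.5:                # computation event
            i = random.randrange(n_nodes)
            x[i] = x[i] - lr * grads[i](x[i])
        else:                                    # communication event
            i, j = random.choice(edges)
            x[i] = x[j] = 0.5 * (x[i] + x[j])
    return np.mean(x, axis=0)                    # network-average iterate
```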
Related papers
- Boosting the Performance of Decentralized Federated Learning via Catalyst Acceleration [66.43954501171292]
We introduce Catalyst Acceleration and propose an accelerated Decentralized Federated Learning algorithm called DFedCata.
DFedCata consists of two main components: the Moreau envelope function, which addresses parameter inconsistencies, and Nesterov's extrapolation step, which accelerates the aggregation phase.
Empirically, we demonstrate the advantages of the proposed algorithm in both convergence speed and generalization performance on CIFAR10/100 with various non-iid data distributions.
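The Nesterov extrapolation step named above can be sketched in isolation as
follows; the momentum schedule is the classical one, and DFedCata's exact
schedule and its coupling with the Moreau envelope may differ (hypothetical
variable names, not the authors' code).

```python
import numpy as np

def nesterov_extrapolate(x_new, x_old, t):
    """One Nesterov extrapolation step, as used to accelerate an
    aggregation phase: push the new iterate further along the direction
    of the last update. Uses the classical momentum schedule
    t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2 (illustrative sketch only).
    """
    t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
    y = x_new + ((t - 1.0) / t_next) * (x_new - x_old)
    return y, t_next
```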
arXiv Detail & Related papers (2024-10-09T06:17:16Z)
- Dual-Delayed Asynchronous SGD for Arbitrarily Heterogeneous Data [22.917944307972434]
We consider the distributed learning problem with data across multiple workers under the orchestration of a central server.
We introduce the dual-delayed asynchronous SGD (DuDe-ASGD) algorithm, designed to mitigate the adverse effects of data heterogeneity.
DuDe-ASGD makes full use of stale gradients from all workers during asynchronous training, leading to two distinct time lags, one in the model parameters and one in the data samples used in the server's iterations.
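A minimal server-side sketch of that rule, assuming a hypothetical receive()
primitive that blocks until some worker reports a gradient; the authors' actual
update may differ in its details.

```python
import numpy as np

def dude_asgd_server_sketch(receive, n_workers, dim, steps=1000, lr=0.1):
    """Illustrative server loop in the spirit of DuDe-ASGD (not the
    authors' implementation): keep the most recent, possibly stale,
    gradient from every worker and step with the average of all of them,
    so both the model and the data behind each stored gradient lag the
    current iterate.

    receive(): blocks until a worker reports, returns (worker_id, grad).
    """
    x = np.zeros(dim)
    g = [np.zeros(dim) for _ in range(n_workers)]  # last gradient per worker
    for _ in range(steps):
        i, grad_i = receive()                 # asynchronous arrival
        g[i] = grad_i                         # refresh worker i's slot
        x = x - lr * np.mean(g, axis=0)       # average over all workers
    return x
```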
arXiv Detail & Related papers (2024-05-27T09:00:30Z)
- Federated Contextual Cascading Bandits with Asynchronous Communication and Heterogeneous Users [95.77678166036561]
We propose a UCB-type algorithm with delicate communication protocols.
We give sub-linear regret bounds on par with those achieved in the synchronous framework.
Empirical evaluation on synthetic and real-world datasets validates our algorithm's superior performance in terms of regrets and communication costs.
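For reference, the classical UCB index that UCB-type algorithms build on is
sketched below; the paper's statistic additionally handles cascading feedback,
heterogeneous users, and asynchronous communication, all omitted here.

```python
import math

def ucb_index(mean_reward, n_pulls, t, alpha=2.0):
    """Classical UCB score: empirical mean plus an exploration bonus that
    shrinks as an arm is pulled more often (illustrative sketch, not the
    paper's exact statistic)."""
    if n_pulls == 0:
        return float("inf")  # force each arm to be tried at least once
    return mean_reward + math.sqrt(alpha * math.log(t) / n_pulls)
```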
arXiv Detail & Related papers (2024-02-26T05:31:14Z)
- Shadowheart SGD: Distributed Asynchronous SGD with Optimal Time Complexity Under Arbitrary Computation and Communication Heterogeneity [85.92481138826949]
We develop a new method, Shadowheart SGD, that provably improves the time complexities of all previous centralized methods.
We also consider the bidirectional setup, where broadcasting from the server to the workers is non-negligible, and develop a corresponding method.
arXiv Detail & Related papers (2024-02-07T12:15:56Z)
- Asynchronous Distributed Optimization with Delay-free Parameters [9.062164411594175]
This paper develops asynchronous versions of two distributed algorithms, Prox-DGD and DGD-ATC, for solving consensus optimization problems over undirected networks.
In contrast to alternatives, our algorithms can converge to the fixed point set of their synchronous counterparts using step-sizes that are independent of the delays.
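The synchronous baseline these methods build on is the standard decentralized
gradient descent recursion; a one-step sketch follows, where W is a
doubly-stochastic mixing matrix for the undirected network (Prox-DGD adds a
proximal term and DGD-ATC reorders the adapt and combine steps, both omitted
here).

```python
import numpy as np

def dgd_step(X, W, grads, gamma):
    """One synchronous DGD iteration: X stacks one iterate per node (row),
    W mixes iterates along network edges, and grads(X) stacks each node's
    local gradient at its own row, i.e.
    x_i <- sum_j W[i, j] * x_j - gamma * grad f_i(x_i).
    (Sketch of the baseline recursion only.)
    """
    return W @ X - gamma * grads(X)
```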
arXiv Detail & Related papers (2023-12-11T16:33:38Z)
- AsGrad: A Sharp Unified Analysis of Asynchronous-SGD Algorithms [45.90015262911875]
We analyze asynchronous-type algorithms for distributed SGD in the heterogeneous setting.
As a by-product of our analysis, we also demonstrate guarantees for gradient-type algorithms such as SGD with random reshuffling.
arXiv Detail & Related papers (2023-10-31T13:44:53Z)
- Locally Asynchronous Stochastic Gradient Descent for Decentralised Deep Learning [0.0]
Local Asynchronous SGD (LASGD) is an asynchronous decentralized algorithm that relies on All Reduce for model synchronization.
We empirically validate LASGD's performance on image classification tasks on the ImageNet dataset.
arXiv Detail & Related papers (2022-03-24T14:25:15Z)
- Decentralized Optimization with Heterogeneous Delays: a Continuous-Time Approach [6.187780920448871]
We propose a novel continuous-time framework to analyze asynchronous algorithms.
We describe a fully asynchronous decentralized algorithm to minimize the sum of smooth and strongly convex functions.
arXiv Detail & Related papers (2021-06-07T13:09:25Z)
- Asynchronous Decentralized Learning of a Neural Network [49.15799302636519]
We exploit an asynchronous computing framework, namely ARock, to learn a deep neural network called the self-size estimating feedforward neural network (SSFN) in a decentralized scenario.
Asynchronous decentralized SSFN relaxes the communication bottleneck by allowing one-node activation and one-sided communication, which reduces the communication overhead significantly.
Experiments comparing asynchronous dSSFN with traditional synchronous dSSFN show the competitive performance of asynchronous dSSFN, especially when the communication network is sparse.
arXiv Detail & Related papers (2020-04-10T15:53:37Z)
- A Unified Theory of Decentralized SGD with Changing Topology and Local Updates [70.9701218475002]
We introduce a unified convergence analysis of decentralized communication methods.
We derive universal convergence rates for several applications.
Our proofs rely on weak assumptions.
arXiv Detail & Related papers (2020-03-23T17:49:15Z)