Shadowheart SGD: Distributed Asynchronous SGD with Optimal Time Complexity Under Arbitrary Computation and Communication Heterogeneity
- URL: http://arxiv.org/abs/2402.04785v2
- Date: Sat, 02 Nov 2024 15:31:03 GMT
- Title: Shadowheart SGD: Distributed Asynchronous SGD with Optimal Time Complexity Under Arbitrary Computation and Communication Heterogeneity
- Authors: Alexander Tyurin, Marta Pozzi, Ivan Ilin, Peter Richtárik
- Abstract summary: We develop a new method, Shadowheart SGD, that provably improves the time complexities of all previous centralized methods.
We also consider the bidirectional setup, where broadcasting from the server to the workers is non-negligible, and develop a corresponding method.
- Score: 85.92481138826949
- Abstract: We consider nonconvex stochastic optimization problems in the asynchronous centralized distributed setup, where the communication times from workers to a server cannot be ignored and the computation and communication times are potentially different for all workers. Using an unbiased compression technique, we develop a new method, Shadowheart SGD, that provably improves the time complexities of all previous centralized methods. Moreover, we show that the time complexity of Shadowheart SGD is optimal in the family of centralized methods with compressed communication. We also consider the bidirectional setup, where broadcasting from the server to the workers is non-negligible, and develop a corresponding method.
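As a point of reference, the sketch below simulates the ingredients the abstract describes: heterogeneous computation and communication times, an unbiased (Rand-K) compressor, and a server that averages whatever compressed stochastic gradients arrive. It is a minimal illustration under assumed constants, a toy quadratic objective, and a hypothetical per-round time budget, not the Shadowheart SGD method or its optimal weighting.

```python
# Minimal sketch of the setup, NOT the actual Shadowheart SGD update or its weighting:
# workers with heterogeneous compute/communication times send unbiasedly compressed
# stochastic gradients to a server, which averages what arrives within a time budget.
import numpy as np

rng = np.random.default_rng(0)

def rand_k(x, k, rng):
    """Unbiased Rand-K compressor: keep k random coordinates, rescale by d/k (E[out] = x)."""
    d = x.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(x)
    out[idx] = x[idx] * (d / k)
    return out

def stoch_grad(w, rng, noise=0.1):
    """Toy stochastic gradient of f(w) = 0.5 * ||w||^2."""
    return w + noise * rng.standard_normal(w.size)

d, k, n_workers, lr = 100, 10, 8, 0.1
w = rng.standard_normal(d)
compute_t = rng.uniform(0.1, 2.0, n_workers)   # heterogeneous computation times
comm_t = rng.uniform(0.1, 2.0, n_workers)      # heterogeneous communication times
budget = 2.0                                   # hypothetical per-round time budget

for step in range(200):
    msgs = []
    for i in range(n_workers):
        t = 0.0
        # worker i keeps computing and sending compressed gradients while time remains;
        # fast workers contribute several messages per round, slow ones possibly none
        while t + compute_t[i] + comm_t[i] <= budget:
            t += compute_t[i] + comm_t[i]
            msgs.append(rand_k(stoch_grad(w, rng), k, rng))
    if msgs:
        w -= lr * np.mean(msgs, axis=0)          # average of unbiased compressed gradients

print(np.linalg.norm(w))                         # decreases toward 0 on this toy objective
```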
Related papers
- Federated Contextual Cascading Bandits with Asynchronous Communication and Heterogeneous Users [95.77678166036561]
We propose a UCB-type algorithm with delicate communication protocols.
We give sub-linear regret bounds on par with those achieved in the synchronous framework.
Empirical evaluation on synthetic and real-world datasets validates our algorithm's superior performance in terms of regrets and communication costs.
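For context, a generic UCB1 index for a stochastic multi-armed bandit is sketched below; the paper's federated contextual cascading setting and its communication protocol are far more involved, and the arm means, horizon, and exploration bonus here are toy assumptions.

```python
# Generic UCB1 index (illustrative only; not the paper's federated cascading algorithm).
import numpy as np

rng = np.random.default_rng(2)
means = np.array([0.2, 0.5, 0.7])          # assumed Bernoulli arm means (toy problem)
K, T = len(means), 5000
counts, sums = np.zeros(K), np.zeros(K)

for t in range(1, T + 1):
    if t <= K:
        arm = t - 1                         # pull each arm once to initialize
    else:
        # empirical mean + confidence bonus; the bonus shrinks as an arm is pulled more often
        ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
        arm = int(np.argmax(ucb))
    reward = float(rng.random() < means[arm])
    counts[arm] += 1
    sums[arm] += reward

print("pulls per arm:", counts)             # the best arm (index 2) should dominate
```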
arXiv Detail & Related papers (2024-02-26T05:31:14Z)
- Asynchronous SGD on Graphs: a Unified Framework for Asynchronous Decentralized and Federated Optimization [13.119144971868632]
We introduce Asynchronous SGD on Graphs (AGRAF SGD) -- a general algorithmic framework that covers asynchronous versions of many popular algorithms.
We provide rates of convergence under much milder assumptions than previous decentralized asynchronous computation works.
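A minimal sketch of asynchronous SGD on a communication graph is given below, assuming a ring topology and toy quadratic local losses; it shows the kind of update such frameworks cover (an active node takes a local stochastic gradient step, then averages with a random neighbor), not the AGRAF SGD framework itself.

```python
# Generic asynchronous decentralized SGD on a graph (illustrative only, not AGRAF SGD):
# at each event, one node does a local stochastic gradient step, then averages with a neighbor.
import numpy as np

rng = np.random.default_rng(1)
n, d, lr = 6, 20, 0.05
neighbors = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}   # assumed ring topology
targets = rng.standard_normal((n, d))    # node i holds the local loss 0.5 * ||x - target_i||^2
x = np.zeros((n, d))                     # one local model copy per node

def local_grad(i, xi, noise=0.05):
    return (xi - targets[i]) + noise * rng.standard_normal(d)

for event in range(2000):
    i = rng.integers(n)                  # an active node wakes up asynchronously
    x[i] -= lr * local_grad(i, x[i])     # local SGD step
    j = rng.choice(neighbors[i])         # pairwise averaging with a random neighbor
    avg = 0.5 * (x[i] + x[j])
    x[i] = avg
    x[j] = avg

# distance of the average model to the minimizer of the average loss; shrinks as training proceeds
print(np.linalg.norm(x.mean(axis=0) - targets.mean(axis=0)))
```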
arXiv Detail & Related papers (2023-11-01T11:58:16Z)
- Serverless Federated AUPRC Optimization for Multi-Party Collaborative Imbalanced Data Mining [119.89373423433804]
The Area Under the Precision-Recall Curve (AUPRC) was introduced as an effective metric.
Serverless multi-party collaborative training can cut down the communication cost by avoiding the server-node bottleneck.
We propose a new ServerLess biAsed sTochastic gradiEnt (SLATE) algorithm to directly optimize the AUPRC.
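The sketch below only illustrates the metric being optimized (AUPRC, i.e., average precision) on assumed imbalanced toy data via scikit-learn; it is not the SLATE algorithm.

```python
# Compute AUPRC (average precision) on toy imbalanced data; illustrates the metric only.
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

rng = np.random.default_rng(5)
n, pos_rate = 10_000, 0.02                        # heavily imbalanced toy labels (assumed)
y_true = (rng.random(n) < pos_rate).astype(int)
y_score = 2.0 * y_true + rng.standard_normal(n)   # hypothetical informative but noisy scores

auprc = average_precision_score(y_true, y_score)
precision, recall, _ = precision_recall_curve(y_true, y_score)
print(f"AUPRC = {auprc:.3f} (a random scorer scores about {y_true.mean():.3f})")
```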
arXiv Detail & Related papers (2023-08-06T06:51:32Z)
- $\textbf{A}^2\textbf{CiD}^2$: Accelerating Asynchronous Communication in Decentralized Deep Learning [0.0]
We introduce a principled asynchronous, randomized, gossip-based optimization algorithm which works thanks to a continuous local momentum named $\textbf{A}^2\textbf{CiD}^2$.
Our theoretical analysis proves accelerated rates compared to previous asynchronous decentralized baselines.
We show consistent improvement on the ImageNet dataset using up to 64 asynchronous workers.
arXiv Detail & Related papers (2023-06-14T06:52:07Z)
- Locally Asynchronous Stochastic Gradient Descent for Decentralised Deep Learning [0.0]
Local Asynchronous SGD (LASGD) is an asynchronous decentralized algorithm that relies on All Reduce for model synchronization.
We empirically validate LASGD's performance on image classification tasks on the ImageNet dataset.
arXiv Detail & Related papers (2022-03-24T14:25:15Z)
- Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning [58.79085525115987]
Local methods are a promising approach to reducing communication time.
We show that the communication complexity is better than that of non-local methods when the heterogeneity of the local datasets is smaller than the smoothness of the local loss.
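A minimal sketch of a generic local method is given below, assuming toy heterogeneous quadratic losses: each worker runs H local steps between communication rounds, so communication happens once per H gradient computations. It is not the bias-variance reduced perturbed variant proposed in the paper.

```python
# Generic local SGD sketch (not the paper's method): H local steps per communication round.
import numpy as np

rng = np.random.default_rng(4)
n_workers, d, H, lr, rounds = 4, 50, 10, 0.05, 30
targets = rng.standard_normal((n_workers, d))   # worker i minimizes 0.5 * ||w - target_i||^2 (toy, heterogeneous)
w = np.zeros(d)                                 # shared model after each communication round

for r in range(rounds):
    local = np.tile(w, (n_workers, 1))          # every worker starts from the shared model
    for _ in range(H):                          # H local stochastic gradient steps, no communication
        grads = (local - targets) + 0.05 * rng.standard_normal((n_workers, d))
        local -= lr * grads
    w = local.mean(axis=0)                      # one communication: average the local models

print(np.linalg.norm(w - targets.mean(axis=0)))  # approaches the minimizer of the average loss
```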
arXiv Detail & Related papers (2022-02-12T15:12:17Z)
- A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free! [72.31332210635524]
Decentralized optimization methods enable on-device training of machine learning models without a central coordinator.
We propose a new randomized first-order method which tackles the communication bottleneck by applying randomized compression operators.
We prove that our method can solve the problems without any increase in the number of communications compared to the baseline.
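To make "randomized compression operator" concrete, the sketch below implements a generic unbiased stochastic quantizer (QSGD-style dithering); it is an illustrative assumption, not necessarily the operator analyzed in the paper.

```python
# Generic unbiased randomized compression operator (stochastic dithering / quantization).
import numpy as np

rng = np.random.default_rng(3)

def quantize(x, levels=4, rng=rng):
    """Unbiased stochastic quantization: E[quantize(x)] = x.

    Each coordinate of |x| / ||x|| is rounded to one of `levels` uniform levels,
    randomly up or down so the rounding is unbiased; only the norm, signs, and
    small integer levels would need to be transmitted."""
    norm = np.linalg.norm(x)
    if norm == 0.0:
        return np.zeros_like(x)
    scaled = np.abs(x) / norm * levels            # in [0, levels]
    lower = np.floor(scaled)
    prob_up = scaled - lower                      # probability of rounding up
    q = lower + (rng.random(x.shape) < prob_up)   # stochastic rounding
    return np.sign(x) * q * norm / levels

x = rng.standard_normal(1000)
samples = np.stack([quantize(x) for _ in range(2000)])
# empirical check of unbiasedness: the gap shrinks toward 0 as the number of samples grows
print(np.abs(samples.mean(axis=0) - x).max())
```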
arXiv Detail & Related papers (2020-11-03T13:35:53Z)
- A Low Complexity Decentralized Neural Net with Centralized Equivalence using Layer-wise Learning [49.15799302636519]
We design a low-complexity decentralized learning algorithm to train a recently proposed large neural network in distributed processing nodes (workers).
In our setup, the training data is distributed among the workers but is not shared in the training process due to privacy and security concerns.
We show that it is possible to achieve equivalent learning performance as if the data is available in a single place.
arXiv Detail & Related papers (2020-09-29T13:08:12Z)
- Advances in Asynchronous Parallel and Distributed Optimization [11.438194383787604]
Asynchronous methods do not require all processors to maintain a consistent view of the optimization variables.
They are not sensitive to issues like stragglers (i.e., slow nodes) and unreliable communication links.
This article reviews recent developments in the design and analysis of asynchronous optimization methods.
arXiv Detail & Related papers (2020-06-24T16:10:19Z)