Adaptive Distributed Stochastic Gradient Descent for Minimizing Delay in
the Presence of Stragglers
- URL: http://arxiv.org/abs/2002.11005v1
- Date: Tue, 25 Feb 2020 16:25:22 GMT
- Title: Adaptive Distributed Stochastic Gradient Descent for Minimizing Delay in
the Presence of Stragglers
- Authors: Serge Kas Hanna, Rawad Bitar, Parimal Parag, Venkat Dasari, and Salim
El Rouayheb
- Abstract summary: We consider the setting where a master wants to run a distributed stochastic gradient descent (SGD) algorithm on $n$ workers each having a subset of the data.
Distributed SGD may suffer from the effect of stragglers, i.e., slow or unresponsive workers who cause delays.
We propose an algorithm for adaptive distributed SGD that is based on a statistical heuristic.
- Score: 31.309253646602933
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the setting where a master wants to run a distributed stochastic
gradient descent (SGD) algorithm on $n$ workers each having a subset of the
data. Distributed SGD may suffer from the effect of stragglers, i.e., slow or
unresponsive workers who cause delays. One solution studied in the literature
is to wait at each iteration for the responses of the fastest $k<n$ workers
before updating the model, where $k$ is a fixed parameter. The choice of the
value of $k$ presents a trade-off between the runtime (i.e., convergence rate)
of SGD and the error of the model. Towards optimizing the error-runtime
trade-off, we investigate distributed SGD with adaptive $k$. We first design an
adaptive policy for varying $k$ that optimizes this trade-off based on an upper
bound on the error as a function of the wall-clock time which we derive. Then,
we propose an algorithm for adaptive distributed SGD that is based on a
statistical heuristic. We implement our algorithm and provide numerical
simulations which confirm our intuition and theoretical analysis.
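To make the fastest-$k$ setup concrete, here is a minimal simulation sketch (not the paper's implementation): at each iteration the master uses only the gradients of the $k$ fastest of $n$ simulated workers, and $k$ is increased over wall-clock time by a simple illustrative schedule rather than the bound-based policy or statistical heuristic derived in the paper. The least-squares task, exponential response times, and all parameter values are assumptions for illustration.

```python
# Sketch: fastest-k distributed SGD with a simple adaptive schedule for k.
# NOT the paper's policy; the schedule below is an illustrative placeholder.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares data, split across n workers.
n_workers, dim, m = 10, 20, 100            # workers, features, samples per worker
X = [rng.normal(size=(m, dim)) for _ in range(n_workers)]
w_true = rng.normal(size=dim)
y = [Xi @ w_true + 0.1 * rng.normal(size=m) for Xi in X]

def worker_gradient(i, w, batch=10):
    """Mini-batch gradient of worker i's local least-squares loss."""
    idx = rng.choice(m, size=batch, replace=False)
    Xi, yi = X[i][idx], y[i][idx]
    return Xi.T @ (Xi @ w - yi) / batch

def schedule_k(elapsed, total_time, n):
    """Illustrative schedule: start with small k (fast, noisy updates) and
    end waiting for most workers (slow, accurate updates)."""
    return max(1, int(np.ceil(min(elapsed / total_time, 1.0) * n)))

w = np.zeros(dim)
lr, total_time, elapsed = 0.05, 50.0, 0.0
while elapsed < total_time:
    k = schedule_k(elapsed, total_time, n_workers)
    # Simulated per-worker response times; the master waits only for the
    # k fastest workers, so the iteration costs the k-th smallest time.
    times = rng.exponential(scale=1.0, size=n_workers)
    fastest = np.argsort(times)[:k]
    elapsed += np.sort(times)[k - 1]
    grad = np.mean([worker_gradient(i, w) for i in fastest], axis=0)
    w -= lr * grad

print(f"final k = {k}, distance to w_true = {np.linalg.norm(w - w_true):.4f}")
```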
Related papers
- DASA: Delay-Adaptive Multi-Agent Stochastic Approximation [64.32538247395627]
We consider a setting in which $N$ agents aim to speed up a common Stochastic Approximation problem by acting in parallel and communicating with a central server.
To mitigate the effect of delays and stragglers, we propose DASA, a Delay-Adaptive algorithm for multi-agent Stochastic Approximation.
arXiv Detail & Related papers (2024-03-25T22:49:56Z)
- Towards Understanding the Generalizability of Delayed Stochastic Gradient Descent [63.43247232708004]
Stochastic gradient descent performed in an asynchronous manner plays a crucial role in training large-scale machine learning models.
Existing generalization error bounds are rather pessimistic and cannot reveal the correlation between asynchronous delays and generalization.
Our theoretical results indicate that asynchronous delays reduce the generalization error of the delayed SGD algorithm.
arXiv Detail & Related papers (2023-08-18T10:00:27Z)
- Federated Learning Using Variance Reduced Stochastic Gradient for Probabilistically Activated Agents [0.0]
This paper proposes an algorithm for Federated Learning (FL) with a two-layer structure that achieves both variance reduction and a faster convergence rate to an optimal solution in the setting where each agent has an arbitrary probability of selection in each iteration.
arXiv Detail & Related papers (2022-10-25T22:04:49Z)
- DR-DSGD: A Distributionally Robust Decentralized Learning Algorithm over Graphs [54.08445874064361]
We propose to solve a regularized distributionally robust learning problem in the decentralized setting.
By adding a Kullback-Leibler regularization function to the robust min-max optimization problem, the learning problem can be reduced to a modified robust minimization problem, as sketched below.
We show that our proposed algorithm can improve the worst distribution test accuracy by up to $10\%$.
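As a worked sketch of this reduction (with generic notation that is not necessarily the paper's, and assuming the KL term is taken with respect to uniform weights $\tfrac{1}{n}\mathbf{1}$ over $n$ local losses $f_i$ with coefficient $\lambda$): the regularized min-max problem $\min_\theta \max_{p \in \Delta_n} \sum_{i=1}^n p_i f_i(\theta) - \lambda\,\mathrm{KL}(p \,\|\, \tfrac{1}{n}\mathbf{1})$ has a closed-form inner maximizer $p_i^\star \propto \exp(f_i(\theta)/\lambda)$. Substituting it back collapses the min-max into the modified minimization problem $\min_\theta \lambda \log\big(\tfrac{1}{n}\sum_{i=1}^n \exp(f_i(\theta)/\lambda)\big)$, a smooth soft-max of the local losses on which decentralized SGD can be run.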
arXiv Detail & Related papers (2022-08-29T18:01:42Z)
- Adaptive Stochastic Gradient Descent for Fast and Communication-Efficient Distributed Learning [33.590006101071765]
We consider the setting where a master wants to run a distributed stochastic gradient descent (SGD) algorithm on $n$ workers.
We show that the adaptive version of distributed SGD can reach lower error values in less time compared to non-adaptive implementations.
arXiv Detail & Related papers (2022-08-04T10:57:25Z)
- Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning [77.22019100456595]
We study asynchronous SGD for distributed training over workers that have varying computation and communication frequency.
In this work, we obtain a tighter convergence rate of $\mathcal{O}(\sigma^2\epsilon^{-2} + \tau_{avg}\epsilon^{-1})$, where $\tau_{avg}$ denotes the average delay.
We also show that the heterogeneity term in the rate is affected by the average delay within each worker.
arXiv Detail & Related papers (2022-06-16T17:10:57Z)
- $\bar{G}_{mst}$: An Unbiased Stratified Statistic and a Fast Gradient Optimization Algorithm Based on It [0.0]
A novel algorithm named MSSG, designed based on $\bar{G}_{mst}$, outperforms other SGD-like algorithms.
arXiv Detail & Related papers (2021-10-07T11:48:55Z)
- Asynchronous Stochastic Optimization Robust to Arbitrary Delays [54.61797739710608]
We consider stochastic optimization with delayed gradients where, at each time step $t$, the algorithm makes an update using a stale stochastic gradient from step $t - d_t$ for an arbitrary delay $d_t$ (a generic sketch of this update appears below).
Our experiments demonstrate the efficacy and robustness of our algorithm in cases where the delay distribution is skewed or heavy-tailed.
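As an illustration of the delayed update described above, here is a minimal generic sketch: plain SGD applied to stale iterates on a synthetic least-squares problem. It is not the paper's delay-robust algorithm, and the Pareto-distributed delays, step size, and problem sizes are assumptions chosen only to mimic a heavy-tailed delay setting.

```python
# Sketch: SGD with arbitrarily delayed gradients. At step t the update uses a
# stochastic gradient evaluated at the stale iterate x_{t - d_t}.
import numpy as np

rng = np.random.default_rng(1)
dim, T, lr = 10, 2000, 0.02
A = rng.normal(size=(200, dim))
b = A @ rng.normal(size=dim)

def stochastic_grad(x, batch=20):
    """Mini-batch gradient of the least-squares loss."""
    idx = rng.choice(len(A), size=batch, replace=False)
    return A[idx].T @ (A[idx] @ x - b[idx]) / batch

x = np.zeros(dim)
history = [x.copy()]                       # past iterates, to form stale gradients
for t in range(T):
    d_t = min(int(rng.pareto(1.5)), t)     # heavy-tailed delay, capped at t (illustrative)
    g = stochastic_grad(history[t - d_t])  # gradient at the stale iterate x_{t - d_t}
    x = x - lr * g
    history.append(x.copy())

print(f"final loss = {0.5 * np.mean((A @ x - b) ** 2):.6f}")
```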
arXiv Detail & Related papers (2021-06-22T15:50:45Z)
- Private Stochastic Non-Convex Optimization: Adaptive Algorithms and Tighter Generalization Bounds [72.63031036770425]
We propose differentially private (DP) algorithms for stochastic non-convex optimization.
We demonstrate, on two popular deep learning tasks, the empirical advantages of our methods over standard gradient methods.
arXiv Detail & Related papers (2020-06-24T06:01:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.