Anytime MiniBatch: Exploiting Stragglers in Online Distributed
Optimization
- URL: http://arxiv.org/abs/2006.05752v1
- Date: Wed, 10 Jun 2020 09:53:02 GMT
- Title: Anytime MiniBatch: Exploiting Stragglers in Online Distributed
Optimization
- Authors: Nuwan Ferdinand, Haider Al-Lawati, Stark C. Draper and Matthew Nokleby
- Abstract summary: We propose an online distributed optimization method called Anytime Minibatch.
All nodes are given a fixed time to compute the gradients of as many data samples as possible.
Our results show that our approach is up to 1.5 times faster in Amazon EC2.
- Score: 16.361894089347278
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distributed optimization is vital in solving large-scale machine learning
problems. A widely-shared feature of distributed optimization techniques is the
requirement that all nodes complete their assigned tasks in each computational
epoch before the system can proceed to the next epoch. In such settings, slow
nodes, called stragglers, can greatly slow progress. To mitigate the impact of
stragglers, we propose an online distributed optimization method called Anytime
Minibatch. In this approach, all nodes are given a fixed time to compute the
gradients of as many data samples as possible. The result is a variable
per-node minibatch size. Workers then get a fixed communication time to average
their minibatch gradients via several rounds of consensus, which are then used
to update primal variables via dual averaging. Anytime Minibatch prevents
stragglers from holding up the system without wasting the work that stragglers
can complete. We present a convergence analysis and analyze the wall time
performance. Our numerical results show that our approach is up to 1.5 times
faster in Amazon EC2 and it is up to five times faster when there is greater
variability in compute node performance.
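To make the fixed-compute-time / consensus / dual-averaging pipeline described in the abstract concrete, below is a minimal, self-contained NumPy simulation of the idea. The least-squares objective, the uniform gossip matrix, the per-worker "speeds", and the step-size schedule are illustrative assumptions for the sketch, not the authors' implementation.

```python
# A minimal sketch of the Anytime Minibatch idea: fixed compute budget per node,
# variable per-node minibatch, consensus averaging, then a dual-averaging update.
import numpy as np

rng = np.random.default_rng(0)
d, n_workers = 5, 4
x_true = rng.normal(size=d)

# Per-worker "speeds": how many samples each node manages within the fixed
# compute time (the last worker acts as a straggler). Assumed for illustration.
speeds = np.array([50, 40, 30, 5])
W = np.full((n_workers, n_workers), 1.0 / n_workers)  # doubly stochastic gossip matrix

x = np.zeros((n_workers, d))   # primal variables
z = np.zeros((n_workers, d))   # dual (accumulated gradient) variables
step = 0.5

for epoch in range(200):
    # 1) Compute phase: each worker processes as many samples as its speed
    #    allows, producing a variable-size minibatch (gradient sum, sample count).
    grad_sum = np.zeros((n_workers, d))
    counts = speeds.astype(float)
    for i in range(n_workers):
        A = rng.normal(size=(speeds[i], d))
        b = A @ x_true + 0.1 * rng.normal(size=speeds[i])
        grad_sum[i] = A.T @ (A @ x[i] - b)   # least-squares gradient sum

    # 2) Communication phase: a few consensus rounds average sums and counts.
    for _ in range(5):
        grad_sum = W @ grad_sum
        counts = W @ counts

    # 3) Dual averaging: accumulate the averaged gradient, then take the
    #    (Euclidean) prox step with a decaying step size.
    z += grad_sum / counts[:, None]
    x = -step / np.sqrt(epoch + 1) * z

print("error per worker:", np.linalg.norm(x - x_true, axis=1))
```

In this sketch the straggler still contributes its few samples to the averaged gradient, so its work is not discarded, while fast nodes simply contribute larger minibatches within the same wall-clock budget.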
Related papers
- Cooperative Minibatching in Graph Neural Networks [1.534667887016089]
We propose a new approach called Cooperative Minibatching to reduce the effects of the Neighborhood Explosion Phenomenon (NEP).
We show how to take advantage of the same phenomenon in serial execution by generating dependent consecutive minibatches.
We achieve up to 64% speedup over Independent Minibatching on single-node multi-GPU systems.
arXiv Detail & Related papers (2023-10-19T01:15:24Z) - Distributed Extra-gradient with Optimal Complexity and Communication
Guarantees [60.571030754252824]
We consider monotone variational inequality (VI) problems in multi-GPU settings where multiple processors/workers/clients have access to local dual vectors.
Extra-gradient, which is a de facto algorithm for monotone VI problems, has not been designed to be communication-efficient.
We propose a quantized generalized extra-gradient (Q-GenX), which is an unbiased and adaptive compression method tailored to solve VIs.
arXiv Detail & Related papers (2023-08-17T21:15:04Z) - Fast and Straggler-Tolerant Distributed SGD with Reduced Computation
Load [11.069252535469644]
Distributed optimization procedures like stochastic gradient descent (SGD) can be leveraged to mitigate the effect of unresponsive or slow workers called stragglers.
This can be done by only waiting for a subset of the workers to finish their computation at each iteration of the algorithm.
We construct a novel scheme that adapts both the number of workers and the computation load throughout the run-time of the algorithm.
arXiv Detail & Related papers (2023-04-17T20:12:18Z) - Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T})$ with a communication cost of $O(k \log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z) - Sharper Convergence Guarantees for Asynchronous SGD for Distributed and
Federated Learning [77.22019100456595]
We study a training algorithm for distributed computation workers with varying communication frequency.
In this work, we obtain a tighter convergence rate of $\mathcal{O}(\sigma^2\epsilon^{-2} + \tau_{\text{avg}}\epsilon^{-1})$.
We also show that the heterogeneity term in rate is affected by the average delay within each worker.
arXiv Detail & Related papers (2022-06-16T17:10:57Z) - Asynchronous Stochastic Optimization Robust to Arbitrary Delays [54.61797739710608]
We consider optimization with delayed gradients where, at each time step $t$, the algorithm makes an update using a stale gradient from step $t - d_t$ for arbitrary delay $d_t$ (a minimal sketch of this delayed-gradient update appears after this list).
Our experiments demonstrate the efficacy and robustness of our algorithm in cases where the delay distribution is skewed or heavy-tailed.
arXiv Detail & Related papers (2021-06-22T15:50:45Z) - Straggler-Resilient Distributed Machine Learning with Dynamic Backup
Workers [9.919012793724628]
We propose a fully distributed algorithm to determine the number of backup workers for each worker.
Our algorithm achieves a linear speedup for convergence (i.e., convergence performance increases linearly with respect to the number of workers).
arXiv Detail & Related papers (2021-02-11T21:39:53Z) - Stochastic Optimization with Laggard Data Pipelines [65.20044914532221]
We show that "dataechoed" extensions of common optimization methods exhibit provable improvements over their synchronous counterparts.
Specifically, we show that in convex optimization with minibatches, data echoing affords speedups on the curvature-dominated part of the convergence rate, while maintaining the optimal statistical rate.
arXiv Detail & Related papers (2020-10-26T14:55:31Z) - Joint Parameter-and-Bandwidth Allocation for Improving the Efficiency of
Partitioned Edge Learning [73.82875010696849]
Machine learning algorithms are deployed at the network edge for training artificial intelligence (AI) models.
This paper focuses on the novel joint design of parameter (computation load) allocation and bandwidth allocation.
arXiv Detail & Related papers (2020-03-10T05:52:15Z) - Fully Asynchronous Policy Evaluation in Distributed Reinforcement
Learning over Networks [14.636457985379746]
This paper proposes a fully asynchronous scheme for the policy evaluation problem of distributed reinforcement learning (DisRL) over directed peer-to-peer networks.
Without waiting for any other node of the network, each node can locally update its value function at any time by using (possibly delayed) information from its neighbors.
arXiv Detail & Related papers (2020-03-01T08:12:08Z)