Gradient Coding with Dynamic Clustering for Straggler Mitigation
- URL: http://arxiv.org/abs/2011.01922v1
- Date: Tue, 3 Nov 2020 18:52:15 GMT
- Title: Gradient Coding with Dynamic Clustering for Straggler Mitigation
- Authors: Baturalp Buyukates and Emre Ozfatura and Sennur Ulukus and Deniz Gunduz
- Abstract summary: GC-DC regulates the number of straggling workers in each cluster based on the straggler behavior in the previous iteration.
We numerically show that GC-DC provides significant improvements in the average completion time (of each iteration) with no increase in the communication load compared to the original GC scheme.
- Score: 57.9123881133818
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In distributed synchronous gradient descent (GD), the slowest
\textit{straggling} workers are the main performance bottleneck for the
per-iteration completion time. To speed up GD iterations in the presence of
stragglers, coded distributed computation techniques are implemented by
assigning redundant computations to workers. In this paper, we propose a novel
gradient coding (GC) scheme that utilizes dynamic clustering, denoted by GC-DC,
to speed up the gradient calculation. Under time-correlated straggling
behavior, GC-DC aims at regulating the number of straggling workers in each
cluster based on the straggler behavior in the previous iteration. We
numerically show that GC-DC provides significant improvements in the average
completion time (of each iteration) with no increase in the communication load
compared to the original GC scheme.
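To make the dynamic-clustering idea concrete, the sketch below shows one plausible re-clustering heuristic consistent with the abstract: workers flagged as stragglers in the previous iteration are spread as evenly as possible across clusters. The function name, the greedy round-robin rule, and the example parameters are illustrative assumptions, not the GC-DC algorithm itself.

```python
# Illustrative sketch only: a greedy heuristic in the spirit of GC-DC's goal of
# balancing previous-iteration stragglers across clusters. Not the paper's
# algorithm; all names and rules here are assumptions.

def recluster(straggler_flags, num_clusters):
    """Assign workers to equal-size clusters so that workers that straggled
    in the previous iteration are spread as evenly as possible."""
    num_workers = len(straggler_flags)
    assert num_workers % num_clusters == 0
    cluster_size = num_workers // num_clusters

    stragglers = [w for w, s in enumerate(straggler_flags) if s]
    non_stragglers = [w for w, s in enumerate(straggler_flags) if not s]

    clusters = [[] for _ in range(num_clusters)]
    # Round-robin the stragglers first, so no cluster accumulates too many.
    for i, w in enumerate(stragglers):
        clusters[i % num_clusters].append(w)
    # Fill the remaining slots with the non-straggling workers.
    for w in non_stragglers:
        target = min((c for c in clusters if len(c) < cluster_size), key=len)
        target.append(w)
    return clusters

# Example: 8 workers, 4 clusters; workers 0, 1 and 5 straggled last iteration.
print(recluster([1, 1, 0, 0, 0, 1, 0, 0], 4))  # -> [[0, 3], [1, 4], [5, 6], [2, 7]]
```

In the actual GC-DC scheme the redundant data placement is fixed in advance and only the choice of code (i.e., the clustering used for decoding) changes from iteration to iteration, so a real implementation would restrict the reassignment to clusterings compatible with the data already stored at each worker.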
Related papers
- Rethinking and Accelerating Graph Condensation: A Training-Free Approach with Class Partition [56.26113670151363]
Graph condensation is a data-centric solution to replace the large graph with a small yet informative condensed graph.
Existing GC methods suffer from intricate optimization processes, necessitating excessive computing resources.
We propose a training-free GC framework termed Class-partitioned Graph Condensation (CGC).
CGC achieves state-of-the-art performance with a more efficient condensation process.
arXiv Detail & Related papers (2024-05-22T14:57:09Z) - ABS-SGD: A Delayed Synchronous Stochastic Gradient Descent Algorithm
with Adaptive Batch Size for Heterogeneous GPU Clusters [9.885668723959125]
We propose a delayed synchronous distributed gradient descent algorithm with adaptive batch size (ABS-SGD) for heterogeneous GPU clusters.
In ABS-SGD, workers perform global synchronization to accumulate delayed gradients and use the accumulated delayed gradients to update parameters.
Extensive experiments in three types of heterogeneous clusters demonstrate that ABS-SGD can make full use of computational resources.
arXiv Detail & Related papers (2023-08-29T09:46:52Z) - Sequential Gradient Coding For Straggler Mitigation [28.090458692750023]
In distributed computing, slower nodes (stragglers) usually become a bottleneck.
Gradient Coding (GC) is an efficient technique that uses principles of error-correcting codes to distribute gradient computation in the presence of stragglers.
We propose two schemes that demonstrate improved performance compared to GC.
arXiv Detail & Related papers (2022-11-24T21:12:49Z) - Gradient Coding with Dynamic Clustering for Straggler-Tolerant
Distributed Learning [55.052517095437]
Distributed gradient descent (GD) is widely employed to parallelize the learning task by distributing the dataset across multiple workers.
A significant performance bottleneck for the per-iteration completion time in distributed synchronous GD is \textit{straggling} workers.
Coded distributed techniques have been introduced recently to mitigate stragglers and to speed up GD iterations by assigning redundant computations to workers.
We propose a novel dynamic GC scheme, which assigns redundant data to workers to acquire the flexibility to choose from among a set of possible codes depending on the past straggling behavior.
arXiv Detail & Related papers (2021-03-01T18:51:29Z) - Sparse Communication for Training Deep Networks [56.441077560085475]
Synchronous stochastic gradient descent (SGD) is the most common method used for distributed training of deep learning models.
In this algorithm, each worker shares its local gradients with others and updates the parameters using the average gradients of all workers.
We study several compression schemes and identify how three key parameters affect the performance.
arXiv Detail & Related papers (2020-09-19T17:28:11Z) - Age-Based Coded Computation for Bias Reduction in Distributed Learning [57.9123881133818]
Coded computation can be used to speed up distributed learning in the presence of straggling workers.
Partial recovery of the gradient vector can further reduce the computation time at each iteration, but it biases the gradient estimator.
This estimator bias is particularly prevalent when the straggling behavior is correlated over time.
arXiv Detail & Related papers (2020-06-02T17:51:11Z) - DaSGD: Squeezing SGD Parallelization Performance in Distributed Training
Using Delayed Averaging [4.652668321425679]
The minibatch stochastic gradient descent (SGD) algorithm requires workers to halt forward/back propagation while gradients are synchronized.
DaSGD parallelizes the SGD communication step with forward/back propagation to hide 100% of the communication overhead.
arXiv Detail & Related papers (2020-05-31T05:43:50Z) - Gradient Centralization: A New Optimization Technique for Deep Neural
Networks [74.935141515523]
Gradient centralization (GC) operates directly on gradients by centralizing the gradient vectors to have zero mean.
GC can be viewed as a projected gradient descent method with a constrained loss function.
GC is very simple to implement and can be easily embedded into existing gradient-based DNNs with only one line of code; a minimal sketch follows this list.
arXiv Detail & Related papers (2020-04-03T10:25:00Z)
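The Gradient Centralization entry above notes that GC amounts to a single operation on the gradients; the following is a minimal NumPy sketch of that zero-mean centralization step as I understand it, with the function name and the axis convention (mean over all axes except the output axis) being assumptions rather than the paper's reference code.

```python
# Minimal sketch (an assumption, not the paper's reference implementation):
# remove the mean of each output unit's gradient so it has zero mean.
import numpy as np

def centralize_gradient(grad):
    """Subtract the per-output-unit mean from a weight gradient.

    grad: array of shape (out_features, ...). Gradients of 1-D parameters
    such as biases are returned unchanged.
    """
    if grad.ndim <= 1:
        return grad
    axes = tuple(range(1, grad.ndim))  # all axes except the output axis
    return grad - grad.mean(axis=axes, keepdims=True)

# Example: a 4x3 fully-connected weight gradient.
g = np.arange(12, dtype=float).reshape(4, 3)
print(centralize_gradient(g).mean(axis=1))  # each row mean is now ~0
```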