Age-Based Coded Computation for Bias Reduction in Distributed Learning
- URL: http://arxiv.org/abs/2006.01816v1
- Date: Tue, 2 Jun 2020 17:51:11 GMT
- Title: Age-Based Coded Computation for Bias Reduction in Distributed Learning
- Authors: Emre Ozfatura, Baturalp Buyukates, Deniz Gunduz, and Sennur Ulukus
- Abstract summary: Coded computation can be used to speed up distributed learning in the presence of straggling workers.
Partial recovery of the gradient vector can further reduce the computation time at each iteration.
Estimator bias will be particularly prevalent when the straggling behavior is correlated over time.
- Score: 57.9123881133818
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Coded computation can be used to speed up distributed learning in the
presence of straggling workers. Partial recovery of the gradient vector can
further reduce the computation time at each iteration; however, this can result
in biased estimators, which may slow down convergence, or even cause
divergence. Estimator bias will be particularly prevalent when the straggling
behavior is correlated over time, which results in the gradient estimators
being dominated by a few fast servers. To mitigate estimator bias, we design
a $timely$ dynamic encoding framework for partial recovery that includes an
ordering operator changing the codewords and the computation order at the
workers over time. To regulate the recovery frequencies, we adopt an $age$
metric in the design of the dynamic encoding scheme. We show through numerical
results that the proposed dynamic encoding strategy increases the timeliness
of the recovered computations, which in turn reduces the bias in the model
updates and accelerates convergence compared to conventional static partial
recovery schemes.
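As a rough illustration of the ordering operator and the $age$ metric described in the abstract, the Python sketch below tracks, for each gradient partition, the number of iterations since it was last recovered at the master, and has every worker compute its stalest assigned partitions first. This is a minimal toy simulation under assumed parameters: the partition count, worker count, random assignment, and the "only each worker's first task finishes" straggler model are illustrative stand-ins, not the paper's actual encoding scheme.

```python
import numpy as np

# Toy sketch of age-based ordering for partial recovery (illustrative only).
# Each gradient partition has an "age": iterations since it was last recovered.

num_parts, num_workers, tasks_per_worker = 10, 5, 4   # assumed sizes
age = np.zeros(num_parts, dtype=int)
rng = np.random.default_rng(0)

def order_by_age(assigned, age):
    """Schedule a worker's assigned partitions, stalest (largest age) first."""
    return sorted(assigned, key=lambda p: -age[p])

for it in range(3):
    # Random partition-to-worker assignment (stand-in for the actual code design).
    assignments = [rng.choice(num_parts, size=tasks_per_worker, replace=False)
                   for _ in range(num_workers)]
    schedules = [order_by_age(a, age) for a in assignments]
    # Crude straggler model: only the first scheduled task of each worker finishes.
    recovered = {int(s[0]) for s in schedules}
    age += 1                       # everything not recovered grows staler
    age[list(recovered)] = 0       # recovered partitions become fresh again
    print(f"iteration {it}: recovered {sorted(recovered)}, ages {age.tolist()}")
```

In the paper's scheme the codewords themselves are also updated over time; here only the computation order changes, to keep the sketch short.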
Related papers
- A Mirror Descent-Based Algorithm for Corruption-Tolerant Distributed Gradient Descent [57.64826450787237]
We show how to analyze the behavior of distributed gradient descent algorithms in the presence of adversarial corruptions.
We show how to use ideas from (lazy) mirror descent to design a corruption-tolerant distributed optimization algorithm.
Experiments based on linear regression, support vector classification, and softmax classification on the MNIST dataset corroborate our theoretical findings.
arXiv Detail & Related papers (2024-07-19T08:29:12Z)
- ReCycle: Fast and Efficient Long Time Series Forecasting with Residual Cyclic Transformers [0.06965384453064827]
The Residual Cyclic Transformer (ReCycle) bridges the gap between high method complexity and realistic computational resources.
Our approach reduces the run time and energy consumption by more than an order of magnitude, making both training and inference feasible on low-performance, low-power and edge computing devices.
arXiv Detail & Related papers (2024-05-06T12:48:34Z)
- Randomized Polar Codes for Anytime Distributed Machine Learning [66.46612460837147]
We present a novel distributed computing framework that is robust to slow compute nodes, and is capable of both approximate and exact computation of linear operations.
We propose a sequential decoding algorithm designed to handle real-valued data while maintaining low computational complexity for recovery.
We demonstrate the potential applications of this framework in various contexts, such as large-scale matrix multiplication and black-box optimization.
arXiv Detail & Related papers (2023-09-01T18:02:04Z)
- Fast and Straggler-Tolerant Distributed SGD with Reduced Computation Load [11.069252535469644]
Optimization procedures like stochastic gradient descent (SGD) can be leveraged to mitigate the effect of unresponsive or slow workers, called stragglers.
This can be done by only waiting for a subset of the workers to finish their computation at each iteration of the algorithm.
We construct a novel scheme that adapts both the number of workers and the computation load throughout the run-time of the algorithm.
arXiv Detail & Related papers (2023-04-17T20:12:18Z)
- Nested Gradient Codes for Straggler Mitigation in Distributed Machine Learning [21.319460501659666]
Gradient codes are designed to tolerate a fixed number of stragglers.
We propose a gradient coding scheme that can tolerate a flexible number of stragglers.
By proper task scheduling and small additional signaling, our scheme adapts the load of the workers to the actual number of stragglers.
arXiv Detail & Related papers (2022-12-16T16:56:51Z)
- Loop Unrolled Shallow Equilibrium Regularizer (LUSER) -- A Memory-Efficient Inverse Problem Solver [26.87738024952936]
In inverse problems, we aim to reconstruct some underlying signal of interest from potentially corrupted measurements in an often ill-posed setting.
We propose a loop unrolled (LU) algorithm with shallow equilibrium regularizers (LUSER).
These implicit models are as expressive as deeper convolutional networks, but far more memory efficient during training.
arXiv Detail & Related papers (2022-10-10T19:50:37Z)
- Gradient Coding with Dynamic Clustering for Straggler Mitigation [57.9123881133818]
GC-DC regulates the number of straggling workers in each cluster based on the straggler behavior in the previous iteration.
We numerically show that GC-DC provides significant improvements in the average completion time (of each iteration) with no increase in the communication load compared to the original GC scheme.
arXiv Detail & Related papers (2020-11-03T18:52:15Z)
- Coded Distributed Computing with Partial Recovery [56.08535873173518]
We introduce a novel coded matrix-vector multiplication scheme, called coded computation with partial recovery (CCPR).
CCPR reduces both the computation time and the decoding complexity by allowing a trade-off between the accuracy and the speed of computation.
We then extend this approach to distributed implementation of more general computation tasks by proposing a coded communication scheme with partial recovery.
arXiv Detail & Related papers (2020-07-04T21:34:49Z)
- Accelerated Convergence for Counterfactual Learning to Rank [65.63997193915257]
We show that the convergence rate of SGD approaches with IPS-weighted gradients suffers from the large variance introduced by the IPS weights.
We propose a novel learning algorithm, called CounterSample, that has provably better convergence than standard IPS-weighted gradient descent methods.
We prove that CounterSample converges faster and complement our theoretical findings with empirical results.
arXiv Detail & Related papers (2020-05-21T12:53:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.