Communication-Efficient Gradient Coding for Straggler Mitigation in
Distributed Learning
- URL: http://arxiv.org/abs/2005.07184v1
- Date: Thu, 14 May 2020 17:57:13 GMT
- Title: Communication-Efficient Gradient Coding for Straggler Mitigation in
Distributed Learning
- Authors: Swanand Kadhe, O. Ozan Koyluoglu, and Kannan Ramchandran
- Abstract summary: Distributed implementations of gradient-based methods, wherein a server distributes gradient computations across worker machines, need to overcome two limitations: delays caused by slow-running 'straggler' machines and communication overheads.
Ye and Abbe [ICML 2018] proposed a coding-theoretic paradigm to characterize a fundamental trade-off between computation load per worker, communication overhead per worker, and straggler tolerance, but their schemes suffer from heavy decoding complexity and poor numerical stability.
We develop a communication-efficient gradient coding framework to overcome these drawbacks.
- Score: 17.454251607446555
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Distributed implementations of gradient-based methods, wherein a server
distributes gradient computations across worker machines, need to overcome two
limitations: delays caused by slow running machines called 'stragglers', and
communication overheads. Recently, Ye and Abbe [ICML 2018] proposed a
coding-theoretic paradigm to characterize a fundamental trade-off between
computation load per worker, communication overhead per worker, and straggler
tolerance. However, their proposed coding schemes suffer from heavy decoding
complexity and poor numerical stability. In this paper, we develop a
communication-efficient gradient coding framework to overcome these drawbacks.
Our proposed framework enables using any linear code to design the encoding and
decoding functions. When a particular code is used in this framework, its
block-length determines the computation load, dimension determines the
communication overhead, and minimum distance determines the straggler
tolerance. The flexibility of choosing a code allows us to gracefully trade off
the straggler threshold and communication overhead for smaller decoding
complexity and higher numerical stability. Further, we show that using a
maximum distance separable (MDS) code generated by a random Gaussian matrix in
our framework yields a gradient code that is optimal with respect to the
trade-off and, in addition, satisfies stronger guarantees on numerical
stability as compared to the previously proposed schemes. Finally, we evaluate
our proposed framework on Amazon EC2 and demonstrate that it reduces the
average iteration time by 16% as compared to prior gradient coding schemes.
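To make the correspondence between code parameters and system costs concrete, below is a minimal numpy sketch, under assumed toy parameters, of the generic MDS idea the abstract invokes: a random Gaussian matrix serves as an MDS-like generator, so the server can recover every per-block gradient (and hence their sum) from any k of the n coded worker messages, tolerating s = n - k stragglers. The dense encoding, the names n, k, d, A, G, and the synthetic data are illustrative assumptions, not the paper's actual encoder and decoder.

# Hedged toy sketch, not the paper's construction: it only illustrates how the
# MDS property of a random Gaussian matrix lets a server recover the gradient
# sum from any k of n coded worker messages, tolerating s = n - k stragglers.
import numpy as np

rng = np.random.default_rng(42)

n, k, d = 8, 5, 10                 # workers, data blocks, gradient dimension (assumed)
G = rng.standard_normal((k, d))    # stand-ins for the per-block partial gradients
full_gradient = G.sum(axis=0)

# Random Gaussian "generator" matrix: any k of its n rows are linearly
# independent with probability 1, which is the MDS-like property used here.
A = rng.standard_normal((n, k))

# Worker j would send the coded message C[j] = A[j, :] @ G.
C = A @ G

# The server hears from any k = 5 non-straggling workers and inverts a k x k system.
survivors = [0, 2, 3, 6, 7]
G_hat = np.linalg.solve(A[survivors], C[survivors])

print(np.allclose(G_hat.sum(axis=0), full_gradient))  # True

In this dense toy every worker touches all k data blocks; the paper's framework instead picks a structured linear code whose block-length, dimension, and minimum distance dial the per-worker computation, the per-worker communication, and the straggler tolerance against one another.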
Related papers
- Accelerating Error Correction Code Transformers [56.75773430667148]
We introduce a novel acceleration method for transformer-based decoders.
We achieve a 90% compression ratio and reduce arithmetic operation energy consumption by at least 224 times on modern hardware.
arXiv Detail & Related papers (2024-10-08T11:07:55Z)
- Factor Graph Optimization of Error-Correcting Codes for Belief Propagation Decoding [62.25533750469467]
Low-Density Parity-Check (LDPC) codes possess several advantages over other families of codes.
The proposed approach is shown to outperform existing popular codes in decoding performance by orders of magnitude.
arXiv Detail & Related papers (2024-06-09T12:08:56Z)
- Practical Conformer: Optimizing size, speed and flops of Conformer for on-Device and cloud ASR [67.63332492134332]
We design an optimized conformer that is small enough to meet on-device restrictions and has fast inference on TPUs.
Our proposed encoder can double as a strong standalone encoder on device, and as the first part of a high-performance ASR pipeline.
arXiv Detail & Related papers (2023-03-31T23:30:48Z)
- Nested Gradient Codes for Straggler Mitigation in Distributed Machine Learning [21.319460501659666]
Gradient codes are designed to tolerate a fixed number of stragglers.
We propose a gradient coding scheme that can tolerate a flexible number of stragglers.
By proper task scheduling and small additional signaling, our scheme adapts the load of the workers to the actual number of stragglers.
arXiv Detail & Related papers (2022-12-16T16:56:51Z)
- Optimization-based Block Coordinate Gradient Coding for Mitigating Partial Stragglers in Distributed Learning [58.91954425047425]
This paper aims to design a new gradient coding scheme for mitigating partial stragglers in distributed learning.
We propose a gradient coordinate coding scheme with L coding parameters representing L possibly different diversities for the L coordinates, which generates most existing gradient coding schemes as special cases.
arXiv Detail & Related papers (2022-06-06T09:25:40Z)
- Lightweight Projective Derivative Codes for Compressed Asynchronous Gradient Descent [6.055286666916789]
This paper proposes a novel algorithm that encodes the partial derivatives themselves and furthermore optimizes the codes by performing lossy compression on the derivative codewords.
The utility of this application of coding theory is a geometric consequence of the observation in optimization research that noise is tolerable, and sometimes even helpful, in gradient-descent-based learning algorithms.
arXiv Detail & Related papers (2022-01-31T04:08:53Z)
- Distributed Learning and Democratic Embeddings: Polynomial-Time Source Coding Schemes Can Achieve Minimax Lower Bounds for Distributed Gradient Descent under Communication Constraints [46.17631511884969]
We consider the problem of compressing a vector in n-dimensional Euclidean space, subject to a bit budget of R bits per dimension.
We show that Democratic and Near-Democratic source-coding schemes are (near) optimal in the sense that the covering efficiency of the resulting quantizer is either dimension independent, or has a very weak logarithmic dependence.
We propose a distributed optimization algorithm: DGD-DEF, which employs our proposed coding strategy, and achieves the minimax optimal convergence rate to within (near) constant factors.
arXiv Detail & Related papers (2021-03-13T00:04:11Z)
- Approximate Gradient Coding with Optimal Decoding [22.069731881164895]
In distributed optimization problems, a technique called gradient coding has been used to mitigate the effect of straggling machines.
Recent work has studied approximate gradient coding, which concerns coding schemes where the replication factor of the data is too low to recover the full gradient exactly.
We introduce novel approximate gradient codes based on expander graphs, in which each machine receives exactly two blocks of data points; a toy sketch of this two-blocks-per-machine setup appears after this list.
arXiv Detail & Related papers (2020-06-17T03:46:27Z)
- Straggler-aware Distributed Learning: Communication Computation Latency Trade-off [56.08535873173518]
Straggling workers can be tolerated by assigning redundant computations and coding across data and computations.
In most existing schemes, each non-straggling worker transmits one message per iteration to the parameter server (PS) after completing all its computations.
Imposing such a limitation results in two main drawbacks: over-computation due to inaccurate prediction of the straggling behaviour, and under-utilization due to treating workers as either stragglers or non-stragglers.
arXiv Detail & Related papers (2020-04-10T08:39:36Z)
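As promised in the Approximate Gradient Coding entry above, here is a hedged toy sketch of the two-blocks-per-machine setup: machines correspond to the edges of a graph over data blocks (a plain cycle here standing in for the expander graphs of the cited paper), each machine returns the sum of its two block gradients, and the server combines whatever arrives using least-squares coefficients. The graph choice, the variable names, and the data are illustrative assumptions, not the cited construction or its optimal decoder.

# Hedged toy sketch of approximate gradient coding with two data blocks per machine.
import numpy as np

rng = np.random.default_rng(1)

k, d = 8, 4                                        # data blocks, gradient dimension
machines = [(i, (i + 1) % k) for i in range(k)]    # machine m holds blocks (i, i+1)

# 0/1 assignment matrix: the coded message of machine m is c_m = g_i + g_j.
B = np.zeros((len(machines), k))
for m, (i, j) in enumerate(machines):
    B[m, i] = B[m, j] = 1.0

G = rng.standard_normal((k, d))                    # stand-ins for per-block gradients
C = B @ G
full_gradient = G.sum(axis=0)

# With stragglers dropped, exact recovery is generally impossible; combine the
# surviving messages with least-squares weights a minimizing ||B_S.T @ a - 1||.
survivors = [0, 1, 3, 4, 6]
a, *_ = np.linalg.lstsq(B[survivors].T, np.ones(k), rcond=None)
approx = a @ C[survivors]

print(np.linalg.norm(approx - full_gradient) / np.linalg.norm(full_gradient))

The printed relative error is the price of the low replication factor; the cited paper's expander-graph codes and optimal decoding are aimed at keeping this error small.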