Optimization-based Block Coordinate Gradient Coding for Mitigating
Partial Stragglers in Distributed Learning
- URL: http://arxiv.org/abs/2206.02450v1
- Date: Mon, 6 Jun 2022 09:25:40 GMT
- Title: Optimization-based Block Coordinate Gradient Coding for Mitigating
Partial Stragglers in Distributed Learning
- Authors: Qi Wang, Ying Cui, Chenglin Li, Junni Zou, Hongkai Xiong
- Abstract summary: This paper aims to design a new gradient coding scheme for mitigating partial stragglers in distributed learning.
We propose a coordinate gradient coding scheme with L coding parameters representing L possibly different diversities for the L coordinates, which generates most gradient coding schemes.
- Score: 58.91954425047425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradient coding schemes effectively mitigate full stragglers in distributed
learning by introducing identical redundancy in coded local partial derivatives
corresponding to all model parameters. However, they are no longer effective
for partial stragglers as they cannot utilize incomplete computation results
from partial stragglers. This paper aims to design a new gradient coding scheme
for mitigating partial stragglers in distributed learning. Specifically, we
consider a distributed system consisting of one master and N workers,
characterized by a general partial straggler model, and focus on solving a
general large-scale machine learning problem with L model parameters using
gradient coding. First, we propose a coordinate gradient coding scheme with L
coding parameters representing L possibly different diversities for the L
coordinates, which generates most gradient coding schemes. Then, we consider
the minimization of the expected overall runtime and the maximization of the
completion probability with respect to the L coding parameters for coordinates,
which are challenging discrete optimization problems. To reduce computational
complexity, we first transform each to an equivalent but much simpler discrete
problem with N ≪ L variables representing the partition of the L coordinates
into N blocks, each with identical redundancy. This indicates an equivalent but
more easily implemented block coordinate gradient coding scheme with N coding
parameters for blocks. Then, we adopt continuous relaxation to further reduce
computational complexity. For the resulting minimization of expected overall
runtime, we develop an iterative algorithm of computational complexity O(N^2)
to obtain an optimal solution and derive two closed-form approximate solutions
both with computational complexity O(N). For the resultant maximization of the
completion probability, we develop an iterative algorithm of...
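As a rough illustration of the block coordinate idea, the minimal Python sketch below simulates one training round. Everything in it (the per-block redundancies, the exponential worker-speed model, and all variable names) is an assumption made for illustration rather than the paper's actual construction or partial straggler model; it only relies on the standard gradient coding property that a block with redundancy r_n becomes decodable once the fastest N - r_n + 1 workers report their coded result for it.

```python
import numpy as np

# Hypothetical sketch of a block coordinate gradient coding round.
# The redundancies, worker-speed model, and work assignment below are
# illustrative assumptions, not the scheme proposed in the paper.

rng = np.random.default_rng(0)

N = 10                                   # workers
L = 1000                                 # model parameters (gradient coordinates)
block_sizes = np.full(N, L // N)         # the L coordinates split into N blocks
redundancy = np.array([3, 3, 2, 2, 2, 1, 1, 1, 1, 1])   # r_n per block

# Per-worker work for block n: proportional to block size times redundancy,
# since each coordinate's partial derivative is computed on r_n data parts.
work_per_block = block_sizes * redundancy / L

# Assumed partial straggler model: worker i needs 1/speeds[i] seconds per unit
# of work, so a slow worker only finishes the first few blocks early on.
speeds = rng.exponential(scale=1.0, size=N) + 0.1

# Time at which worker i finishes block n (blocks processed in order).
finish = np.cumsum(work_per_block[None, :] / speeds[:, None], axis=1)

# Gradient coding property: a block with redundancy r_n is decodable once the
# fastest N - r_n + 1 workers have reported their coded result for it.
sorted_finish = np.sort(finish, axis=0)
block_done = sorted_finish[N - redundancy, np.arange(N)]

print("per-block completion times:", np.round(block_done, 3))
print("overall runtime:", round(float(block_done.max()), 3))
```

Varying the N block-level redundancies in such a simulation exposes the trade-off the paper optimizes exactly: higher redundancy lets a block tolerate more stragglers but increases every worker's load, and the paper chooses these N coding parameters by minimizing the expected overall runtime or maximizing the completion probability.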
Related papers
- Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit connections between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods [75.34939761152587]
Efficient computation of the optimal transport distance between two distributions serves as an algorithmic subroutine that empowers various applications.
This paper develops a scalable first-order optimization-based method that computes optimal transport to within $\varepsilon$ additive accuracy.
arXiv Detail & Related papers (2023-01-30T15:46:39Z) - Versatile Single-Loop Method for Gradient Estimator: First and Second
Order Optimality, and its Application to Federated Learning [45.78238792836363]
We present a single-loop algorithm named SLEDGE (Single-Loop-E Gradient Estimator) for periodic convergence.
Unlike existing methods, SLEDGE has the advantage of versatility: (i) second-order optimality, (ii) exponential convergence in the PL region, and (iii) smaller complexity under less heterogeneity of data.
arXiv Detail & Related papers (2022-09-01T11:05:26Z) - Faster One-Sample Stochastic Conditional Gradient Method for Composite
Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z) - Efficient Global Optimization of Two-layer ReLU Networks: Quadratic-time
Algorithms and Adversarial Training [12.354076490479516]
We develop two efficient algorithms that train ANNs with global convergence guarantees.
The first algorithm is based on the alternating direction method of multipliers (ADMM).
The second algorithm, based on the "sampled convex programs" theory, is simpler to implement.
arXiv Detail & Related papers (2022-01-06T08:24:11Z) - Optimization-Based GenQSGD for Federated Edge Learning [12.371264770814097]
We present a generalized parallel mini-batch stochastic gradient descent (SGD) algorithm for federated learning (FL).
We optimize the algorithm parameters to minimize the energy cost under time and convergence error constraints.
Results demonstrate significant gains over existing FL algorithms.
arXiv Detail & Related papers (2021-10-25T14:25:11Z) - Parallel Quasi-concave set optimization: A new frontier that scales
without needing submodularity [14.93584434176082]
We study a class of quasi-concave set functions induced as a dual class to monotone linkage functions.
We show a potential for widespread applications via an example of diverse feature subset selection with exact global max-min guarantees.
arXiv Detail & Related papers (2021-08-19T15:50:41Z) - Block majorization-minimization with diminishing radius for constrained
nonconvex optimization [9.907540661545328]
Block majorization-minimization (BMM) is a simple iterative algorithm for constrained nonconvex optimization that minimizes majorizing surrogates of the objective in each block.
We show that BMM produces an $\epsilon$-approximate stationary point within $O(\epsilon^{-2}(\log \epsilon^{-1})^{2})$ iterations when convex surrogates are used.
arXiv Detail & Related papers (2020-12-07T07:53:09Z) - Coded Distributed Computing with Partial Recovery [56.08535873173518]
We introduce a novel coded matrix-vector multiplication scheme, called coded computation with partial recovery (CCPR).
CCPR reduces both the computation time and the decoding complexity by allowing a trade-off between the accuracy and the speed of computation.
We then extend this approach to the distributed implementation of more general computation tasks by proposing a coded communication scheme with partial recovery (a toy sketch of the coded-computation idea appears after this list).
arXiv Detail & Related papers (2020-07-04T21:34:49Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z)
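To make the coded computation with partial recovery (CCPR) entry above more concrete, here is a toy sketch of coded matrix-vector multiplication with an accuracy/speed trade-off. The random linear code and the least-squares decoder are assumptions chosen for brevity; they are not the encoder or decoder proposed in that paper.

```python
import numpy as np

# Toy coded matrix-vector multiplication with full vs. partial recovery.
# The random linear code and least-squares decoding are illustrative
# assumptions, not the CCPR construction from the paper above.

rng = np.random.default_rng(1)

K, N, d = 4, 6, 8                         # K data blocks, N workers, d columns
A = rng.standard_normal((K * 5, d))       # tall matrix, split into K row blocks
x = rng.standard_normal(d)
blocks = np.split(A, K)                   # A_1, ..., A_K

# Encode: worker j stores a random linear combination of the K row blocks.
G = rng.standard_normal((N, K))           # generator matrix, N >= K
coded = [sum(G[j, k] * blocks[k] for k in range(K)) for j in range(N)]

# Each worker multiplies its coded block by x; stragglers simply never reply.
results = np.stack([C_j @ x for C_j in coded])

def decode(responding):
    """Estimate [A_1 x; ...; A_K x] from a subset of worker results.
    With at least K responses the recovery is exact; with fewer, least
    squares returns an approximation (the partial-recovery trade-off)."""
    sol, *_ = np.linalg.lstsq(G[responding], results[responding], rcond=None)
    return sol.reshape(-1)

exact = A @ x
full = decode(list(range(K)))             # wait for K workers: exact
partial = decode([0, 1, 2])               # wait for only 3 workers: approximate
print("error with full recovery   :", np.linalg.norm(full - exact))
print("error with partial recovery:", np.linalg.norm(partial - exact))
```

Waiting for at least K responses reproduces A x exactly, while decoding from fewer responses trades accuracy for speed, mirroring the trade-off described in the CCPR summary.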