Optimization-based Block Coordinate Gradient Coding for Mitigating
Partial Stragglers in Distributed Learning
- URL: http://arxiv.org/abs/2206.02450v1
- Date: Mon, 6 Jun 2022 09:25:40 GMT
- Title: Optimization-based Block Coordinate Gradient Coding for Mitigating
Partial Stragglers in Distributed Learning
- Authors: Qi Wang, Ying Cui, Chenglin Li, Junni Zou, Hongkai Xiong
- Abstract summary: This paper aims to design a new gradient coding scheme for mitigating partial stragglers in distributed learning.
We propose a coordinate gradient coding scheme with L coding parameters representing L possibly different diversities for the L coordinates, which generates most gradient coding schemes.
- Score: 58.91954425047425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradient coding schemes effectively mitigate full stragglers in distributed
learning by introducing identical redundancy in coded local partial derivatives
corresponding to all model parameters. However, they are no longer effective
for partial stragglers as they cannot utilize incomplete computation results
from partial stragglers. This paper aims to design a new gradient coding scheme
for mitigating partial stragglers in distributed learning. Specifically, we
consider a distributed system consisting of one master and N workers,
characterized by a general partial straggler model, and focus on solving a
general large-scale machine learning problem with L model parameters using
gradient coding. First, we propose a coordinate gradient coding scheme with L
coding parameters representing L possibly different diversities for the L
coordinates, which generates most gradient coding schemes. Then, we consider
the minimization of the expected overall runtime and the maximization of the
completion probability with respect to the L coding parameters for coordinates,
which are challenging discrete optimization problems. To reduce computational
complexity, we first transform each to an equivalent but much simpler discrete
problem with N ≪ L variables representing the partition of the L coordinates
into N blocks, each with identical redundancy. This indicates an equivalent but
more easily implemented block coordinate gradient coding scheme with N coding
parameters for blocks. Then, we adopt continuous relaxation to further reduce
computational complexity. For the resulting minimization of expected overall
runtime, we develop an iterative algorithm of computational complexity O(N^2)
to obtain an optimal solution and derive two closed-form approximate solutions
both with computational complexity O(N). For the resultant maximization of the
completion probability, we develop an iterative algorithm of...
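The block coordinate scheme described above can be illustrated with a small sketch (hypothetical code, not from the paper): the L coordinates are partitioned into N blocks, and each block n is replicated on r_n workers according to its coding parameter, so the master can recover the full gradient once every block has been returned by at least one worker. The function names and the cyclic worker assignment are illustrative assumptions, not the authors' construction.

```python
# Hypothetical sketch of block coordinate gradient coding. Each of the N blocks
# of coordinates has its own redundancy r_n (its coding parameter); block n is
# replicated on r_n workers, so the master needs each block from only one worker.

def partition_blocks(L, N):
    """Split coordinate indices 0..L-1 into N nearly equal contiguous blocks."""
    base, extra = divmod(L, N)
    blocks, start = [], 0
    for n in range(N):
        size = base + (1 if n < extra else 0)
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks

def assign_workers(N, redundancy):
    """Cyclically assign block n to redundancy[n] consecutive workers."""
    assignment = {w: [] for w in range(N)}
    for n, r in enumerate(redundancy):
        for k in range(r):
            assignment[(n + k) % N].append(n)
    return assignment

blocks = partition_blocks(L=10, N=4)   # [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
tasks = assign_workers(N=4, redundancy=[2, 1, 3, 1])
```

Under this sketch, choosing the N redundancies (rather than L per-coordinate diversities) is exactly the simpler N-variable optimization the paper reduces to.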
Related papers
- Learning to Optimize for Mixed-Integer Non-linear Programming [20.469394148261838]
Mixed-integer non-linear programs (MINLPs) arise in various domains, such as energy systems and transportation, but are notoriously difficult to solve.
Recent advances in machine learning have led to remarkable successes in optimization, an area broadly known as learning to optimize.
We propose two differentiable correction layers that generate integer outputs while preserving gradient information.
arXiv Detail & Related papers (2024-10-14T20:14:39Z) - Obtaining Lower Query Complexities through Lightweight Zeroth-Order Proximal Gradient Algorithms [65.42376001308064]
We propose two variance-reduced zeroth-order (ZO) estimators for complex gradient problems.
We improve the state-of-the-art function query complexities from $\mathcal{O}\left(\min\left(\frac{dn^{1/2}}{\epsilon^2}, \frac{d}{\epsilon^3}\right)\right)$ to $\tilde{\mathcal{O}}\left(\frac{d}{\epsilon^2}\right)$.
arXiv Detail & Related papers (2024-10-03T15:04:01Z) - Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit connections between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods [75.34939761152587]
Efficient computation of the optimal transport distance between two distributions serves as an algorithmic subroutine that empowers various applications.
This paper develops a scalable first-order optimization-based method that computes optimal transport to within $\varepsilon$ additive accuracy.
arXiv Detail & Related papers (2023-01-30T15:46:39Z) - Versatile Single-Loop Method for Gradient Estimator: First and Second
Order Optimality, and its Application to Federated Learning [45.78238792836363]
We present a single-loop algorithm named SLEDGE (Single-Loop-E Gradient Estimator) for periodic convergence.
Unlike existing methods, SLEDGE has the advantage of versatility: (i) second-order optimality, (ii) exponential convergence in the PL region, and (iii) smaller complexity under low heterogeneity of data.
arXiv Detail & Related papers (2022-09-01T11:05:26Z) - Faster One-Sample Stochastic Conditional Gradient Method for Composite
Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z) - Optimization-Based GenQSGD for Federated Edge Learning [12.371264770814097]
We present a generalized parallel mini-batch stochastic gradient descent (SGD) algorithm for federated learning (FL).
We optimize the algorithm parameters to minimize the energy cost under time and convergence error constraints.
Results demonstrate significant gains over existing FL algorithms.
arXiv Detail & Related papers (2021-10-25T14:25:11Z) - Parallel Quasi-concave set optimization: A new frontier that scales
without needing submodularity [14.93584434176082]
We study a class of quasi-concave set functions induced as a dual class to monotone linkage functions.
We show a potential for widespread applications via an example of diverse feature subset selection with exact global max-min guarantees.
arXiv Detail & Related papers (2021-08-19T15:50:41Z) - Block majorization-minimization with diminishing radius for constrained
nonconvex optimization [9.907540661545328]
Block majorization-minimization (BMM) is a simple iterative algorithm for constrained nonconvex optimization that minimizes majorizing surrogates in each block.
We show that BMM can produce an $\epsilon$-stationary point within $O(\epsilon^{-2}(\log \epsilon^{-1})^2)$ iterations when convex surrogates are used.
arXiv Detail & Related papers (2020-12-07T07:53:09Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.