Approximate Gradient Coding with Optimal Decoding
- URL: http://arxiv.org/abs/2006.09638v4
- Date: Fri, 6 Aug 2021 05:30:47 GMT
- Title: Approximate Gradient Coding with Optimal Decoding
- Authors: Margalit Glasgow, Mary Wootters
- Abstract summary: In distributed optimization problems, a technique called gradient coding has been used to mitigate the effect of straggling machines.
Recent work has studied approximate gradient coding, which concerns coding schemes where the replication factor of the data is too low to recover the full gradient exactly.
We introduce novel approximate gradient codes based on expander graphs, in which each machine receives exactly two blocks of data points.
- Score: 22.069731881164895
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In distributed optimization problems, a technique called gradient coding,
which involves replicating data points, has been used to mitigate the effect of
straggling machines. Recent work has studied approximate gradient coding, which
concerns coding schemes where the replication factor of the data is too low to
recover the full gradient exactly. Our work is motivated by the challenge of
creating approximate gradient coding schemes that simultaneously work well in
both the adversarial and stochastic models. To that end, we introduce novel
approximate gradient codes based on expander graphs, in which each machine
receives exactly two blocks of data points. We analyze the decoding error both
in the random and adversarial straggler setting, when optimal decoding
coefficients are used. We show that in the random setting, our schemes achieve
an error to the gradient that decays exponentially in the replication factor.
In the adversarial setting, the error is nearly a factor of two smaller than
any existing code with similar performance in the random setting. We show
convergence bounds both in the random and adversarial setting for gradient
descent under standard assumptions using our codes. In the random setting, our
convergence rate improves upon black-box bounds. In the adversarial setting, we
show that gradient descent can converge down to a noise floor that scales
linearly with the adversarial error to the gradient. We demonstrate empirically
that our schemes achieve near-optimal error in the random setting and converge
faster than algorithms which do not use the optimal decoding coefficients.
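As a concrete illustration of the decoding step described above, the sketch below pairs a toy two-blocks-per-worker assignment with least-squares-optimal decoding coefficients. The cyclic pairwise assignment, the unit encoding coefficients, and all sizes are illustrative placeholders, not the paper's expander-graph construction.

```python
# A sketch of approximate gradient coding with least-squares-optimal decoding.
# NOTE: the cyclic pairwise assignment and unit encoding coefficients are
# illustrative placeholders, not the paper's expander-graph construction.
import numpy as np

rng = np.random.default_rng(0)
n_blocks, n_workers, dim = 20, 20, 5

# Stand-in per-block gradients; the target is their sum (the full gradient).
block_grads = rng.normal(size=(n_blocks, dim))
full_grad = block_grads.sum(axis=0)

# Assignment matrix A: each worker holds exactly two blocks (replication factor 2).
A = np.zeros((n_workers, n_blocks))
for w in range(n_workers):
    A[w, w % n_blocks] = 1.0
    A[w, (w + 1) % n_blocks] = 1.0

# Each worker returns the sum of the gradients of its assigned blocks.
worker_msgs = A @ block_grads

# Random stragglers: only a subset of workers respond.
survivors = rng.choice(n_workers, size=12, replace=False)

# Optimal decoding coefficients: minimize ||A_S^T w - 1||_2 so that the decoded
# combination of worker messages is as close as possible to the full gradient.
A_S = A[survivors]
w_opt, *_ = np.linalg.lstsq(A_S.T, np.ones(n_blocks), rcond=None)
grad_estimate = w_opt @ worker_msgs[survivors]

print("relative decoding error:",
      np.linalg.norm(grad_estimate - full_grad) / np.linalg.norm(full_grad))
```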
Related papers
- A Mirror Descent-Based Algorithm for Corruption-Tolerant Distributed Gradient Descent [57.64826450787237]
We show how to analyze the behavior of distributed gradient descent algorithms in the presence of adversarial corruptions.
We show how to use ideas from (lazy) mirror descent to design a corruption-tolerant distributed optimization algorithm.
Experiments based on linear regression, support vector classification, and softmax classification on the MNIST dataset corroborate our theoretical findings.
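A minimal sketch of one way a lazy mirror descent (dual averaging) step can be combined with a robust aggregator to tolerate corrupted gradient reports; the coordinate-wise median aggregator and the entropic mirror map over the simplex are illustrative assumptions, not necessarily the paper's construction.

```python
# A sketch of a corruption-tolerant distributed step in the spirit of lazy
# mirror descent (dual averaging). The coordinate-wise median aggregator and
# the entropic mirror map over the simplex are illustrative assumptions.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(1)
dim, n_workers, n_steps, step_size = 10, 8, 200, 0.5
target = softmax(rng.normal(size=dim))      # simplex point we try to reach

z = np.zeros(dim)                           # lazily accumulated dual iterate
for t in range(n_steps):
    x = softmax(z)                          # mirror map back to the simplex
    # Honest workers report the gradient of 0.5*||x - target||^2 plus noise;
    # one corrupted worker reports an arbitrary large vector.
    grads = [x - target + 0.01 * rng.normal(size=dim) for _ in range(n_workers)]
    grads[0] = 10.0 * rng.normal(size=dim)
    robust_grad = np.median(np.stack(grads), axis=0)
    z -= step_size * robust_grad            # lazy (dual averaging) update

print("distance to target:", np.linalg.norm(softmax(z) - target))
```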
arXiv Detail & Related papers (2024-07-19T08:29:12Z)
- Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth [83.15263499262824]
We prove that gradient descent converges to a solution that completely disregards the sparse structure of the input.
We show how to improve upon Gaussian performance for the compression of sparse data by adding a denoising function to a shallow architecture.
We validate our findings on image datasets, such as CIFAR-10 and MNIST.
arXiv Detail & Related papers (2024-02-07T16:32:29Z)
- Over-the-Air Computation Aided Federated Learning with the Aggregation of Normalized Gradient [12.692064367193934]
Over-the-air computation is a communication-efficient solution for federated learning (FL).
In such a system, every mobile device iteratively updates the gradient of its private loss function and then transmits it over the air.
To circumvent this problem, we propose to normalize the local gradient before amplifying it.
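A minimal sketch of the normalization idea: each device scales its local gradient to unit norm before transmission, and the channel superposes all signals with additive noise. The unit-norm scaling and noise model are illustrative assumptions.

```python
# A sketch of over-the-air aggregation where each device normalizes its local
# gradient before amplification/transmission. The unit-norm scaling and the
# additive channel-noise model are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
dim, n_devices, noise_std = 6, 4, 0.05

local_grads = rng.normal(size=(n_devices, dim))    # gradients of private losses

# Each device transmits its gradient scaled to unit L2 norm.
signals = np.stack([g / np.linalg.norm(g) for g in local_grads])

# The wireless channel superposes all transmissions and adds receiver noise.
received = signals.sum(axis=0) + noise_std * rng.normal(size=dim)

# The server treats the (scaled) superposed signal as the update direction.
aggregate = received / n_devices
print(aggregate)
```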
arXiv Detail & Related papers (2023-08-17T16:15:47Z)
- Fundamental Limits of Two-layer Autoencoders, and Achieving Them with Gradient Methods [91.54785981649228]
This paper focuses on non-linear two-layer autoencoders trained in the challenging proportional regime.
Our results characterize the minimizers of the population risk, and show that such minimizers are achieved by gradient methods.
For the special case of a sign activation function, our analysis establishes the fundamental limits for the lossy compression of Gaussian sources via (shallow) autoencoders.
arXiv Detail & Related papers (2022-12-27T12:37:34Z)
- Nested Gradient Codes for Straggler Mitigation in Distributed Machine Learning [21.319460501659666]
Gradient codes are designed to tolerate a fixed number of stragglers.
We propose a gradient coding scheme that can tolerate a flexible number of stragglers.
By proper task scheduling and small additional signaling, our scheme adapts the load of the workers to the actual number of stragglers.
arXiv Detail & Related papers (2022-12-16T16:56:51Z)
- Lightweight Projective Derivative Codes for Compressed Asynchronous Gradient Descent [6.055286666916789]
This paper proposes a novel algorithm that encodes the partial derivatives themselves and furthermore optimizes the codes by performing lossy compression on the derivative codewords.
The utility of this application of coding theory is a geometrical consequence of the observed fact in optimization research that noise is tolerable, sometimes even helpful, in gradient descent based learning algorithms.
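A minimal sketch of encoding a partial derivative into a redundant codeword and applying lossy compression before transmission; the Gaussian projection code and uniform quantizer are illustrative stand-ins for the paper's projective derivative codes.

```python
# A sketch of encoding a partial derivative into a redundant codeword and then
# applying lossy compression to the codeword. The Gaussian projection code and
# the uniform quantizer are illustrative stand-ins, not the paper's codes.
import numpy as np

rng = np.random.default_rng(3)
dim, code_len = 50, 100                          # codeword with 2x redundancy

partial_grad = rng.normal(size=dim)              # a worker's partial derivative

# Encode: map the derivative to a redundant codeword.
G = rng.normal(size=(code_len, dim)) / np.sqrt(code_len)
codeword = G @ partial_grad

# Lossy compression: coarse uniform quantization of the codeword.
step = 0.25
compressed = step * np.round(codeword / step)

# Decode: least-squares reconstruction from the compressed codeword; the
# redundancy averages out much of the quantization noise.
recovered, *_ = np.linalg.lstsq(G, compressed, rcond=None)
print("relative error:",
      np.linalg.norm(recovered - partial_grad) / np.linalg.norm(partial_grad))
```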
arXiv Detail & Related papers (2022-01-31T04:08:53Z)
- Stochastic Compositional Gradient Descent under Compositional constraints [13.170519806372075]
We study constrained optimization problems where the objective and constraint functions are convex and expressed as compositions of functions.
The problem arises in fair classification/regression and in the design of queuing systems.
We prove that the proposed algorithm is guaranteed to find the optimal and feasible solution almost surely.
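A minimal sketch of a basic stochastic compositional gradient descent update for F(x) = f(g(x)), which tracks the inner function value with a running average; the quadratic outer function, linear inner map, and noise model are illustrative assumptions, and the paper's compositional constraints are omitted.

```python
# A sketch of a basic stochastic compositional gradient descent (SCGD) update
# for F(x) = f(g(x)), tracking the inner value g(x) with a running average.
# The quadratic f, linear g, and noise model are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(4)
dim_x, dim_y = 8, 5
A = rng.normal(size=(dim_y, dim_x))          # inner map: g(x) = A x
b = rng.normal(size=dim_y)                   # outer function: f(y) = 0.5*||y - b||^2

x = np.zeros(dim_x)
y = np.zeros(dim_y)                          # running estimate of g(x)
alpha, beta, noise = 0.02, 0.2, 0.01

for t in range(2000):
    g_sample = A @ x + noise * rng.normal(size=dim_y)     # noisy inner value
    y = (1 - beta) * y + beta * g_sample                  # track g(x)
    grad_f = y - b                                        # outer gradient at the estimate
    jac_g = A + noise * rng.normal(size=A.shape)          # noisy inner Jacobian
    x -= alpha * jac_g.T @ grad_f                         # SCGD step

print("objective:", 0.5 * np.linalg.norm(A @ x - b) ** 2)
```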
arXiv Detail & Related papers (2020-12-17T05:38:37Z)
- Gaussian MRF Covariance Modeling for Efficient Black-Box Adversarial Attacks [86.88061841975482]
We study the problem of generating adversarial examples in a black-box setting, where we only have access to a zeroth order oracle.
We use this setting to find fast one-step adversarial attacks, akin to a black-box version of the Fast Gradient Sign Method (FGSM).
We show that the method uses fewer queries and achieves higher attack success rates than the current state of the art.
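A minimal sketch of a one-step sign attack driven by a zeroth-order gradient estimate, in the spirit of a black-box FGSM; the isotropic random-direction estimator and toy loss are generic stand-ins and do not implement the paper's Gaussian MRF covariance modeling.

```python
# A sketch of a one-step sign attack driven by a zeroth-order gradient
# estimate, in the spirit of a black-box FGSM. The isotropic random-direction
# estimator and toy logistic loss are generic stand-ins.
import numpy as np

rng = np.random.default_rng(5)
dim, eps, mu, n_queries = 16, 0.1, 1e-3, 200

w = rng.normal(size=dim)
def loss(x):                                  # zeroth-order oracle: values only
    return np.log1p(np.exp(-w @ x))

x0 = rng.normal(size=dim)

# Estimate the gradient from loss queries along random directions.
grad_est = np.zeros(dim)
for _ in range(n_queries):
    u = rng.normal(size=dim)
    grad_est += (loss(x0 + mu * u) - loss(x0)) / mu * u
grad_est /= n_queries

x_adv = x0 + eps * np.sign(grad_est)          # one-step, FGSM-style perturbation
print("loss before:", loss(x0), "loss after:", loss(x_adv))
```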
arXiv Detail & Related papers (2020-10-08T18:36:51Z)
- Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve for one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under a sparsity constraint.
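A minimal sketch of synchronous gradient descent on a coupled bilinear objective, the setting CoGD addresses; the rank-one fitting task is illustrative, and the coupling-aware correction that distinguishes CoGD from plain synchronous descent is not reproduced here.

```python
# A sketch of synchronous gradient descent on a coupled bilinear objective
# 0.5*||Y - u v^T||^2. Both variables are updated in the same iteration; the
# coupling-aware correction that defines CoGD itself is not reproduced.
import numpy as np

rng = np.random.default_rng(6)
m, n, lr = 20, 15, 0.01
Y = np.outer(rng.normal(size=m), rng.normal(size=n))     # rank-one target

u, v = rng.normal(size=m), rng.normal(size=n)
for t in range(1000):
    R = np.outer(u, v) - Y                    # residual of the bilinear model
    grad_u = R @ v                            # gradient w.r.t. u (holding v fixed)
    grad_v = R.T @ u                          # gradient w.r.t. v (holding u fixed)
    u, v = u - lr * grad_u, v - lr * grad_v   # synchronous update of both variables

print("fit error:", np.linalg.norm(np.outer(u, v) - Y) / np.linalg.norm(Y))
```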
arXiv Detail & Related papers (2020-06-16T13:41:54Z)
- Communication-Efficient Gradient Coding for Straggler Mitigation in Distributed Learning [17.454251607446555]
Distributed implementations of gradient-based methods, wherein a server distributes gradient computations across worker machines, need to overcome two limitations.
Ye and Abbe [ICML 2018] proposed a coding-theoretic paradigm to characterize a fundamental trade-off between computation load per worker, communication overhead per worker, and straggler tolerance.
We develop a communication-efficient gradient coding framework to overcome these drawbacks.
arXiv Detail & Related papers (2020-05-14T17:57:13Z)
- Variance Reduction with Sparse Gradients [82.41780420431205]
Variance reduction methods such as SVRG and SpiderBoost use a mixture of large and small batch gradients.
We introduce a new sparsity operator: The random-top-k operator.
Our algorithm consistently outperforms SpiderBoost on various tasks including image classification, natural language processing, and sparse matrix factorization.
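A minimal sketch of a random-top-k style sparsifier, read here as keeping the largest-magnitude coordinates plus a few uniformly random ones; the exact split between the two parts and any rescaling used in the paper may differ.

```python
# A sketch of a random-top-k style sparsification operator: keep the k_top
# largest-magnitude coordinates and k_rand uniformly random extra ones.
# This is an illustrative reading of the operator's name, not the paper's spec.
import numpy as np

def random_top_k(g, k_top, k_rand, rng):
    """Zero all but the k_top largest-magnitude and k_rand random coordinates."""
    mask = np.zeros(g.shape, dtype=bool)
    mask[np.argsort(np.abs(g))[-k_top:]] = True          # top coordinates by magnitude
    remaining = np.flatnonzero(~mask)
    mask[rng.choice(remaining, size=k_rand, replace=False)] = True
    return np.where(mask, g, 0.0)

rng = np.random.default_rng(7)
g = rng.normal(size=12)
print(random_top_k(g, k_top=3, k_rand=2, rng=rng))
```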
arXiv Detail & Related papers (2020-01-27T08:23:58Z)