Age-Based Coded Computation for Bias Reduction in Distributed Learning
- URL: http://arxiv.org/abs/2006.01816v1
- Date: Tue, 2 Jun 2020 17:51:11 GMT
- Title: Age-Based Coded Computation for Bias Reduction in Distributed Learning
- Authors: Emre Ozfatura, Baturalp Buyukates, Deniz Gunduz, and Sennur Ulukus
- Abstract summary: Coded computation can be used to speed up distributed learning in the presence of straggling workers.
Partial recovery of the gradient vector can further reduce the computation time at each iteration.
Estimator bias will be particularly prevalent when the straggling behavior is correlated over time.
- Score: 57.9123881133818
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Coded computation can be used to speed up distributed learning in the
presence of straggling workers. Partial recovery of the gradient vector can
further reduce the computation time at each iteration; however, this can result
in biased estimators, which may slow down convergence, or even cause
divergence. Estimator bias will be particularly prevalent when the straggling
behavior is correlated over time, which results in the gradient estimators
being dominated by a few fast servers. To mitigate estimator bias, we design
a $timely$ dynamic encoding framework for partial recovery that includes an
ordering operator changing the codewords and the computation order at the
workers over time. To regulate the recovery frequencies, we adopt an $age$
metric in the design of the dynamic encoding scheme. We show through numerical
results that the proposed dynamic encoding strategy increases the timeliness
of the recovered computations, which in turn reduces the bias in the model
updates and accelerates convergence compared to conventional static partial
recovery schemes.
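As a rough illustration of the ordering operator and the $age$ metric described in the abstract, the Python sketch below tracks, for each gradient partition, the number of iterations since it was last recovered at the master, and has every worker compute its stalest assigned partitions first. This is a minimal toy simulation under assumed parameters: the partition count, worker count, random assignment, and the "only each worker's first task finishes" straggler model are illustrative stand-ins, not the paper's actual encoding scheme.

```python
import numpy as np

# Toy sketch of age-based ordering for partial recovery (illustrative only).
# Each gradient partition has an "age": iterations since it was last recovered.

num_parts, num_workers, tasks_per_worker = 10, 5, 4   # assumed sizes
age = np.zeros(num_parts, dtype=int)
rng = np.random.default_rng(0)

def order_by_age(assigned, age):
    """Schedule a worker's assigned partitions, stalest (largest age) first."""
    return sorted(assigned, key=lambda p: -age[p])

for it in range(3):
    # Random partition-to-worker assignment (stand-in for the actual code design).
    assignments = [rng.choice(num_parts, size=tasks_per_worker, replace=False)
                   for _ in range(num_workers)]
    schedules = [order_by_age(a, age) for a in assignments]
    # Crude straggler model: only the first scheduled task of each worker finishes.
    recovered = {int(s[0]) for s in schedules}
    age += 1                       # everything not recovered grows staler
    age[list(recovered)] = 0       # recovered partitions become fresh again
    print(f"iteration {it}: recovered {sorted(recovered)}, ages {age.tolist()}")
```

In the paper's scheme the codewords themselves are also updated over time; here only the computation order changes, to keep the sketch short.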
Related papers
- A Mirror Descent-Based Algorithm for Corruption-Tolerant Distributed Gradient Descent [57.64826450787237]
We show how to analyze the behavior of distributed gradient descent algorithms in the presence of adversarial corruptions.
We show how to use ideas from (lazy) mirror descent to design a corruption-tolerant distributed optimization algorithm.
Experiments based on linear regression, support vector classification, and softmax classification on the MNIST dataset corroborate our theoretical findings.
arXiv Detail & Related papers (2024-07-19T08:29:12Z)
- ReCycle: Fast and Efficient Long Time Series Forecasting with Residual Cyclic Transformers [0.06965384453064827]
The Residual Cyclic Transformer (ReCycle) bridges the gap between high method complexity and realistic computational resources.
Our approach reduces the run time and energy consumption by more than an order of magnitude, making both training and inference feasible on low-performance, low-power and edge computing devices.
arXiv Detail & Related papers (2024-05-06T12:48:34Z)
- Randomized Polar Codes for Anytime Distributed Machine Learning [66.46612460837147]
We present a novel distributed computing framework that is robust to slow compute nodes, and is capable of both approximate and exact computation of linear operations.
We propose a sequential decoding algorithm designed to handle real-valued data while maintaining low computational complexity for recovery.
We demonstrate the potential applications of this framework in various contexts, such as large-scale matrix multiplication and black-box optimization.
arXiv Detail & Related papers (2023-09-01T18:02:04Z)
- Fast and Straggler-Tolerant Distributed SGD with Reduced Computation Load [11.069252535469644]
Optimization procedures like stochastic gradient descent (SGD) can be leveraged to mitigate the effect of unresponsive or slow workers, called stragglers.
This can be done by only waiting for a subset of the workers to finish their computation at each iteration of the algorithm.
We construct a novel scheme that adapts both the number of workers and the computation load throughout the run-time of the algorithm.
arXiv Detail & Related papers (2023-04-17T20:12:18Z)
- Nested Gradient Codes for Straggler Mitigation in Distributed Machine Learning [21.319460501659666]
Gradient codes are designed to tolerate a fixed number of stragglers.
We propose a gradient coding scheme that can tolerate a flexible number of stragglers.
By proper task scheduling and small additional signaling, our scheme adapts the load of the workers to the actual number of stragglers.
arXiv Detail & Related papers (2022-12-16T16:56:51Z)
- Loop Unrolled Shallow Equilibrium Regularizer (LUSER) -- A Memory-Efficient Inverse Problem Solver [26.87738024952936]
In inverse problems, we aim to reconstruct some underlying signal of interest from potentially corrupted measurements in an often ill-posed setting.
We propose a loop unrolled (LU) algorithm with shallow equilibrium regularizers (LUSER).
These implicit models are as expressive as deeper convolutional networks, but far more memory efficient during training.
arXiv Detail & Related papers (2022-10-10T19:50:37Z)
- Gradient Coding with Dynamic Clustering for Straggler Mitigation [57.9123881133818]
GC-DC regulates the number of straggling workers in each cluster based on the straggler behavior in the previous iteration.
We numerically show that GC-DC provides significant improvements in the average completion time (of each iteration) with no increase in the communication load compared to the original GC scheme.
arXiv Detail & Related papers (2020-11-03T18:52:15Z)
- Coded Distributed Computing with Partial Recovery [56.08535873173518]
We introduce a novel coded matrix-vector multiplication scheme, called coded computation with partial recovery (CCPR).
CCPR reduces both the computation time and the decoding complexity by allowing a trade-off between the accuracy and the speed of computation.
We then extend this approach to distributed implementation of more general computation tasks by proposing a coded communication scheme with partial recovery.
arXiv Detail & Related papers (2020-07-04T21:34:49Z)
- Accelerated Convergence for Counterfactual Learning to Rank [65.63997193915257]
We show that the convergence rate of SGD approaches with IPS-weighted gradients suffers from the large variance introduced by the IPS weights.
We propose a novel learning algorithm, called CounterSample, that has provably better convergence than standard IPS-weighted gradient descent methods.
We prove that CounterSample converges faster and complement our theoretical findings with empirical results.
arXiv Detail & Related papers (2020-05-21T12:53:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.