General Coded Computing in a Probabilistic Straggler Regime
- URL: http://arxiv.org/abs/2502.00645v1
- Date: Sun, 02 Feb 2025 03:24:05 GMT
- Title: General Coded Computing in a Probabilistic Straggler Regime
- Authors: Parsa Moradi, Mohammad Ali Maddah-Ali
- Abstract summary: Recently, new coded computing schemes for general computing functions, where exact computation is replaced with approximate computation, have emerged.
This paper addresses the practically important scenario in the context of general coded computing, where each server may become a straggler with a probability $p$, independently from others.
We show that the average approximation errors of two existing general coded computing schemes converge to zero at rates of at least $\mathcal{O}(\log^3_{\frac{1}{p}}(N)\cdot N^{-3})$ and $\mathcal{O}(\log^4_{\frac{1}{p}}(N)\cdot N^{-2})$, respectively.
- Score: 15.960546024967327
- Abstract: Coded computing has demonstrated promising results in addressing straggler resiliency in distributed computing systems. However, most coded computing schemes are designed for exact computation, requiring the number of responding servers to exceed a certain recovery threshold. Additionally, these schemes are tailored for highly structured functions. Recently, new coded computing schemes for general computing functions, where exact computation is replaced with approximate computation, have emerged. In these schemes, the availability of additional results corresponds to more accurate estimation of computational tasks. This flexibility introduces new questions that need to be addressed. This paper addresses the practically important scenario in the context of general coded computing, where each server may become a straggler with a probability $p$, independently from the others. We theoretically analyze the approximation error of two existing general coded computing schemes: Berrut Approximate Coded Computing (BACC) and Learning Theoretic Coded Computing (LeTCC). Under the probabilistic straggler configuration, we demonstrate that the average approximation errors for BACC and LeTCC converge to zero with rates of at least $\mathcal{O}(\log^3_{\frac{1}{p}}(N)\cdot N^{-3})$ and $\mathcal{O}(\log^4_{\frac{1}{p}}(N)\cdot N^{-2})$, respectively. This is perhaps surprising, as earlier results do not indicate convergence when the number of stragglers scales with the total number of servers $N$. However, in this case, despite the average number of stragglers being $Np$, the independence of servers in becoming stragglers allows the approximation error to converge to zero. These theoretical results are validated through experiments on various computing functions, including deep neural networks.
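To make the setting concrete, the sketch below simulates the probabilistic straggler regime with a BACC-style encode/compute/decode pipeline built on Berrut's rational interpolant. This is a minimal illustration under stated assumptions, not the authors' implementation: the function `f = np.tanh`, the Chebyshev node choices, and the parameters `K` and `p` are chosen purely for demonstration.

```python
import numpy as np

def berrut(nodes, values, x):
    """Berrut's rational interpolant with weights w_j = (-1)^j:
    r(x) = sum_j (w_j / (x - z_j)) f_j  /  sum_j (w_j / (x - z_j))."""
    w = (-1.0) ** np.arange(len(nodes))
    diff = x[:, None] - nodes[None, :]
    hit = np.isclose(diff, 0.0)
    diff[hit] = 1.0                        # placeholder; fixed by the overwrite below
    terms = w / diff
    r = (terms @ values) / terms.sum(axis=1)
    rows, cols = np.where(hit)
    r[rows] = values[cols]                 # interpolant is exact at the nodes
    return r

rng = np.random.default_rng(0)
f, K, p = np.tanh, 8, 0.2                  # assumed function, #data points, straggler prob.

for N in (32, 64, 128, 256):
    alpha = np.cos((2 * np.arange(K) + 1) * np.pi / (2 * K))  # Chebyshev data nodes
    z = np.cos((2 * np.arange(N) + 1) * np.pi / (2 * N))      # server evaluation nodes
    data = rng.standard_normal(K)
    coded = berrut(alpha, data, z)         # encoder: server i receives u(z_i)
    results = f(coded)                     # each server applies f to its coded input
    alive = rng.random(N) > p              # each server straggles i.i.d. with prob. p
    est = berrut(z[alive], results[alive], alpha)  # decode from responsive servers only
    print(f"N={N:4d}  responsive={alive.sum():4d}  "
          f"MSE={np.mean((est - f(data)) ** 2):.3e}")
```

Running this, the mean squared error shrinks as N grows even though roughly Np servers straggle on average, which is the qualitative behavior the paper's bounds describe.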
Related papers
- General Coded Computing: Adversarial Settings [15.839621757142597]
This paper lays the foundation for general coded computing, which extends the applicability of coded computing to handle a wide class of computations.
We show that within a general framework, the proposed scheme achieves optimal adversarial robustness in terms of the maximum number of adversarial servers it can tolerate.
arXiv Detail & Related papers (2025-02-12T01:50:12Z)
- Coded Computing for Resilient Distributed Computing: A Learning-Theoretic Framework [15.703073293718953]
We propose a novel foundation for coded computing, integrating the principles of learning theory and developing a framework that seamlessly adapts to machine learning applications.
We show that in the proposed solution, the mean squared error of the estimation decays with rates of $\mathcal{O}(S^3 N^{-3})$ and $\mathcal{O}(S^{\frac{8}{5}} N^{-\frac{3}{5}})$ in the noiseless and noisy computation settings, respectively.
arXiv Detail & Related papers (2024-06-01T05:01:25Z)
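As a rough illustration of the learning-theoretic viewpoint, the sketch below replaces interpolation-based decoding with a regression fit (a cubic smoothing spline) over the responsive servers. This is a hedged stand-in, not the paper's LeTCC construction; the function `np.tanh`, the node choices, the noise level `sigma`, and the use of SciPy's `make_smoothing_spline` are all illustrative assumptions.

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

rng = np.random.default_rng(1)
N, p, sigma = 128, 0.2, 1e-3               # servers, straggler prob., computation noise

z = np.cos((2 * np.arange(N) + 1) * np.pi / (2 * N))  # server evaluation points
y = np.tanh(z) + sigma * rng.standard_normal(N)       # noisy server outputs (assumed f)
alive = rng.random(N) > p                              # i.i.d. stragglers

order = np.argsort(z[alive])               # spline fit needs increasing abscissae
zs, ys = z[alive][order], y[alive][order]
spline = make_smoothing_spline(zs, ys)     # regularized least-squares fit (GCV-chosen lam)

x_eval = np.linspace(-0.9, 0.9, 7)         # points where decoded values are needed
print(np.abs(spline(x_eval) - np.tanh(x_eval)).max())
```

The design point is that a regression decoder averages out computation noise instead of interpolating it exactly, which is one way to read the different noiseless/noisy rates above.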
- On Hardware-efficient Inference in Probabilistic Circuits [5.335146727090435]
This work proposes the first dedicated approximate computing framework for PCs.
We leverage Addition As Int, resulting in linear PC computation with simple hardware elements.
We provide a theoretical approximation error analysis and present an error compensation mechanism.
arXiv Detail & Related papers (2024-05-22T13:38:47Z)
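The "Addition As Int" idea can be illustrated with the known bit-level trick of approximating a floating-point multiplication by integer addition of the raw IEEE-754 bit patterns (the exponent fields add like logarithms). A minimal sketch for positive float32 values, only in the spirit of the paper's hardware framework; sign and overflow handling are deliberately omitted.

```python
import numpy as np

def mul_as_int_add(a, b):
    """Approximate a * b for positive float32 inputs by adding their raw bit
    patterns and subtracting the bit pattern of 1.0f (0x3F800000): exponents
    add exactly, mantissas only approximately (relative error up to ~7.5%)."""
    ia = np.asarray(a, np.float32).view(np.int32).astype(np.int64)
    ib = np.asarray(b, np.float32).view(np.int32).astype(np.int64)
    return (ia + ib - 0x3F800000).astype(np.int32).view(np.float32)

a = np.float32([1.7, 2.5, 9.3])
b = np.float32([3.2, 0.6, 1.1])
print(mul_as_int_add(a, b))   # approximate products via one integer add each
print(a * b)                  # exact products for comparison
```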
- Randomized Polar Codes for Anytime Distributed Machine Learning [66.46612460837147]
We present a novel distributed computing framework that is robust to slow compute nodes and is capable of both approximate and exact computation of linear operations.
We propose a sequential decoding algorithm designed to handle real-valued data while maintaining low computational complexity for recovery.
We demonstrate the potential applications of this framework in various contexts, such as large-scale matrix multiplication and black-box optimization.
arXiv Detail & Related papers (2023-09-01T18:02:04Z)
- Computationally Efficient Approximations for Matrix-based Renyi's Entropy [33.72108955447222]
The recently developed matrix-based Renyi's entropy enables measurement of information in data.
Computing this quantity involves applying the trace operator to a PSD matrix $G$ raised to the power $\alpha$ (i.e., $\mathrm{tr}(G^{\alpha})$).
We present computationally efficient approximations to this new entropy functional that can reduce its complexity to significantly less than $O(n^2)$.
arXiv Detail & Related papers (2021-12-27T14:59:52Z)
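For concreteness, the following sketch computes the matrix-based Renyi entropy $S_\alpha(G) = \frac{1}{1-\alpha}\log_2 \mathrm{tr}(G^{\alpha})$ exactly via eigenvalues, next to a Hutchinson-style stochastic trace estimate for integer $\alpha$. The randomized estimator is a generic stand-in, not the paper's specific Taylor/Chebyshev approximations, and the Gaussian-kernel Gram matrix is an assumed example input.

```python
import numpy as np

def renyi_exact(G, alpha):
    """S_alpha(G) = log2(tr(G^alpha)) / (1 - alpha), via the eigenvalues of G."""
    lam = np.clip(np.linalg.eigvalsh(G), 0.0, None)  # clip tiny negative eigenvalues
    return np.log2(np.sum(lam ** alpha)) / (1.0 - alpha)

def renyi_hutchinson(G, alpha, probes=128, seed=0):
    """Estimate tr(G^alpha) for integer alpha >= 2 with Hutchinson's estimator:
    tr(G^a) ~ mean over Rademacher vectors v of v^T G^a v (a matvecs per probe)."""
    rng = np.random.default_rng(seed)
    V = rng.choice([-1.0, 1.0], size=(G.shape[0], probes))
    W = V
    for _ in range(int(alpha)):
        W = G @ W                          # alpha matrix-vector products per probe
    return np.log2(np.mean(np.sum(V * W, axis=0))) / (1.0 - alpha)

X = np.random.default_rng(1).standard_normal((200, 5))
K = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))  # Gaussian kernel
G = K / np.trace(K)                        # trace-normalized PSD matrix, tr(G) = 1
print(renyi_exact(G, 2), renyi_hutchinson(G, 2))
```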
- Acceleration in Distributed Optimization Under Similarity [72.54787082152278]
We study distributed (strongly convex) optimization problems over a network of agents, with no centralized nodes.
An $\varepsilon$-solution is achieved in $\tilde{\mathcal{O}}\big(\sqrt{\frac{\beta/\mu}{1-\rho}}\log\frac{1}{\varepsilon}\big)$ communication steps.
This rate matches, for the first time (up to poly-log factors), the lower communication complexity bounds for distributed gossip algorithms applied to the class of problems of interest.
arXiv Detail & Related papers (2021-10-24T04:03:00Z)
- The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can solve this separation problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
- Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers [50.85524803885483]
This work proposes a formal definition of statistically meaningful (SM) approximation which requires the approximating network to exhibit good statistical learnability.
We study SM approximation for two function classes: circuits and Turing machines.
arXiv Detail & Related papers (2021-07-28T04:28:55Z)
- On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning.
In particular, we prove that the complexity of the function class $\mathcal{F}$ characterizes the complexity of the learning problem.
Our regret bounds are independent of the number of episodes.
arXiv Detail & Related papers (2020-11-09T18:32:22Z)
- Berrut Approximated Coded Computing: Straggler Resistance Beyond Polynomial Computing [34.69732430310801]
We propose Berrut Approximated Coded Computing (BACC) as an alternative approach to deal with the effect of stragglers.
BACC is proven to be numerically stable with low computational complexity.
In particular, BACC is used to train a deep neural network on a cluster of servers.
arXiv Detail & Related papers (2020-09-17T14:23:38Z)
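The numerical-stability claim can be illustrated by comparing Berrut's rational interpolant against a high-degree polynomial fit at equispaced nodes, where polynomial interpolation is notoriously ill-conditioned. Runge's function and the parameters below are assumed test choices, not taken from the paper.

```python
import numpy as np

n = 64
z = np.linspace(-1.0, 1.0, n)              # equispaced nodes: worst case for polynomials
f = lambda t: 1.0 / (1.0 + 25.0 * t ** 2)  # Runge's function
xs = np.linspace(-1.0, 1.0, 1001)

# Berrut rational interpolant with weights (-1)^j: no poles in [-1, 1], stable.
w = (-1.0) ** np.arange(n)
diff = xs[:, None] - z[None, :]
diff[np.isclose(diff, 0.0)] = np.finfo(float).eps  # evaluation exactly at a node
terms = w / diff
berrut = (terms @ f(z)) / terms.sum(axis=1)

# Degree-(n-1) polynomial interpolation: blows up at equispaced nodes (RankWarning).
poly = np.polyval(np.polyfit(z, f(z), n - 1), xs)

print("Berrut max error:    ", np.abs(berrut - f(xs)).max())
print("Polynomial max error:", np.abs(poly - f(xs)).max())
```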
- Coded Distributed Computing with Partial Recovery [56.08535873173518]
We introduce a novel coded matrix-vector multiplication scheme, called coded computation with partial recovery (CCPR).
CCPR reduces both the computation time and the decoding complexity by allowing a trade-off between the accuracy and the speed of computation.
We then extend this approach to distributed implementation of more general computation tasks by proposing a coded communication scheme with partial recovery.
arXiv Detail & Related papers (2020-07-04T21:34:49Z)
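The trade-off between accuracy and speed in partially recovered coded matrix-vector multiplication can be sketched as follows. This is not the CCPR code construction: a Vandermonde (MDS-style) generator and least-squares decoding stand in for it, and all parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, B, N, p = 120, 40, 6, 12, 0.3        # rows, cols, blocks, servers, straggler prob.
A = rng.standard_normal((m, d))
x = rng.standard_normal(d)

blocks = A.reshape(B, m // B, d)           # partition A into B row blocks
G = np.vander(np.linspace(-1.0, 1.0, N), B, increasing=True)  # MDS-style generator
coded = np.einsum('ib,brd->ird', G, blocks)   # server i holds sum_b G[i, b] * A_b

answers = np.einsum('ird,d->ir', coded, x)    # each server returns its block product
alive = rng.random(N) > p                     # i.i.d. stragglers

# Decode from whatever arrived: exact once >= B servers respond,
# a least-squares approximation (partial recovery) otherwise.
est, *_ = np.linalg.lstsq(G[alive], answers[alive], rcond=None)
y_hat = est.reshape(m)
print("responsive:", int(alive.sum()),
      " rel. error:", np.linalg.norm(y_hat - A @ x) / np.linalg.norm(A @ x))
```

Decoding from fewer than B responses yields a cheaper but inexact result, which mirrors the accuracy/speed trade-off described above.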
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.