Random Alloy Codes and the Fundamental Limits of Coded Distributed Tensors
- URL: http://arxiv.org/abs/2202.03469v7
- Date: Mon, 28 Oct 2024 20:42:35 GMT
- Title: Random Alloy Codes and the Fundamental Limits of Coded Distributed Tensors
- Authors: Pedro Soto
- Abstract summary: Stragglers and other failures can severely impact the overall completion time.
Recent works in coded computing provide a novel strategy to mitigate stragglers with coded tasks.
We show that this strict definition does not directly optimize the probability of failure.
- Score: 1.8130068086063333
- Abstract: Tensor operations are fundamental in distributed computing, \emph{e.g.,} machine learning, and are commonly split into multiple parallel tasks over large datasets. Stragglers and other failures can severely impact the overall completion time. Recent works in coded computing provide a novel strategy to mitigate stragglers with coded tasks, with the objective of minimizing the number of tasks needed to recover the overall result, known as the recovery threshold. However, we demonstrate that this strict combinatorial definition does not directly optimize the probability of failure. In this paper, we focus on the most likely event and measure the optimality of a coding scheme more directly by its probability of decoding. Our probabilistic approach leads us to a practical construction of random codes for matrix multiplication, i.e., locally random alloy codes, which are optimal with respect to these measures. Furthermore, the probabilistic approach allows us to discover a surprising impossibility theorem about both random and deterministic coded distributed tensors.
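The coding idea is easy to see in miniature. The sketch below illustrates generic random linear codes over the reals for distributed matrix multiplication, not the paper's locally random alloy construction; all names are ours. Row blocks of A are mixed by a random generator matrix, and the product A·B can be decoded from any m finished workers whenever the corresponding m×m coefficient submatrix is invertible, which holds with probability 1 for continuous random coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)

m, n = 4, 6                       # m data blocks, n coded worker tasks
A = rng.standard_normal((8, 5))
B = rng.standard_normal((5, 3))

blocks = np.split(A, m)           # row blocks A_1, ..., A_m
G = rng.standard_normal((n, m))   # random generator matrix (illustrative)

# Worker i computes one coded task: (sum_j G[i, j] * A_j) @ B.
tasks = [sum(G[i, j] * blocks[j] for j in range(m)) @ B for i in range(n)]

finished = [0, 2, 3, 5]           # any m workers that beat the stragglers
G_sub = G[finished]               # m x m coefficient submatrix

# Decoding succeeds iff G_sub is invertible; for continuous random
# coefficients that holds with probability 1 for EVERY m-subset.
Y = np.stack([tasks[i].ravel() for i in finished])
X = np.linalg.solve(G_sub, Y)     # rows are the flattened A_j @ B
AB = np.vstack([x.reshape(-1, B.shape[1]) for x in X])
assert np.allclose(AB, A @ B)
```

A deterministic code only guarantees recovery from specific subsets of workers; the probabilistic view asks instead how likely an arbitrary set of finished workers is to be decodable, which is the quantity the abstract's "probability of decoding" measures.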
Related papers
- OD-Stega: LLM-Based Near-Imperceptible Steganography via Optimized Distributions [7.611860976107124]
We consider coverless steganography where a Large Language Model drives an arithmetic coding decoder to generate stego-texts.
An efficient method should embed secret message bits in as few language tokens as possible, while still keeping the stego-text natural and fluent.
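As a toy illustration of coverless embedding (a simplified bin-based scheme, not OD-Stega's optimized arithmetic-coding approach; the toy language model below is hypothetical), each step spends k secret bits to choose among the 2^k most likely next tokens:

```python
import numpy as np

VOCAB, K = 16, 2   # toy vocabulary size; K secret bits embedded per token

def toy_lm_probs(prefix):
    """Deterministic stand-in for an LLM's next-token distribution."""
    local = np.random.default_rng(abs(hash(tuple(prefix))) % (2**32))
    p = local.random(VOCAB)
    return p / p.sum()

def embed(bits, max_tokens=16):
    """Spend K bits per step to pick one of the 2**K most likely tokens."""
    tokens = []
    while bits and len(tokens) < max_tokens:
        ranked = np.argsort(-toy_lm_probs(tokens))   # likeliest first
        chunk, bits = bits[:K], bits[K:]
        idx = int("".join(map(str, chunk)).ljust(K, "0"), 2)
        tokens.append(int(ranked[idx]))
    return tokens

def extract(tokens):
    """Receiver reruns the LM and recovers K bits from each token's rank."""
    bits = []
    for i, tok in enumerate(tokens):
        ranked = list(np.argsort(-toy_lm_probs(tokens[:i])))
        bits += [int(b) for b in format(ranked.index(tok), f"0{K}b")]
    return bits

msg = [1, 0, 1, 1, 0, 0]
assert extract(embed(msg)) == msg
```

Arithmetic coding improves on this fixed-k scheme by letting high-probability tokens carry fewer message bits, which is what keeps the stego-text fluent while minimizing token count.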
arXiv Detail & Related papers (2024-10-06T01:30:45Z)
- Derandomizing Multi-Distribution Learning [28.514129340758938]
Multi-distribution learning involves learning a single predictor that works well across multiple data distributions.
Recent research has shown near-optimal sample complexity achieved with oracle efficient algorithms.
This raises the question: can these algorithms be derandomized to produce a deterministic predictor for multiple distributions?
arXiv Detail & Related papers (2024-09-26T06:28:56Z)
- Learning Regularities from Data using Spiking Functions: A Theory [1.3735277588793995]
We propose a new machine learning theory that defines mathematically what regularities are.
We say that the discovered non-randomness is encoded into regularities if the function is simple enough.
In this process, we claim that the 'best' regularities, or the optimal spiking functions, are those that capture the largest amount of information.
arXiv Detail & Related papers (2024-05-19T22:04:11Z)
- Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples.
Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance.
We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
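For intuition, here is a minimal sketch of the distribution-estimation step. ProCo models each class with a von Mises-Fisher distribution over normalized features; the estimator below uses a standard moment-based approximation and is our illustration, not the paper's code:

```python
import numpy as np

def fit_vmf(feats):
    """Moment-based estimate of a von Mises-Fisher distribution
    (mean direction mu, concentration kappa) for one class's features."""
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    mean = feats.mean(axis=0)
    r = np.linalg.norm(mean)             # mean resultant length in (0, 1)
    mu = mean / r
    d = feats.shape[1]
    kappa = r * (d - r**2) / (1 - r**2)  # standard approximation
    return mu, kappa

# A minority class with few samples still yields a full distribution,
# from which unlimited positives/negatives can be drawn for the loss.
rng = np.random.default_rng(6)
tail_feats = rng.standard_normal((5, 128)) + 2.0
mu, kappa = fit_vmf(tail_feats)
print(mu.shape, round(float(kappa), 1))
```

The point of the parametric fit is that contrastive pairs can be sampled from the estimated class distributions instead of relying on the handful of minority-class samples that appear in any one batch.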
arXiv Detail & Related papers (2024-03-11T13:44:49Z)
- Distributionally Robust Skeleton Learning of Discrete Bayesian Networks [9.46389554092506]
We consider the problem of learning the exact skeleton of general discrete Bayesian networks from potentially corrupted data.
We propose to optimize the most adverse risk over a family of distributions within bounded Wasserstein distance or KL divergence to the empirical distribution.
We present efficient algorithms and show the proposed methods are closely related to the standard regularized regression approach.
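The KL-divergence case has a convenient dual that turns the adverse risk into a one-dimensional minimization; the sketch below is our illustration of the generic KL-DRO dual, not the paper's skeleton-learning algorithm:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def kl_dro_risk(losses, rho):
    """Worst-case expected loss over all Q with KL(Q || P_hat) <= rho,
    via the dual:  min_{t > 0}  t * log E_P[exp(loss / t)] + t * rho."""
    losses = np.asarray(losses, dtype=float)
    m = losses.max()
    def dual(t):                       # log-sum-exp stabilized
        return m + t * np.log(np.mean(np.exp((losses - m) / t))) + t * rho
    return minimize_scalar(dual, bounds=(1e-6, 1e3), method="bounded").fun

losses = np.array([0.1, 0.2, 0.15, 3.0])   # one corrupted-looking sample
print(kl_dro_risk(losses, rho=0.0))        # ~ plain empirical mean
print(kl_dro_risk(losses, rho=0.5))        # adversary upweights the bad one
```

This exponential-tilting structure is also why KL-DRO objectives end up closely resembling standard regularized estimators, the connection the paper makes precise for skeleton learning.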
arXiv Detail & Related papers (2023-11-10T15:33:19Z)
- Randomized Polar Codes for Anytime Distributed Machine Learning [66.46612460837147]
We present a novel distributed computing framework that is robust to slow compute nodes, and is capable of both approximate and exact computation of linear operations.
We propose a sequential decoding algorithm designed to handle real valued data while maintaining low computational complexity for recovery.
We demonstrate the potential applications of this framework in various contexts, such as large-scale matrix multiplication and black-box optimization.
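A minimal sketch of the approximate-vs-exact trade-off follows, using generic Gaussian codes with least-squares decoding as a stand-in for the paper's polar-code construction and its sequential decoder:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 4, 7
A = rng.standard_normal((8, 8))
x = rng.standard_normal(8)
blocks = np.split(A, m)                 # row blocks of A
G = rng.standard_normal((n, m))

# Worker i returns the coded partial result (sum_j G[i,j] * A_j) @ x.
answers = np.stack(
    [sum(G[i, j] * blocks[j] for j in range(m)) @ x for i in range(n)]
)

def decode(finished):
    """Least-squares decode from whichever workers have finished:
    exact once >= m results arrive, an approximation before that."""
    Y = answers[finished]               # (k, rows_per_block)
    est, *_ = np.linalg.lstsq(G[finished], Y, rcond=None)
    return est.ravel()                  # stacked estimate of A @ x

exact = decode([0, 1, 2, 4])            # m results -> exact recovery
rough = decode([0, 1, 5])               # fewer than m -> approximation
assert np.allclose(exact, A @ x)
```

The "anytime" property is exactly this: every additional straggler that finishes refines the estimate, and exact recovery is reached once enough results arrive.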
arXiv Detail & Related papers (2023-09-01T18:02:04Z)
- qecGPT: decoding Quantum Error-correcting Codes with Generative Pre-trained Transformers [5.392298820599664]
We propose a framework for decoding quantum error-correcting codes with generative modeling.
We use autoregressive neural networks, specifically Transformers, to learn the joint probability of logical operators and syndromes.
Our framework is general and can be applied to any error model and quantum codes with different topologies.
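A toy version of the autoregressive factorization is sketched below; a hand-rolled logistic conditional stands in for the Transformer, and all names are illustrative. It scores a syndrome-plus-logical bit string by the chain rule and picks the logical class maximizing the joint probability:

```python
import numpy as np

rng = np.random.default_rng(3)
N_SYN, N_LOG = 4, 2                # syndrome bits, logical-operator bits
W = rng.standard_normal((N_SYN + N_LOG, N_SYN + N_LOG)) * 0.3  # toy "network"

def cond_prob_one(prefix, i):
    """Toy autoregressive conditional p(x_i = 1 | x_<i); a Transformer
    plays this role in the actual framework."""
    z = sum(W[i, j] * (2 * b - 1) for j, b in enumerate(prefix))
    return 1.0 / (1.0 + np.exp(-z))

def joint_logp(bits):
    """Chain rule: log p(x) = sum_i log p(x_i | x_<i)."""
    lp = 0.0
    for i, b in enumerate(bits):
        p1 = cond_prob_one(bits[:i], i)
        lp += np.log(p1 if b else 1.0 - p1)
    return lp

def decode(syndrome):
    """Pick the most likely logical class l given s: argmax_l p(s, l)."""
    return max(
        (tuple(map(int, np.binary_repr(k, N_LOG))) for k in range(2 ** N_LOG)),
        key=lambda l: joint_logp(list(syndrome) + list(l)),
    )

print(decode([1, 0, 1, 0]))
```

Because the model factorizes over arbitrary bit orderings, nothing in this scheme depends on the code's topology, which is what makes the framework general.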
arXiv Detail & Related papers (2023-07-18T07:34:02Z)
- Byzantine-Resilient Non-Convex Stochastic Gradient Descent [61.6382287971982]
We study adversary-resilient distributed optimization, in which machines can independently compute gradients and cooperate to jointly optimize over their local objectives.
Our algorithm is based on a new concentration technique, and its sample complexity matches the best known bounds in the stochastic distributed setting when no Byzantine machines are present.
It is very practical: it improves upon the performance of all prior methods.
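For intuition, a coordinate-wise trimmed mean is among the simplest robust aggregators for this setting; the paper's concentration-based filtering is more refined, and this sketch only illustrates the failure mode being guarded against:

```python
import numpy as np

def robust_aggregate(grads, trim=0.2):
    """Coordinate-wise trimmed mean over worker gradients: drop the
    most extreme values in each coordinate, then average the rest."""
    G = np.stack(grads)                        # (num_workers, dim)
    k = int(trim * len(grads))
    S = np.sort(G, axis=0)                     # sort each coordinate
    return S[k : len(grads) - k].mean(axis=0)

rng = np.random.default_rng(4)
honest = [rng.normal(1.0, 0.1, size=5) for _ in range(8)]
byzantine = [np.full(5, -100.0) for _ in range(2)]   # adversarial gradients
agg = robust_aggregate(honest + byzantine)
print(agg)   # stays near the honest mean (~1.0) despite the attackers
```

A plain average would be dragged to roughly -19 by the two attackers; any aggregator used inside SGD here must bound that influence without discarding too much honest signal.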
arXiv Detail & Related papers (2020-12-28T17:19:32Z)
- Efficient semidefinite-programming-based inference for binary and multi-class MRFs [83.09715052229782]
We propose an efficient method for computing the partition function or MAP estimate in a pairwise MRF.
We extend semidefinite relaxations from the typical binary MRF to the full multi-class setting, and develop a compact semidefinite relaxation that can again be solved efficiently.
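A minimal sketch of the binary case follows: the classic MAXCUT-style relaxation solved in Burer-Monteiro fashion (replace each spin by a unit vector, optimize the low-rank factor directly, round with a random hyperplane). The paper's compact multi-class relaxation goes further than this:

```python
import numpy as np

def bm_map_binary_mrf(W, rank=8, steps=500, lr=0.1, seed=0):
    """MAP for a binary pairwise MRF with score x^T W x, x in {-1,1}^n:
    relax each x_i to a unit vector v_i in R^rank, run projected
    gradient ascent on <W, V V^T>, then round by a random hyperplane."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    V = rng.standard_normal((n, rank))
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    for _ in range(steps):
        V += lr * (W @ V)                  # gradient step (W symmetric)
        V /= np.linalg.norm(V, axis=1, keepdims=True)
    r = rng.standard_normal(rank)          # random-hyperplane rounding
    return np.sign(V @ r)

W = np.array([[0, 1, -1], [1, 0, 1], [-1, 1, 0]], float)
print(bm_map_binary_mrf(W))
```

The low-rank factorization is what makes the relaxation cheap: no full n×n PSD matrix is ever materialized, only the n×rank factor V.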
arXiv Detail & Related papers (2020-12-04T15:36:29Z)
- Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
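A tiny worked version of the underlying CNML idea is shown below, using logistic regression as our illustration; ACNML exists precisely because this per-query refitting is intractable for deep networks:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def cnml_probs(X_train, y_train, x_test, labels=(0, 1)):
    """Conditional NML for one test point: for each candidate label y,
    refit on train + (x_test, y) and take that model's probability of y;
    then normalize across candidates."""
    scores = []
    for y in labels:
        Xa = np.vstack([X_train, x_test])
        ya = np.append(y_train, y)
        clf = LogisticRegression(max_iter=1000).fit(Xa, ya)
        p = clf.predict_proba(x_test.reshape(1, -1))[0]
        scores.append(p[list(clf.classes_).index(y)])
    scores = np.array(scores)
    return scores / scores.sum()

rng = np.random.default_rng(5)
X = rng.standard_normal((40, 2))
y = (X[:, 0] > 0).astype(int)
print(cnml_probs(X, y, np.array([5.0, 0.0])))  # confident far from boundary
print(cnml_probs(X, y, np.array([0.0, 0.0])))  # nearer 50/50 at the boundary
```

The calibration benefit comes from the refit: on out-of-distribution inputs, every candidate label can be fit almost equally well, so the normalized probabilities are pushed toward uniform rather than overconfident.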
arXiv Detail & Related papers (2020-11-05T08:04:34Z)
- Generative Semantic Hashing Enhanced via Boltzmann Machines [61.688380278649056]
Existing generative-hashing methods mostly assume a factorized form for the posterior distribution.
We propose to employ the distribution of a Boltzmann machine as the variational posterior.
We show that by effectively modeling correlations among different bits within a hash code, our model can achieve significant performance gains.
arXiv Detail & Related papers (2020-06-16T01:23:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.