Lightweight Projective Derivative Codes for Compressed Asynchronous Gradient Descent
- URL: http://arxiv.org/abs/2201.12990v1
- Date: Mon, 31 Jan 2022 04:08:53 GMT
- Title: Lightweight Projective Derivative Codes for Compressed Asynchronous Gradient Descent
- Authors: Pedro Soto, Ilia Ilmer, Haibin Guan, Jun Li
- Abstract summary: This paper proposes a novel algorithm that encodes the partial derivatives themselves and furthermore optimizes the codes by performing lossy compression on the derivative codewords.
The utility of this application of coding theory is a geometric consequence of the observation in optimization research that noise is tolerable, and sometimes even helpful, in gradient-descent-based learning algorithms.
- Score: 6.055286666916789
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Coded distributed computation has become common practice for performing
gradient descent on large datasets to mitigate stragglers and other faults.
This paper proposes a novel algorithm that encodes the partial derivatives
themselves and further optimizes the codes by performing lossy compression on
the derivative codewords, maximizing the information contained in each
codeword while minimizing the information shared between codewords. The
utility of this application of coding theory is a geometric consequence of the
observation in optimization research that noise is tolerable, and sometimes
even helpful, in gradient-descent-based learning algorithms, since it helps
avoid overfitting and local minima. This stands in contrast with much of the
current work on distributed coded computation, which focuses on recovering all
of the data from the workers. A second contribution is that the low-weight
nature of the coding scheme allows for asynchronous gradient updates, since
the code can be decoded iteratively; i.e., a worker's result can be
incorporated into the overall gradient as soon as it arrives. Because the
directional derivative is always a linear function of the direction vectors,
the framework is robust: it can apply linear coding techniques to general
machine learning models such as deep neural networks.
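To make the mechanism concrete, the following is a minimal sketch of the idea described above, not the paper's actual construction: the random encoding matrix, the least-squares decoder, the toy least-squares loss, and the straggler model are all illustrative assumptions. Each worker returns directional derivatives of the loss along its code-defined direction vectors (a linear function of the gradient), and the server forms an approximate gradient from whichever workers have responded, so an update can proceed without waiting for stragglers.

import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: loss(w) = 0.5 * ||X w - y||^2, grad = X^T (X w - y).
n_features, n_samples, n_workers, dirs_per_worker = 20, 200, 8, 4
X = rng.normal(size=(n_samples, n_features))
y = rng.normal(size=n_samples)
w = np.zeros(n_features)

# Encoding: each worker k holds a small block of direction vectors D_k
# (rows of a random code matrix; this is an assumption -- the paper instead
# optimizes these codewords via lossy compression).
codes = [rng.normal(size=(dirs_per_worker, n_features)) for _ in range(n_workers)]

def worker_task(k, w):
    """Worker k returns directional derivatives D_k @ grad(w) for its code block."""
    # For the sketch we form the full gradient; in practice each worker would
    # see only its data shard and could compute directional derivatives
    # directly (e.g., via forward-mode differentiation) without forming grad.
    grad = X.T @ (X @ w - y)
    return codes[k] @ grad            # directional derivatives: linear in the directions

for step in range(50):
    # Straggler/asynchrony model: only a random subset of workers has responded.
    responded = [k for k in range(n_workers) if rng.random() > 0.3]
    if not responded:
        continue
    D = np.vstack([codes[k] for k in responded])                # stacked directions
    r = np.concatenate([worker_task(k, w) for k in responded])  # their measurements
    # Iterative/partial decoding: least-squares estimate of the gradient from
    # whatever projections arrived; more responses give a better estimate.
    g_hat, *_ = np.linalg.lstsq(D, r, rcond=None)
    w -= 1e-3 * g_hat

print("final loss:", 0.5 * np.linalg.norm(X @ w - y) ** 2)

With enough responses the stacked directions form an overdetermined system and the estimate approaches the true gradient; with fewer responses the update is a noisy projection of it, which is exactly the regime the abstract argues gradient descent tolerates.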
Related papers
- A Mirror Descent-Based Algorithm for Corruption-Tolerant Distributed Gradient Descent [57.64826450787237]
We show how to analyze the behavior of distributed gradient descent algorithms in the presence of adversarial corruptions.
We show how to use ideas from (lazy) mirror descent to design a corruption-tolerant distributed optimization algorithm.
Experiments based on linear regression, support vector classification, and softmax classification on the MNIST dataset corroborate our theoretical findings.
arXiv Detail & Related papers (2024-07-19T08:29:12Z) - Factor Graph Optimization of Error-Correcting Codes for Belief Propagation Decoding [62.25533750469467]
Low-Density Parity-Check (LDPC) codes possess several advantages over other families of codes.
The proposed approach is shown to outperform existing popular codes in decoding performance by orders of magnitude.
arXiv Detail & Related papers (2024-06-09T12:08:56Z) - Graph Neural Networks for Enhanced Decoding of Quantum LDPC Codes [6.175503577352742]
We propose a differentiable iterative decoder for quantum low-density parity-check (LDPC) codes.
The proposed algorithm is composed of classical belief propagation (BP) decoding stages and intermediate graph neural network (GNN) layers.
arXiv Detail & Related papers (2023-10-26T19:56:25Z) - Fundamental Limits of Two-layer Autoencoders, and Achieving Them with
Gradient Methods [91.54785981649228]
This paper focuses on non-linear two-layer autoencoders trained in the challenging proportional regime.
Our results characterize the minimizers of the population risk, and show that such minimizers are achieved by gradient methods.
For the special case of a sign activation function, our analysis establishes the fundamental limits for the lossy compression of Gaussian sources via (shallow) autoencoders.
arXiv Detail & Related papers (2022-12-27T12:37:34Z) - Variational Sparse Coding with Learned Thresholding [6.737133300781134]
We propose a new approach to variational sparse coding that allows us to learn sparse distributions by thresholding samples.
We first evaluate and analyze our method by training a linear generator, showing that it has superior performance, statistical efficiency, and gradient estimation.
arXiv Detail & Related papers (2022-05-07T14:49:50Z) - MetaSDF: Meta-learning Signed Distance Functions [85.81290552559817]
Generalizing across shapes with neural implicit representations amounts to learning priors over the respective function space.
We formalize learning of a shape space as a meta-learning problem and leverage gradient-based meta-learning algorithms to solve this task.
arXiv Detail & Related papers (2020-06-17T05:14:53Z) - Approximate Gradient Coding with Optimal Decoding [22.069731881164895]
In distributed optimization problems, a technique called gradient coding has been used to mitigate the effect of straggling machines.
Recent work has studied approximate gradient coding, which concerns coding schemes where the replication factor of the data is too low to recover the full gradient exactly.
We introduce novel approximate gradient codes based on expander graphs, in which each machine receives exactly two blocks of data points; a minimal sketch of this pair-assignment idea appears after this list.
arXiv Detail & Related papers (2020-06-17T03:46:27Z) - Predictive Coding Approximates Backprop along Arbitrary Computation
Graphs [68.8204255655161]
We develop a strategy to translate core machine learning architectures into their predictive coding equivalents.
Our models perform equivalently to backprop on challenging machine learning benchmarks.
Our method raises the potential that standard machine learning algorithms could in principle be directly implemented in neural circuitry.
arXiv Detail & Related papers (2020-06-07T15:35:47Z) - Communication-Efficient Gradient Coding for Straggler Mitigation in
Distributed Learning [17.454251607446555]
Distributed implementations of gradient-based methods, wherein a server distributes gradient computations across worker machines, need to overcome two limitations.
Ye and Abbe [ICML 2018] proposed a coding-theoretic paradigm to characterize a fundamental trade-off between computation load per worker, communication overhead per worker, and straggler tolerance.
We develop a communication-efficient gradient coding framework to overcome these drawbacks.
arXiv Detail & Related papers (2020-05-14T17:57:13Z) - Auto-Encoding Twin-Bottleneck Hashing [141.5378966676885]
This paper proposes an efficient and adaptive code-driven graph.
It is updated by decoding in the context of an auto-encoder.
Experiments on benchmarked datasets clearly show the superiority of our framework over the state-of-the-art hashing methods.
arXiv Detail & Related papers (2020-02-27T05:58:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.