Communication-Efficient Federated Learning via Quantized Compressed
Sensing
- URL: http://arxiv.org/abs/2111.15071v1
- Date: Tue, 30 Nov 2021 02:13:54 GMT
- Title: Communication-Efficient Federated Learning via Quantized Compressed
Sensing
- Authors: Yongjeong Oh, Namyoon Lee, Yo-Seb Jeon, and H. Vincent Poor
- Abstract summary: The presented framework consists of gradient compression for wireless devices and gradient reconstruction for a parameter server.
Thanks to gradient sparsification and quantization, our strategy can achieve a higher compression ratio than one-bit gradient compression.
We demonstrate that the framework achieves almost the same performance as the case without compression.
- Score: 82.10695943017907
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present a communication-efficient federated learning
framework inspired by quantized compressed sensing. The presented framework
consists of gradient compression for wireless devices and gradient
reconstruction for a parameter server (PS). Our strategy for gradient
compression is to sequentially perform block sparsification, dimensional
reduction, and quantization. Thanks to gradient sparsification and
quantization, our strategy can achieve a higher compression ratio than one-bit
gradient compression. For accurate aggregation of the local gradients from the
compressed signals at the PS, we put forth an approximate minimum mean square
error (MMSE) approach for gradient reconstruction using the
expectation-maximization generalized-approximate-message-passing (EM-GAMP)
algorithm. Assuming a Bernoulli Gaussian-mixture prior, this algorithm
iteratively updates the posterior mean and variance of local gradients from the
compressed signals. We also present a low-complexity approach for the gradient
reconstruction. In this approach, we use the Bussgang theorem to aggregate
local gradients from the compressed signals, then compute an approximate MMSE
estimate of the aggregated gradient using the EM-GAMP algorithm. We also
provide a convergence rate analysis of the presented framework. Using the MNIST
dataset, we demonstrate that the presented framework achieves almost the same
performance as the case without compression, while significantly reducing the
communication overhead of federated learning.
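As a concrete illustration of the compression steps named above (block sparsification, dimensional reduction, and quantization), here is a minimal numpy sketch. The per-block top-k rule, the Gaussian measurement matrix, and the uniform quantizer are assumptions made for illustration, not the paper's exact design choices.

```python
import numpy as np

def compress_gradient(g, block_size=256, keep=16, m=64, bits=2, rng=None):
    """Sketch: block sparsification -> random projection -> scalar quantization.

    The specific choices (top-k per block, Gaussian A, uniform quantizer)
    are assumptions for illustration, not the paper's exact construction.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    g = np.asarray(g, dtype=np.float64)
    pad = (-len(g)) % block_size
    blocks = np.pad(g, (0, pad)).reshape(-1, block_size)

    # 1) Block sparsification: keep the 'keep' largest-magnitude entries per block.
    sparse = np.zeros_like(blocks)
    idx = np.argsort(np.abs(blocks), axis=1)[:, -keep:]
    np.put_along_axis(sparse, idx, np.take_along_axis(blocks, idx, axis=1), axis=1)

    # 2) Dimensional reduction: y = A s with a random Gaussian measurement matrix.
    A = rng.standard_normal((m, block_size)) / np.sqrt(m)
    y = sparse @ A.T                      # shape: (num_blocks, m)

    # 3) Scalar quantization: uniform quantizer on the compressed measurements.
    step = (y.max() - y.min()) / (2 ** bits - 1) + 1e-12
    q = np.round((y - y.min()) / step).astype(np.int32)
    return q, y.min(), step, A            # the PS needs A (or its seed) to reconstruct

g = np.random.randn(1000)
q, offset, step, A = compress_gradient(g)
y_hat = q * step + offset                 # dequantized measurements at the PS
```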
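On the reconstruction side, the core ingredient of a GAMP-style approximate MMSE estimator is the per-entry posterior mean and variance of a gradient entry given a Gaussian pseudo-observation. The sketch below shows that single denoising step for a plain Bernoulli-Gaussian prior (a one-component simplification of the Bernoulli Gaussian-mixture prior in the abstract), with assumed prior parameters; it is not the full EM-GAMP iteration, in which these parameters would be learned by EM.

```python
import numpy as np

def bernoulli_gaussian_denoiser(r, tau, lam=0.05, sig2=1.0):
    """Posterior mean/variance of x given r = x + N(0, tau), under the
    Bernoulli-Gaussian prior p(x) = (1 - lam) * delta(x) + lam * N(x; 0, sig2).

    Only the per-entry denoising step used inside GAMP-type algorithms;
    lam and sig2 are illustrative and would be estimated by EM in EM-GAMP.
    """
    r, tau = np.asarray(r, float), float(tau)
    # Evidence of r under the 'zero' and 'nonzero' components of the prior.
    ev0 = np.exp(-r**2 / (2 * tau)) / np.sqrt(2 * np.pi * tau)
    ev1 = np.exp(-r**2 / (2 * (tau + sig2))) / np.sqrt(2 * np.pi * (tau + sig2))
    pi = lam * ev1 / (lam * ev1 + (1 - lam) * ev0 + 1e-300)  # P(x != 0 | r)

    m = r * sig2 / (sig2 + tau)            # posterior mean given x != 0
    v = sig2 * tau / (sig2 + tau)          # posterior variance given x != 0
    mean = pi * m                          # E[x | r]
    var = pi * (v + m**2) - mean**2        # Var[x | r]
    return mean, var

# Example: denoise a noisy pseudo-observation of a sparse vector.
x = np.zeros(10); x[[2, 7]] = [1.5, -0.8]
r = x + 0.1 * np.random.randn(10)
mean, var = bernoulli_gaussian_denoiser(r, tau=0.01)
```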
Related papers
- Temporal Predictive Coding for Gradient Compression in Distributed Learning [11.704910933646115]
This paper proposes a prediction-based gradient compression method for distributed learning with event-triggered communication.
Our goal is to reduce the amount of information transmitted from the distributed agents to the parameter server by exploiting temporal correlation in the local gradients.
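A minimal sketch of that general idea follows, using the previous gradient as the predictor and a norm-based trigger; both are illustrative assumptions rather than this paper's actual predictor or trigger rule.

```python
import numpy as np

class PredictiveGradientEncoder:
    """Generic sketch: transmit only the residual between the current gradient
    and a prediction from past gradients, and only when an event is triggered.
    The one-step 'previous gradient' predictor and the norm-based trigger are
    illustrative assumptions, not this paper's specific design."""

    def __init__(self, dim, threshold=0.1):
        self.prediction = np.zeros(dim)   # the server keeps an identical copy
        self.threshold = threshold

    def encode(self, grad):
        residual = grad - self.prediction
        if np.linalg.norm(residual) > self.threshold * np.linalg.norm(grad):
            self.prediction = grad.copy() # server applies the same update on receipt
            return residual               # transmit the (compressible) residual
        return None                       # event not triggered: transmit nothing

enc = PredictiveGradientEncoder(dim=5)
for _ in range(3):
    msg = enc.encode(np.random.randn(5))
    # server reconstructs grad = prediction + msg when msg is not None,
    # and falls back to the prediction when nothing is transmitted
```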
arXiv Detail & Related papers (2024-10-03T13:35:28Z)
- Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models [66.1595537904019]
Large language models (LLMs) can act as gradient priors in a zero-shot setting.
We introduce LM-GC, a novel method that integrates LLMs with arithmetic coding.
arXiv Detail & Related papers (2024-09-26T13:38:33Z)
- Flattened one-bit stochastic gradient descent: compressed distributed optimization with controlled variance [55.01966743652196]
We propose a novel algorithm for distributed stochastic gradient descent (SGD) with compressed gradient communication in the parameter-server framework.
Our gradient compression technique, named flattened one-bit stochastic gradient descent (FO-SGD), relies on two simple algorithmic ideas.
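The summary does not spell out FO-SGD's two ingredients, so the sketch below shows only a generic building block of one-bit schemes, unbiased dithered one-bit quantization; it is an assumption-laden illustration, not necessarily the FO-SGD construction.

```python
import numpy as np

def one_bit_compress(g, rng):
    """Generic dithered one-bit compression (illustrative; not necessarily FO-SGD).
    Each coordinate is encoded as a single sign bit plus one shared scale s, and
    decoding is unbiased: E[s * sign(g_i + u_i)] = g_i for u_i ~ Uniform(-s, s)
    whenever s >= max|g_i|."""
    s = np.max(np.abs(g)) + 1e-12
    u = rng.uniform(-s, s, size=g.shape)      # dither, shareable via a common seed
    bits = np.sign(g + u)                      # one bit per coordinate
    return bits, s

def one_bit_decompress(bits, s):
    return s * bits                            # unbiased estimate of g

rng = np.random.default_rng(0)
g = np.random.randn(8)
bits, s = one_bit_compress(g, rng)
g_hat = one_bit_decompress(bits, s)            # noisy but unbiased reconstruction
```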
arXiv Detail & Related papers (2024-05-17T21:17:27Z)
- Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
arXiv Detail & Related papers (2022-10-07T03:52:27Z)
- Federated Optimization Algorithms with Random Reshuffling and Gradient Compression [2.7554288121906296]
We provide the first analysis of methods with gradient compression and without-replacement sampling.
We show how to reduce the variance coming from gradient quantization through the use of control iterates.
We outline several settings in which these methods improve upon existing algorithms.
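One standard way to reduce quantization variance with control iterates is to compress the difference between the local gradient and a slowly updated shift, in the spirit of DIANA-type methods; whether this matches the paper's exact construction is an assumption, so the sketch below is illustration only.

```python
import numpy as np

def quantize(v, rng):
    """Toy unbiased random-sign quantizer, used only for illustration."""
    norm = np.linalg.norm(v) + 1e-12
    return norm * np.sign(v) * (rng.random(v.shape) < np.abs(v) / norm)

# Control iterate h: compress g - h instead of g, then move h toward g.
rng = np.random.default_rng(0)
dim, alpha = 4, 0.5
h = np.zeros(dim)                         # control iterate (mirrored by the server)
for _ in range(5):
    g = np.random.randn(dim)              # local stochastic gradient
    delta_hat = quantize(g - h, rng)      # quantization variance shrinks as h tracks g
    g_hat = h + delta_hat                 # server-side gradient estimate
    h = h + alpha * delta_hat             # identical update on worker and server
```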
arXiv Detail & Related papers (2022-06-14T17:36:47Z)
- Communication-Efficient Distributed SGD with Compressed Sensing [24.33697801661053]
We consider large-scale distributed optimization over a set of edge devices connected to a central server.
Inspired by recent advances in federated learning, we propose a distributed stochastic gradient descent (SGD)-type algorithm that exploits the sparsity of the gradient, when possible, to reduce the communication burden.
We conduct theoretical analysis on the convergence of our algorithm in the presence of noise perturbation incurred by the communication channels, and also conduct numerical experiments to corroborate its effectiveness.
arXiv Detail & Related papers (2021-12-15T02:10:45Z)
- Wyner-Ziv Gradient Compression for Federated Learning [4.619828919345114]
Gradient compression is an effective method to reduce communication load by transmitting compressed gradients.
This paper proposes a practical gradient compression scheme for federated learning, which uses historical gradients to compress gradients.
We also implement our gradient quantization method on a real dataset, and its performance is better than that of previous schemes.
arXiv Detail & Related papers (2021-11-16T07:55:43Z)
- An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems [77.88178159830905]
Sparsity-Inducing Distribution-based Compression (SIDCo) is a threshold-based sparsification scheme that enjoys similar threshold estimation quality to deep gradient compression (DGC).
Our evaluation shows SIDCo speeds up training by up to 41.7%, 7.6%, and 1.9% compared to the no-compression baseline, Top-k, and DGC compressors, respectively.
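Threshold-based sparsification estimates the cutoff from a fitted sparsity-inducing distribution rather than from an exact top-k sort; a minimal sketch of that idea, using a single exponential fit to the gradient magnitudes (the actual SIDCo procedure considers several candidate distributions and refines the estimate, so treat this as a simplification):

```python
import numpy as np

def estimated_threshold_sparsify(g, target_ratio=0.01):
    """Threshold sparsification with the threshold estimated from an exponential
    fit to |g| instead of an exact top-k sort.
    Under |g| ~ Exp(1/mu): P(|g| > t) = exp(-t / mu)  =>  t = mu * ln(1 / target_ratio).
    A simplification of the SIDCo idea, for illustration only."""
    mu = np.mean(np.abs(g))                       # MLE of the exponential scale
    t = mu * np.log(1.0 / target_ratio)           # threshold hitting ~target_ratio density
    mask = np.abs(g) > t
    return np.flatnonzero(mask), g[mask]          # indices and values to transmit

g = np.random.randn(100_000)
idx, vals = estimated_threshold_sparsify(g, target_ratio=0.001)
```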
arXiv Detail & Related papers (2021-01-26T13:06:00Z)
- Sparse Communication for Training Deep Networks [56.441077560085475]
Synchronous stochastic gradient descent (SGD) is the most common method used for distributed training of deep learning models.
In this algorithm, each worker shares its local gradients with others and updates the parameters using the average gradients of all workers.
We study several compression schemes and identify how three key parameters affect the performance.
arXiv Detail & Related papers (2020-09-19T17:28:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.