Lossy Gradient Compression: How Much Accuracy Can One Bit Buy?
- URL: http://arxiv.org/abs/2202.02812v1
- Date: Sun, 6 Feb 2022 16:29:00 GMT
- Title: Lossy Gradient Compression: How Much Accuracy Can One Bit Buy?
- Authors: Sadaf Salehkalaibar and Stefano Rini
- Abstract summary: We propose a class of distortion measures for the design of quantizers for the compression of the model updates.
In this paper, we take a rate-distortion approach to answer this question for the distributed training of a deep neural network (DNN).
- Score: 17.907068248604755
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In federated learning (FL), a global model is trained at a Parameter Server
(PS) by aggregating model updates obtained from multiple remote learners.
Critically, the communication between the remote users and the PS is limited by
the available power for transmission, while the transmission from the PS to the
remote users can be considered unbounded. This gives rise to the distributed
learning scenario in which the updates from the remote learners have to be
compressed so as to meet communication rate constraints in the uplink
transmission toward the PS. For this problem, one would like to compress the
model updates so as to minimize the resulting loss in accuracy. In this paper,
we take a rate-distortion approach to answer this question for the distributed
training of a deep neural network (DNN). In particular, we define a measure of
the compression performance, the \emph{per-bit accuracy}, which addresses the
ultimate model accuracy that a bit of communication brings to the centralized
model. In order to maximize the per-bit accuracy, we consider modeling the
gradient updates at remote learners as a generalized normal distribution. Under
this assumption on the model update distribution, we propose a class of
distortion measures for the design of quantizers for the compression of the
model updates. We argue that this family of distortion measures, which we refer
to as the "$M$-magnitude weighted $L_2$" norm, captures the practitioner's intuition
in the choice of a gradient compressor. Numerical simulations are provided to
validate the proposed approach.
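The abstract does not spell out the exact form of the "$M$-magnitude weighted $L_2$" distortion or the quantizer design. The Python sketch below is only a hypothetical reading: the squared quantization error of each gradient entry is weighted by its magnitude raised to the power $M$, gradient entries are drawn from a generalized normal (GenNorm) distribution as the abstract suggests, and the quantizer grid, shape value, and function names are illustrative assumptions rather than the authors' method.

```python
import numpy as np
from scipy.stats import gennorm

def uniform_quantize(g, num_levels, g_max):
    """Clip to [-g_max, g_max] and snap each entry to the nearest of num_levels uniform levels."""
    levels = np.linspace(-g_max, g_max, num_levels)
    idx = np.argmin(np.abs(np.clip(g, -g_max, g_max)[:, None] - levels[None, :]), axis=1)
    return levels[idx]

def m_weighted_l2(g, g_hat, M):
    """Assumed 'M-magnitude weighted L2' distortion: squared error weighted by |g|^M."""
    return float(np.mean(np.abs(g) ** M * (g - g_hat) ** 2))

# Model gradient entries as GenNorm with heavier-than-Gaussian tails (shape < 2 is an assumption).
g = gennorm.rvs(0.8, scale=1e-2, size=100_000, random_state=0)

for M in (0, 1, 2):  # M = 0 recovers the plain L2 distortion
    candidates = [
        (m_weighted_l2(g, uniform_quantize(g, 2 ** bits, g_max), M), bits, g_max)
        for bits in (1, 2, 4)
        for g_max in (1e-2, 3e-2, 1e-1)
    ]
    dist, bits, g_max = min(candidates)
    print(f"M={M}: best quantizer uses {bits} bit(s), clip {g_max:.0e}, distortion {dist:.3e}")
```

With $M > 0$, candidate quantizers that preserve the large-magnitude entries are favored, which is one way to read the practitioner's intuition that a few large gradient components carry most of an update's value.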
Related papers
- Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression [10.233937665979694]
DLRM is a state-of-the-art recommendation system model that has gained widespread adoption across various industry applications.
A significant bottleneck in this process is the time-consuming all-to-all communication required to collect embedding data from all devices.
We introduce a method that employs error-bounded lossy compression to reduce the communication data size and accelerate DLRM training.
arXiv Detail & Related papers (2024-07-05T05:55:18Z) - Fed-CVLC: Compressing Federated Learning Communications with
Variable-Length Codes [54.18186259484828]
In Federated Learning (FL) paradigm, a parameter server (PS) concurrently communicates with distributed participating clients for model collection, update aggregation, and model distribution over multiple rounds.
We show strong evidence that variable-length coding is beneficial for compression in FL.
We present Fed-CVLC (Federated Learning Compression with Variable-Length Codes), which fine-tunes the code length in response to the dynamics of model updates.
arXiv Detail & Related papers (2024-02-06T07:25:21Z) - Just One Byte (per gradient): A Note on Low-Bandwidth Decentralized
Language Model Finetuning Using Shared Randomness [86.61582747039053]
Language model training in distributed settings is limited by the communication cost of gradient exchanges.
We extend recent work using shared randomness to perform distributed fine-tuning with low bandwidth.
arXiv Detail & Related papers (2023-06-16T17:59:51Z) - M22: A Communication-Efficient Algorithm for Federated Learning Inspired
by Rate-Distortion [19.862336286338564]
In federated learning, model updates must be compressed so as to minimize the loss in accuracy resulting from a communication constraint.
This paper proposes the "$M$-magnitude weighted $L_2$ distortion + 2 degrees of freedom" (M22) algorithm, a rate-distortion-inspired approach to gradient compression.
arXiv Detail & Related papers (2023-01-23T04:40:01Z) - Optimizing the Communication-Accuracy Trade-off in Federated Learning
with Rate-Distortion Theory [1.5771347525430772]
A significant bottleneck in federated learning is the network communication cost of sending model updates from client devices to the central server.
Our method encodes quantized updates with an appropriate universal code, taking into account their empirical distribution.
Because quantization introduces error, we select quantization levels by optimizing for the desired trade-off in average total gradient and distortion.
arXiv Detail & Related papers (2022-01-07T20:17:33Z) - DNN gradient lossless compression: Can GenNorm be the answer? [17.37160669785566]
Gradient compression is relevant in many distributed Deep Neural Network (DNN) training scenarios.
For some networks of practical interest, the gradient entries can be well modelled as having a generalized normal (GenNorm) distribution.
arXiv Detail & Related papers (2021-11-15T08:33:10Z) - LCS: Learning Compressible Subspaces for Adaptive Network Compression at
- LCS: Learning Compressible Subspaces for Adaptive Network Compression at Inference Time [57.52251547365967]
We propose a method for training a "compressible subspace" of neural networks that contains a fine-grained spectrum of models.
We present results for achieving arbitrarily fine-grained accuracy-efficiency trade-offs at inference time for structured and unstructured sparsity.
Our algorithm extends to quantization at variable bit widths, achieving accuracy on par with individually trained networks.
arXiv Detail & Related papers (2021-10-08T17:03:34Z) - Slashing Communication Traffic in Federated Learning by Transmitting
Clustered Model Updates [12.660500431713336]
Federated Learning (FL) is an emerging decentralized learning framework through which multiple clients can collaboratively train a learning model.
Heavy communication traffic can be incurred by exchanging model updates via the Internet between clients and the parameter server.
In this work, we devise the Model Update Compression by Soft Clustering (MUCSC) algorithm to compress model updates transmitted between clients and the PS.
arXiv Detail & Related papers (2021-05-10T07:15:49Z) - Over-the-Air Federated Learning from Heterogeneous Data [107.05618009955094]
Federated learning (FL) is a framework for distributed learning of centralized models.
We develop a Convergent OTA FL (COTAF) algorithm which enhances the common local stochastic gradient descent (SGD) FL algorithm.
We numerically show that the precoding induced by COTAF notably improves the convergence rate and the accuracy of models trained via OTA FL.
arXiv Detail & Related papers (2020-09-27T08:28:25Z) - UVeQFed: Universal Vector Quantization for Federated Learning [179.06583469293386]
Federated learning (FL) is an emerging approach to training learning models without requiring the users to share their possibly private labeled data.
In FL, each user trains its copy of the learning model locally. The server then collects the individual updates and aggregates them into a global model.
We show that combining universal vector quantization methods with FL yields a decentralized training system in which the compression of the trained models induces only a minimum distortion.
arXiv Detail & Related papers (2020-06-05T07:10:22Z) - Training with Quantization Noise for Extreme Model Compression [57.51832088938618]
We tackle the problem of producing compact models, maximizing their accuracy for a given model size.
A standard solution is to train networks with Quantization Aware Training, where the weights are quantized during training and the gradients approximated with the Straight-Through Estimator.
In this paper, we extend this approach to work beyond int8 fixed-point quantization with extreme compression methods.
arXiv Detail & Related papers (2020-04-15T20:10:53Z)