How to Attain Communication-Efficient DNN Training? Convert, Compress,
Correct
- URL: http://arxiv.org/abs/2204.08211v2
- Date: Thu, 1 Jun 2023 12:28:05 GMT
- Title: How to Attain Communication-Efficient DNN Training? Convert, Compress,
Correct
- Authors: Zhong-Jing Chen, Eduin E. Hernandez, Yu-Chih Huang and Stefano Rini
- Abstract summary: This paper introduces CO3 -- an algorithm for communication-efficient federated Deep Neural Network (DNN) training.
CO3 takes its name from the three processing steps applied to reduce the communication load when transmitting the local DNN gradients from the remote users to the Parameter Server.
- Score: 19.440030100380632
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces CO3 -- an algorithm for communication-efficient
federated Deep Neural Network (DNN) training. CO3 takes its name from the three
processing steps applied to reduce the communication load when transmitting the
local DNN gradients from the remote users to the Parameter Server. Namely: (i)
gradient quantization through floating-point conversion, (ii) lossless
compression of the quantized gradient, and (iii) quantization error correction.
We carefully design each of the steps above to assure good training performance
under a constraint on the communication rate. In particular, in steps (i) and
(ii), we adopt the assumption that DNN gradients are distributed according to a
generalized normal distribution, which is validated numerically in the paper.
For step (iii), we utilize an error-feedback mechanism with memory decay to
correct the quantization error introduced in step (i). We argue that the memory
decay coefficient, similarly to the learning rate, can be optimally tuned to
improve convergence. A rigorous convergence analysis of the proposed CO3 with
SGD is provided. Moreover, with extensive simulations, we show that CO3 offers
improved performance when compared with existing gradient compression schemes
in the literature which employ sketching and non-uniform quantization of the
local gradients.
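To make the three steps concrete, below is a minimal Python sketch of a CO3-style client update, assuming float16 as a stand-in for the paper's low-precision floating-point format, zlib as a generic stand-in for the lossless entropy coder, and a simple decayed error-feedback update with coefficient beta. It illustrates the pipeline described above and is not the authors' implementation.

```python
# Hedged sketch of the three CO3-style steps described in the abstract:
# (i) low-precision floating-point conversion, (ii) lossless compression,
# (iii) error feedback with memory decay. The exact float format, entropy
# coder, and update rule here are illustrative assumptions. Requires numpy.
import zlib
import numpy as np

def convert(grad, dtype=np.float16):
    """Step (i): quantize by casting to a lower-precision float format.
    (The paper designs the format under a generalized-normal gradient model;
    float16 is only a stand-in here.)"""
    return grad.astype(dtype)

def compress(quantized):
    """Step (ii): lossless compression of the quantized gradient.
    zlib is used as a generic stand-in for the entropy coder."""
    return zlib.compress(quantized.tobytes())

def decompress(payload, dtype=np.float16):
    return np.frombuffer(zlib.decompress(payload), dtype=dtype)

def client_step(grad, memory, beta=0.9):
    """Step (iii): error feedback with memory decay coefficient beta.
    The memory-corrected gradient is quantized and compressed; the residual
    is kept (decayed by beta) for the next round."""
    corrected = grad + memory
    quantized = convert(corrected)
    payload = compress(quantized)
    new_memory = beta * (corrected - quantized.astype(grad.dtype))
    return payload, new_memory

# Toy usage: one client, a few rounds of a synthetic gradient stream.
rng = np.random.default_rng(0)
memory = np.zeros(1000, dtype=np.float32)
for _ in range(3):
    grad = rng.standard_normal(1000).astype(np.float32)
    payload, memory = client_step(grad, memory)
    server_grad = decompress(payload).astype(np.float32)
    print(len(payload), float(np.abs(grad - server_grad).mean()))
```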
Related papers
- Rate-Constrained Quantization for Communication-Efficient Federated Learning [5.632231145349047]
We develop a novel quantized FL framework, called rate-constrained federated learning (RC-FED).
We formulate this scheme as a joint optimization in which the quantization distortion is minimized while the rate of encoded gradients is kept below a target threshold.
We analyze the convergence behavior of RC-FED, and show its superior performance against baseline quantized FL schemes on several datasets.
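As a point of reference, the joint optimization described in the RC-FED summary can be written in the following generic form, where $D$ (distortion), $R$ (rate), the target threshold $R_{\max}$, and the quantizer parameters $\theta$ are illustrative symbols rather than the paper's notation:

```latex
% Generic rate-constrained form of the objective sketched in the summary;
% the symbols D, R, R_max and \theta are illustrative, not the paper's notation.
\begin{equation*}
  \min_{\theta} \; D(\theta)
  \quad \text{subject to} \quad R(\theta) \le R_{\max}
\end{equation*}
```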
arXiv Detail & Related papers (2024-09-10T08:22:01Z) - Flattened one-bit stochastic gradient descent: compressed distributed optimization with controlled variance [55.01966743652196]
We propose a novel algorithm for distributed stochastic gradient descent (SGD) with compressed gradient communication in the parameter-server framework.
Our gradient compression technique, named flattened one-bit gradient descent (FO-SGD), relies on two simple algorithmic ideas.
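The summary does not spell out the two algorithmic ideas, so as background only, here is a generic one-bit (sign-plus-scale) gradient compressor of the kind such schemes build on; the mean-absolute-value scaling is an illustrative assumption, and this is not FO-SGD itself.

```python
# Generic one-bit gradient compressor (sign plus a single scale per vector),
# shown only as the kind of baseline that one-bit schemes refine; this is
# not the paper's algorithm. Requires numpy.
import numpy as np

def one_bit_compress(grad):
    scale = np.mean(np.abs(grad))          # one float transmitted per vector
    signs = np.signbit(grad)               # one bit per coordinate
    return scale, np.packbits(signs)

def one_bit_decompress(scale, packed, n):
    signs = np.unpackbits(packed, count=n).astype(bool)
    return np.where(signs, -scale, scale)

g = np.random.default_rng(1).standard_normal(8).astype(np.float32)
scale, packed = one_bit_compress(g)
print(one_bit_decompress(scale, packed, g.size))
```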
arXiv Detail & Related papers (2024-05-17T21:17:27Z) - Adaptive Top-K in SGD for Communication-Efficient Distributed Learning [14.867068493072885]
This paper proposes a novel adaptive Top-K in SGD framework that enables an adaptive degree of sparsification for each gradient descent step to optimize the convergence performance.
Numerical results on the MNIST and CIFAR-10 datasets demonstrate that the proposed adaptive Top-K algorithm in SGD achieves a significantly better convergence rate compared to state-of-the-art methods.
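For context, a minimal Top-K sparsifier of the kind this framework adapts per step is sketched below; the rule for choosing k at each step is the paper's contribution and is not reproduced here, so k is left as a plain argument.

```python
# Minimal Top-K gradient sparsifier; the adaptive per-step rule for choosing
# k is not detailed in the summary, so k is a placeholder argument here.
# Requires numpy.
import numpy as np

def top_k_sparsify(grad, k):
    """Keep the k largest-magnitude entries and return the (indices, values)
    pair that would actually be communicated."""
    flat = grad.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx]

def densify(idx, vals, shape):
    out = np.zeros(int(np.prod(shape)), dtype=vals.dtype)
    out[idx] = vals
    return out.reshape(shape)

g = np.random.default_rng(2).standard_normal((4, 5)).astype(np.float32)
idx, vals = top_k_sparsify(g, k=5)
print(densify(idx, vals, g.shape))
```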
arXiv Detail & Related papers (2022-10-24T18:33:35Z) - Green, Quantized Federated Learning over Wireless Networks: An
Energy-Efficient Design [68.86220939532373]
The finite precision level is captured through the use of quantized neural networks (QNNs) that quantize weights and activations in fixed-precision format.
The proposed FL framework can reduce energy consumption until convergence by up to 70% compared to a baseline FL algorithm.
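As an illustration of the fixed-precision format mentioned above, here is a simple symmetric fixed-point quantizer; the 8-bit width and symmetric range are assumptions for the sketch, not the paper's QNN design.

```python
# Symmetric fixed-point quantization of a tensor to n bits, as a generic
# stand-in for the QNN weight/activation quantization described above; the
# bit-width and symmetric range are illustrative assumptions. Requires numpy.
import numpy as np

def fixed_point_quantize(x, n_bits=8):
    qmax = 2 ** (n_bits - 1) - 1
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q.astype(np.int8 if n_bits <= 8 else np.int32), scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(3).standard_normal(6).astype(np.float32)
q, s = fixed_point_quantize(w)
print(w, dequantize(q, s))
```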
arXiv Detail & Related papers (2022-07-19T16:37:24Z) - Convert, compress, correct: Three steps toward communication-efficient
DNN training [19.440030100380632]
We introduce a novel algorithm, $\mathsf{CO}_3$, for communication-efficient distributed Deep Neural Network (DNN) training.
$\mathsf{CO}_3$ is a joint training/communication protocol, which encompasses three processing steps for the network gradients.
The interplay of these three steps in processing the gradients is carefully balanced to yield a robust and high-performance scheme.
arXiv Detail & Related papers (2022-03-17T02:47:13Z) - Communication-Efficient Federated Learning via Quantized Compressed
Sensing [82.10695943017907]
The presented framework consists of gradient compression for wireless devices and gradient reconstruction for a parameter server.
Thanks to gradient sparsification and quantization, our strategy can achieve a higher compression ratio than one-bit gradient compression.
We demonstrate that the framework achieves almost identical performance with the case that performs no compression.
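A rough encoder-side sketch of a sparsify-project-quantize pipeline of the kind described above is given below; the Gaussian sensing matrix, dimensions, and 2-bit quantizer are illustrative assumptions, and the server-side compressed-sensing reconstruction is omitted.

```python
# Encoder-side sketch of a sparsify-then-project-then-quantize pipeline of
# the kind described above; the Gaussian sensing matrix, dimensions, and
# 2-bit quantizer are illustrative assumptions, and the server-side
# compressed-sensing reconstruction is not shown. Requires numpy.
import numpy as np

rng = np.random.default_rng(4)
d, k, m = 1024, 32, 256                        # gradient dim, kept entries, measurements
A = rng.standard_normal((m, d)) / np.sqrt(m)   # sensing matrix shared with the server

def encode(grad, n_bits=2):
    # 1) sparsify: keep the k largest-magnitude entries
    sparse = np.zeros_like(grad)
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    sparse[idx] = grad[idx]
    # 2) project to m < d measurements
    y = A @ sparse
    # 3) scalar-quantize the measurements to n_bits
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(y)) / qmax
    return np.clip(np.round(y / scale), -qmax, qmax), scale

g = rng.standard_normal(d).astype(np.float32)
q, s = encode(g)
print(q[:8], s)
```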
arXiv Detail & Related papers (2021-11-30T02:13:54Z) - DQ-SGD: Dynamic Quantization in SGD for Communication-Efficient
Distributed Learning [22.83609192604322]
We propose a novel dynamically quantized SGD (DQ-SGD) framework to dynamically adjust the quantization scheme for each gradient descent step.
We show that our quantization scheme achieves better tradeoffs between the communication cost and learning performance than other state-of-the-art gradient quantization methods.
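To illustrate the knob such a dynamic scheme adjusts, below is a stochastic uniform quantizer with a per-step bit-width; the norm-based rule that picks the bit-width is only a placeholder, not DQ-SGD's criterion.

```python
# Stochastic uniform quantizer with a per-step bit-width, illustrating the
# kind of knob a dynamic scheme adjusts; the norm-based rule choosing the
# bit-width is a placeholder assumption, not DQ-SGD's criterion.
# Requires numpy.
import numpy as np

def stochastic_quantize(grad, n_bits, rng):
    levels = 2 ** n_bits - 1
    scale = np.max(np.abs(grad))
    if scale == 0:
        return grad
    normalized = np.abs(grad) / scale * levels
    lower = np.floor(normalized)
    # round up with probability equal to the fractional part (unbiased)
    rounded = lower + (rng.random(grad.shape) < normalized - lower)
    return np.sign(grad) * rounded / levels * scale

def pick_bits(grad, lo=2, hi=8):
    # placeholder heuristic: more bits when the gradient norm is large
    return int(np.clip(2 + np.log2(1 + np.linalg.norm(grad)), lo, hi))

rng = np.random.default_rng(5)
g = rng.standard_normal(100).astype(np.float32)
print(pick_bits(g), np.abs(g - stochastic_quantize(g, pick_bits(g), rng)).mean())
```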
arXiv Detail & Related papers (2021-07-30T12:22:31Z) - Cogradient Descent for Dependable Learning [64.02052988844301]
We propose a dependable learning method based on the Cogradient Descent (CoGD) algorithm to address the bilinear optimization problem.
CoGD is introduced to solve bilinear problems when one variable is subject to a sparsity constraint.
It can also be used to decompose the association of features and weights, which further generalizes our method to better train convolutional neural networks (CNNs).
arXiv Detail & Related papers (2021-06-20T04:28:20Z) - Adaptive Quantization of Model Updates for Communication-Efficient
Federated Learning [75.45968495410047]
Communication of model updates between client nodes and the central aggregating server is a major bottleneck in federated learning.
Gradient quantization is an effective way of reducing the number of bits required to communicate each model update.
We propose an adaptive quantization strategy called AdaFL that aims to achieve communication efficiency as well as a low error floor.
arXiv Detail & Related papers (2021-02-08T19:14:21Z) - Optimal Gradient Quantization Condition for Communication-Efficient
Distributed Training [99.42912552638168]
Communication of gradients is costly for training deep neural networks with multiple devices in computer vision applications.
In this work, we deduce the optimal condition of both the binary and multi-level gradient quantization for any gradient distribution.
Based on the optimal condition, we develop two novel quantization schemes: biased BinGrad and unbiased ORQ for binary and multi-level gradient quantization respectively.
arXiv Detail & Related papers (2020-02-25T18:28:39Z)