DQ-SGD: Dynamic Quantization in SGD for Communication-Efficient
Distributed Learning
- URL: http://arxiv.org/abs/2107.14575v1
- Date: Fri, 30 Jul 2021 12:22:31 GMT
- Title: DQ-SGD: Dynamic Quantization in SGD for Communication-Efficient
Distributed Learning
- Authors: Guangfeng Yan, Shao-Lun Huang, Tian Lan and Linqi Song
- Abstract summary: We propose a novel dynamically quantized SGD (DQ-SGD) framework that adjusts the quantization scheme at each gradient descent step.
We show that our quantization scheme achieves better tradeoffs between the communication cost and learning performance than other state-of-the-art gradient quantization methods.
- Score: 22.83609192604322
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradient quantization is an emerging technique in reducing communication
costs in distributed learning. Existing gradient quantization algorithms often
rely on engineering heuristics or empirical observations, lacking a systematic
approach to dynamically quantize gradients. This paper addresses this issue by
proposing a novel dynamically quantized SGD (DQ-SGD) framework, enabling us to
dynamically adjust the quantization scheme for each gradient descent step by
exploring the trade-off between communication cost and convergence error. We
derive an upper bound on the convergence error, tight in some cases, for a
restricted family of quantization schemes and loss functions. We design our
DQ-SGD algorithm by minimizing the communication cost subject to a convergence
error constraint. Finally, through extensive experiments on large-scale
natural language processing and computer vision tasks on AG-News, CIFAR-10, and
CIFAR-100 datasets, we demonstrate that our quantization scheme achieves better
tradeoffs between the communication cost and learning performance than other
state-of-the-art gradient quantization methods.
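Below is a minimal, illustrative sketch of a dynamically quantized SGD step in the spirit of the abstract: the gradient is compressed with an unbiased stochastic uniform quantizer, and the bit-width is chosen per step from a simple error-budget heuristic. The quantizer, the function names, and the bit-allocation rule here are assumptions for illustration only; the paper's DQ-SGD derives its allocation by minimizing communication cost under a convergence error constraint, which the toy rule below only approximates.

```python
import numpy as np

def stochastic_uniform_quantize(g, num_bits, rng):
    """Unbiased stochastic uniform quantization of a gradient vector g."""
    levels = 2 ** num_bits - 1            # number of quantization intervals
    scale = np.max(np.abs(g)) + 1e-12     # per-tensor scaling factor
    normalized = np.abs(g) / scale * levels
    lower = np.floor(normalized)
    prob_up = normalized - lower          # round up with this probability, so E[q] = |g| / scale * levels
    q = lower + (rng.random(g.shape) < prob_up)
    return np.sign(g) * q * scale / levels

def choose_bits(grad_norm, dim, error_budget, min_bits=2, max_bits=8):
    """Toy per-step bit allocation: the quantizer above has variance on the order of
    dim * ||g||^2 / (2**b - 1)**2, so use the fewest bits that keep this under the
    step's error budget. This is a stand-in for the paper's constrained optimization."""
    for b in range(min_bits, max_bits + 1):
        if dim * grad_norm ** 2 / (2 ** b - 1) ** 2 <= error_budget:
            return b
    return max_bits

def dq_sgd_step(w, grad_fn, lr, error_budget, rng):
    """One worker's step: compute the gradient, quantize it (this is what would be
    communicated), then apply the update locally."""
    g = grad_fn(w)
    bits = choose_bits(np.linalg.norm(g), g.size, error_budget)
    g_hat = stochastic_uniform_quantize(g, bits, rng)
    return w - lr * g_hat, bits
```

Because the quantizer is unbiased, the update remains an unbiased SGD step and the bit-width only changes the variance of the gradient estimate; this is what makes a per-step trade-off between communication cost and convergence error possible.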
Related papers
- QT-DoG: Quantization-aware Training for Domain Generalization [58.439816306817306]
We propose Quantization-aware Training for Domain Generalization (QT-DoG).
QT-DoG exploits quantization as an implicit regularizer by inducing noise in model weights.
We demonstrate that QT-DoG generalizes across various datasets, architectures, and quantization algorithms.
arXiv Detail & Related papers (2024-10-08T13:21:48Z) - Rate-Constrained Quantization for Communication-Efficient Federated Learning [5.632231145349047]
We develop a novel quantized FL framework, called rate-constrained federated learning (RC-FED).
We formulate this scheme as a joint optimization in which the quantization distortion is minimized while the rate of encoded gradients is kept below a target threshold.
We analyze the convergence behavior of RC-FED, and show its superior performance against baseline quantized FL schemes on several datasets.
arXiv Detail & Related papers (2024-09-10T08:22:01Z) - Truncated Non-Uniform Quantization for Distributed SGD [17.30572818507568]
We introduce a novel two-stage quantization strategy to enhance the communication efficiency of distributed stochastic gradient descent (SGD).
The proposed method initially employs truncation to mitigate the impact of long-tail noise, followed by a non-uniform quantization of the post-truncation gradients based on their statistical characteristics.
Our proposed algorithm outperforms existing quantization schemes, striking a superior balance between communication efficiency and convergence performance; a minimal sketch of this truncate-then-quantize idea is given after the related-papers list.
arXiv Detail & Related papers (2024-02-02T05:59:48Z) - Neural Networks with Quantization Constraints [111.42313650830248]
We present a constrained learning approach to quantization training.
We show that the resulting problem is strongly dual and does away with gradient estimations.
We demonstrate that the proposed approach exhibits competitive performance in image classification tasks.
arXiv Detail & Related papers (2022-10-27T17:12:48Z) - Quantized Adaptive Subgradient Algorithms and Their Applications [39.103587572626026]
We propose quantized composite mirror descent adaptive subgradient (QCMD adagrad) and quantized regularized dual average adaptive subgradient (QRDA adagrad) for distributed training.
A quantized gradient-based adaptive learning rate matrix is constructed to achieve a balance between communication costs, accuracy, and model sparsity.
arXiv Detail & Related papers (2022-08-11T04:04:03Z) - Fundamental Limits of Communication Efficiency for Model Aggregation in
Distributed Learning: A Rate-Distortion Approach [54.311495894129585]
We study the limit of communication cost of model aggregation in distributed learning from a rate-distortion perspective.
It is found that the communication gain by exploiting the correlation between worker nodes is significant for SignSGD.
arXiv Detail & Related papers (2022-06-28T13:10:40Z) - Adaptive Quantization of Model Updates for Communication-Efficient
Federated Learning [75.45968495410047]
Communication of model updates between client nodes and the central aggregating server is a major bottleneck in federated learning.
Gradient quantization is an effective way of reducing the number of bits required to communicate each model update.
We propose an adaptive quantization strategy called AdaFL that aims to achieve communication efficiency as well as a low error floor.
arXiv Detail & Related papers (2021-02-08T19:14:21Z) - CosSGD: Nonlinear Quantization for Communication-efficient Federated
Learning [62.65937719264881]
Federated learning facilitates learning across clients without transferring local data on these clients to a central server.
We propose a nonlinear quantization for compressed gradient descent, which can be easily utilized in federated learning.
Our system significantly reduces the communication cost by up to three orders of magnitude, while maintaining convergence and accuracy of the training process.
arXiv Detail & Related papers (2020-12-15T12:20:28Z) - Optimal Gradient Quantization Condition for Communication-Efficient
Distributed Training [99.42912552638168]
Communication of gradients is costly for training deep neural networks with multiple devices in computer vision applications.
In this work, we deduce the optimal condition of both the binary and multi-level gradient quantization for ANY gradient distribution.
Based on the optimal condition, we develop two novel quantization schemes: biased BinGrad and unbiased ORQ for binary and multi-level gradient quantization respectively.
arXiv Detail & Related papers (2020-02-25T18:28:39Z)
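The truncated non-uniform quantization entry above describes a two-stage pipeline: truncate the long tail of the gradient distribution, then quantize the clipped values non-uniformly according to their statistics. Below is a minimal sketch of that general idea, assuming a standard-deviation clipping threshold and a quantile-based codebook; the concrete truncation rule and level placement used in that paper may differ.

```python
import numpy as np

def truncate_then_quantize(g, num_bits, clip_std=3.0):
    """Two-stage compression of a 1-D gradient vector g: clip long-tail entries,
    then quantize the clipped values non-uniformly."""
    # Stage 1: truncation. Clip entries beyond a few standard deviations so that
    # rare outliers do not inflate the quantization range.
    threshold = clip_std * np.std(g)
    g_clipped = np.clip(g, -threshold, threshold)
    # Stage 2: non-uniform quantization. Place the 2**num_bits codebook levels at
    # empirical quantiles of the clipped gradient, giving dense regions finer
    # resolution than a uniform grid would.
    codebook = np.quantile(g_clipped, np.linspace(0.0, 1.0, 2 ** num_bits))
    # Map each entry to its nearest codebook level; in a distributed setting only
    # the level indices (num_bits each) and the small codebook would be transmitted.
    idx = np.abs(g_clipped[:, None] - codebook[None, :]).argmin(axis=1)
    return codebook[idx], idx, codebook
```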