Optimal Gradient Quantization Condition for Communication-Efficient
Distributed Training
- URL: http://arxiv.org/abs/2002.11082v1
- Date: Tue, 25 Feb 2020 18:28:39 GMT
- Title: Optimal Gradient Quantization Condition for Communication-Efficient
Distributed Training
- Authors: An Xu, Zhouyuan Huo, Heng Huang
- Abstract summary: Communication of gradients is costly for training deep neural networks with multiple devices in computer vision applications.
In this work, we deduce the optimal condition of both the binary and multi-level gradient quantization for any gradient distribution.
Based on the optimal condition, we develop two novel quantization schemes: biased BinGrad and unbiased ORQ for binary and multi-level gradient quantization respectively.
- Score: 99.42912552638168
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The communication of gradients is costly for training deep neural networks
with multiple devices in computer vision applications. In particular, the
growing size of deep learning models leads to higher communication overheads
that defy the ideal linear training speedup regarding the number of devices.
Gradient quantization is one of the common methods to reduce communication
costs. However, it can lead to quantization error in the training and result in
model performance degradation. In this work, we deduce the optimal condition of
both the binary and multi-level gradient quantization for \textbf{ANY} gradient
distribution. Based on the optimal condition, we develop two novel quantization
schemes: biased BinGrad and unbiased ORQ for binary and multi-level gradient
quantization respectively, which dynamically determine the optimal quantization
levels. Extensive experimental results on CIFAR and ImageNet datasets with
several popular convolutional neural networks show the superiority of our
proposed methods.
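As context for the abstract above, the sketch below shows the two operations whose levels the paper optimizes: a scaled-sign binary quantizer and a generic unbiased multi-level stochastic quantizer with uniform levels. This is a minimal illustration only, not the paper's BinGrad or ORQ schemes (which place the levels optimally for the observed gradient distribution); the scale mean(|g|) used in the binary case is the standard choice that minimizes the squared error for a fixed sign pattern.

```python
# Illustrative only: generic binary and uniform multi-level gradient quantizers.
# These are NOT the paper's BinGrad/ORQ; the paper derives the optimal level
# placement for any gradient distribution, whereas the levels here are fixed.
import numpy as np


def binary_quantize(g):
    """Scaled-sign (binary) quantization q(g) = alpha * sign(g).

    For a fixed sign pattern, alpha = mean(|g|) minimizes ||g - q(g)||^2.
    """
    return np.mean(np.abs(g)) * np.sign(g)


def multilevel_quantize(g, num_levels=4, rng=None):
    """Unbiased stochastic quantization onto uniform levels in [0, max|g|]."""
    rng = np.random.default_rng() if rng is None else rng
    scale = np.max(np.abs(g))
    if scale == 0:
        return np.zeros_like(g)
    pos = np.abs(g) / scale * (num_levels - 1)            # position on the level grid
    lower = np.floor(pos)
    level = lower + (rng.random(g.shape) < pos - lower)   # round up w.p. frac(pos)
    return np.sign(g) * level / (num_levels - 1) * scale  # so that E[q(g)] = g


if __name__ == "__main__":
    g = np.random.default_rng(1).normal(size=100_000)
    print("binary  MSE:", np.mean((g - binary_quantize(g)) ** 2))
    print("4-level MSE:", np.mean((g - multilevel_quantize(g)) ** 2))
```

In a distributed setting, each worker would apply such an operator to its gradient before communication, and the aggregator would average the dequantized results.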
Related papers
- The Convex Landscape of Neural Networks: Characterizing Global Optima
and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network (DNN) training is a non-convex optimization problem.
In this paper we examine convex neural network recovery models based on Lasso formulations.
We show that the stationary points of the non-convex objective can be characterized via a globally optimal subsampled convex program.
arXiv Detail & Related papers (2023-12-19T23:04:56Z) - Neural Networks with Quantization Constraints [111.42313650830248]
We present a constrained learning approach to quantization training.
We show that the resulting problem is strongly dual and does away with gradient estimations.
We demonstrate that the proposed approach exhibits competitive performance in image classification tasks.
arXiv Detail & Related papers (2022-10-27T17:12:48Z) - Combinatorial optimization for low bit-width neural networks [23.466606660363016]
Low-bit width neural networks have been extensively explored for deployment on edge devices to reduce computational resources.
Existing approaches have focused on gradient-based optimization in a two-stage train-and-compress setting.
We show that a combination of greedy coordinate descent and a novel combinatorial optimization approach can attain competitive accuracy on binary classification tasks.
arXiv Detail & Related papers (2022-06-04T15:02:36Z) - FedDQ: Communication-Efficient Federated Learning with Descending
Quantization [5.881154276623056]
Federated learning (FL) is an emerging privacy-preserving distributed learning scheme.
FL suffers from critical communication bottleneck due to large model size and frequent model aggregation.
This paper proposes the opposite approach: adaptive quantization with descending quantization levels.
arXiv Detail & Related papers (2021-10-05T18:56:28Z) - Cluster-Promoting Quantization with Bit-Drop for Minimizing Network
Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z) - In-Hindsight Quantization Range Estimation for Quantized Training [5.65658124285176]
We propose a simple alternative to dynamic quantization, in-hindsight range estimation, that uses the quantization ranges estimated on previous iterations to quantize the present.
Our approach enables fast static quantization of gradients and activations while requiring only minimal hardware support from the neural network accelerator.
It is intended as a drop-in replacement for estimating quantization ranges and can be used in conjunction with other advances in quantized training (a minimal illustrative sketch of this idea appears after the list of related papers below).
arXiv Detail & Related papers (2021-05-10T10:25:28Z) - Distribution Adaptive INT8 Quantization for Training CNNs [12.708068468737286]
In this paper, we propose a novel INT8 quantization training framework for convolutional neural network.
Specifically, we adopt Gradient Vectorized Quantization to quantize the gradient, based on the observation that layer-wise gradients contain multiple distributions along the channel dimension.
Then, Magnitude-aware Clipping Strategy is introduced by taking the magnitudes of gradients into consideration when minimizing the quantization error.
arXiv Detail & Related papers (2021-02-09T11:58:10Z) - Adaptive Quantization of Model Updates for Communication-Efficient
Federated Learning [75.45968495410047]
Communication of model updates between client nodes and the central aggregating server is a major bottleneck in federated learning.
Gradient quantization is an effective way of reducing the number of bits required to communicate each model update.
We propose an adaptive quantization strategy called AdaFL that aims to achieve communication efficiency as well as a low error floor.
arXiv Detail & Related papers (2021-02-08T19:14:21Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed stochastic AUC maximization at scale with a deep neural network as the predictive model.
Our algorithm requires far fewer communication rounds in theory.
Experiments on several datasets demonstrate the effectiveness of the method and confirm the theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
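The In-Hindsight Quantization Range Estimation entry above describes a concrete mechanism: quantize the current iteration's tensors with ranges recorded on previous iterations, so range computation never sits on the critical path. Below is a minimal, hypothetical sketch of that idea; the class name, the momentum-style range update, and the first-step fallback are assumptions for illustration, not the paper's exact procedure.

```python
# Hypothetical sketch of "in-hindsight" range estimation: quantize with the
# range observed on earlier iterations, then record this iteration's range
# for the next step. Not the paper's exact algorithm; names are illustrative.
import numpy as np


class InHindsightQuantizer:
    def __init__(self, num_bits=8, momentum=0.9):
        self.qmax = 2 ** (num_bits - 1) - 1  # symmetric signed integer grid
        self.momentum = momentum
        self.range = None                    # range estimate, stale by one step

    def __call__(self, x):
        current = float(np.max(np.abs(x))) or 1.0
        # Use the previously recorded range; fall back to the current range
        # on the very first call, when no history exists yet.
        scale = (self.range if self.range is not None else current) / self.qmax
        q = np.clip(np.round(x / scale), -self.qmax, self.qmax) * scale
        # Update the stored range only after quantizing, i.e. "in hindsight".
        self.range = current if self.range is None else (
            self.momentum * self.range + (1 - self.momentum) * current)
        return q


if __name__ == "__main__":
    quant = InHindsightQuantizer()
    rng = np.random.default_rng(0)
    for step in range(3):
        g = rng.normal(scale=1.0 + 0.1 * step, size=1_000)
        print(f"step {step}: quantization MSE = {np.mean((g - quant(g)) ** 2):.2e}")
```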
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.