Standard Deviation-Based Quantization for Deep Neural Networks
- URL: http://arxiv.org/abs/2202.12422v1
- Date: Thu, 24 Feb 2022 23:33:47 GMT
- Title: Standard Deviation-Based Quantization for Deep Neural Networks
- Authors: Amir Ardakani, Arash Ardakani, Brett Meyer, James J. Clark, Warren J.
Gross
- Abstract summary: Quantization of deep neural networks is a promising approach that reduces the inference cost.
We propose a new framework to learn the quantization intervals (discrete values) using the knowledge of the network's weight and activation distributions.
Our scheme simultaneously prunes the network's parameters and allows us to flexibly adjust the pruning ratio during the quantization process.
- Score: 17.495852096822894
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Quantization of deep neural networks is a promising approach that reduces the
inference cost, making it feasible to run deep networks on resource-restricted
devices. Inspired by existing methods, we propose a new framework to learn the
quantization intervals (discrete values) using the knowledge of the network's
weight and activation distributions, i.e., standard deviation. Furthermore, we
propose a novel base-2 logarithmic quantization scheme to quantize weights to
power-of-two discrete values. Our proposed scheme allows us to replace
resource-hungry high-precision multipliers with simple shift-add operations.
According to our evaluations, our method outperforms existing work on CIFAR10
and ImageNet datasets and even achieves better accuracy performance with 3-bit
weights and activations when compared to the full-precision models. Moreover,
our scheme simultaneously prunes the network's parameters and allows us to
flexibly adjust the pruning ratio during the quantization process.
Related papers
- Post-Training Quantization for Re-parameterization via Coarse & Fine
Weight Splitting [13.270381125055275]
We propose a coarse & fine weight splitting (CFWS) method to reduce quantization error of weight.
We develop an improved KL metric to determine optimal quantization scales for activation.
For example, the quantized RepVGG-A1 model exhibits a mere 0.3% accuracy loss.
arXiv Detail & Related papers (2023-12-17T02:31:20Z) - Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth
Soft-Thresholding [57.71603937699949]
We study optimization guarantees, i.e., achieving near-zero training loss with the increase in the number of learning epochs.
We show that the threshold on the number of training samples increases with the increase in the network width.
arXiv Detail & Related papers (2023-09-12T13:03:47Z) - BiTAT: Neural Network Binarization with Task-dependent Aggregated
Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation.
Extreme quantization (1-bit weight/1-bit activations) of compactly-designed backbone architectures results in severe performance degeneration.
This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate performance degeneration.
arXiv Detail & Related papers (2022-07-04T13:25:49Z) - Post-training Quantization for Neural Networks with Provable Guarantees [9.58246628652846]
We modify a post-training neural-network quantization method, GPFQ, that is based on a greedy path-following mechanism.
We prove that for quantizing a single-layer network, the relative square error essentially decays linearly in the number of weights.
arXiv Detail & Related papers (2022-01-26T18:47:38Z) - Cluster-Promoting Quantization with Bit-Drop for Minimizing Network
Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z) - Direct Quantization for Training Highly Accurate Low Bit-width Deep
Neural Networks [73.29587731448345]
This paper proposes two novel techniques to train deep convolutional neural networks with low bit-width weights and activations.
First, to obtain low bit-width weights, most existing methods obtain the quantized weights by performing quantization on the full-precision network weights.
Second, to obtain low bit-width activations, existing works consider all channels equally.
arXiv Detail & Related papers (2020-12-26T15:21:18Z) - Searching for Low-Bit Weights in Quantized Neural Networks [129.8319019563356]
Quantized neural networks with low-bit weights and activations are attractive for developing AI accelerators.
We present to regard the discrete weights in an arbitrary quantized neural network as searchable variables, and utilize a differential method to search them accurately.
arXiv Detail & Related papers (2020-09-18T09:13:26Z) - Optimal Quantization for Batch Normalization in Neural Network
Deployments and Beyond [18.14282813812512]
Batch Normalization (BN) poses a challenge for Quantized Neural Networks (QNNs)
We propose a novel method to quantize BN by converting an affine transformation of two floating points to a fixed-point operation with shared quantized scale.
Our method is verified by experiments at layer level on CIFAR and ImageNet datasets.
arXiv Detail & Related papers (2020-08-30T09:33:29Z) - Gradient $\ell_1$ Regularization for Quantization Robustness [70.39776106458858]
We derive a simple regularization scheme that improves robustness against post-training quantization.
By training quantization-ready networks, our approach enables storing a single set of weights that can be quantized on-demand to different bit-widths.
arXiv Detail & Related papers (2020-02-18T12:31:34Z) - Switchable Precision Neural Networks [35.2752928147013]
Switchable Precision neural Networks (SP-Nets) are proposed to train a shared network capable of operating at multiple quantization levels.
At runtime, the network can adjust its precision on the fly according to instant memory, latency, power consumption and accuracy demands.
arXiv Detail & Related papers (2020-02-07T14:43:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.