Redistribution of Weights and Activations for AdderNet Quantization
- URL: http://arxiv.org/abs/2212.10200v1
- Date: Tue, 20 Dec 2022 12:24:48 GMT
- Title: Redistribution of Weights and Activations for AdderNet Quantization
- Authors: Ying Nie, Kai Han, Haikang Diao, Chuanjian Liu, Enhua Wu, Yunhe Wang
- Abstract summary: Adder Neural Network (AdderNet) provides a new way for developing energy-efficient neural networks.
To achieve higher hardware efficiency, it is necessary to further study the low-bit quantization of AdderNet.
We propose a new quantization algorithm by redistributing the weights and the activations.
- Score: 33.78204350112026
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adder Neural Network (AdderNet) provides a new way for developing
energy-efficient neural networks by replacing the expensive multiplications in
convolution with cheaper additions (i.e., the l1-norm). To achieve higher hardware
efficiency, it is necessary to further study the low-bit quantization of
AdderNet. Due to the limitation that the commutative law in multiplication does
not hold in l1-norm, the well-established quantization methods on convolutional
networks cannot be applied on AdderNets. Thus, the existing AdderNet
quantization techniques propose to use only one shared scale to quantize both
the weights and activations simultaneously. Admittedly, such an approach can
keep the commutative law in the l1-norm quantization process, but the
accuracy drop after low-bit quantization cannot be ignored. To this end, we
first thoroughly analyze the difference between the distributions of weights and
activations in AdderNet and then propose a new quantization algorithm by
redistributing the weights and the activations. Specifically, the pre-trained
full-precision weights in different kernels are clustered into different
groups, and then intra-group shared and inter-group independent scales can be
adopted. To further compensate for the accuracy drop caused by the distribution
difference, we then develop a lossless range clamp scheme for weights and a
simple yet effective outlier clamp strategy for activations. Thus, the
functionality of full-precision weights and the representation ability of
full-precision activations can be fully preserved. The effectiveness of the
proposed quantization method for AdderNet is well verified on several
benchmarks, e.g., our 4-bit post-training quantized adder ResNet-18 achieves a
66.5% top-1 accuracy on ImageNet with comparable energy efficiency, which
is about 8.5% higher than that of the previous AdderNet quantization methods.
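As a rough illustration of the strategy described above, the following NumPy sketch clusters the kernels of a weight tensor into groups, assigns one shared scale per group (intra-group sharing, inter-group independence), and clamps activation outliers before quantization. The group count, clustering criterion, bit-width, and clamp percentile are illustrative assumptions, not the paper's settings; the lossless range clamp for weights and the scale compatibility required by the l1-norm are omitted.

```python
# Minimal sketch of group-wise weight scales plus an activation outlier clamp.
# Group count, clustering rule, bit-width, and clamp percentile are assumptions
# for illustration only.
import numpy as np

def group_weight_scales(weights, num_groups=4, bits=4):
    """Cluster output kernels by magnitude and share one scale per group."""
    # weights: (out_channels, in_channels, kh, kw)
    mags = np.abs(weights).reshape(weights.shape[0], -1).max(axis=1)
    order = np.argsort(mags)                      # group kernels of similar range
    groups = np.array_split(order, num_groups)
    qmax = 2 ** (bits - 1) - 1
    scales = np.zeros(weights.shape[0])
    for g in groups:
        scales[g] = mags[g].max() / qmax          # intra-group shared scale
    return scales

def clamp_activation_outliers(x, percentile=99.9):
    """Saturate rare large activations so the quantization range stays tight."""
    t = np.percentile(np.abs(x), percentile)
    return np.clip(x, -t, t)

rng = np.random.default_rng(0)
w = rng.normal(size=(16, 8, 3, 3))
scales = group_weight_scales(w)                   # one scale per output kernel
q_w = np.clip(np.round(w / scales[:, None, None, None]), -7, 7).astype(np.int8)
a = clamp_activation_outliers(rng.normal(size=(8, 14, 14)) * 3.0)
```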
Related papers
- Post-Training Quantization for Re-parameterization via Coarse & Fine Weight Splitting [13.270381125055275]
We propose a coarse & fine weight splitting (CFWS) method to reduce quantization error of weight.
We develop an improved KL metric to determine optimal quantization scales for activation.
For example, the quantized RepVGG-A1 model exhibits a mere 0.3% accuracy loss.
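The summary does not spell out how the CFWS splitting is performed; a generic two-level split, where a coarse quantization is refined by quantizing its residual, might look like the hypothetical sketch below. This is not the paper's actual CFWS procedure, and its KL-based activation calibration is not shown.

```python
# Hypothetical coarse + fine weight split: quantize coarsely, then quantize the
# residual with its own finer scale. Bit-widths are arbitrary illustrative values.
import numpy as np

def split_quantize(w, coarse_bits=4, fine_bits=4):
    qmax_c = 2 ** (coarse_bits - 1) - 1
    scale_c = np.abs(w).max() / qmax_c
    q_c = np.clip(np.round(w / scale_c), -qmax_c, qmax_c)
    residual = w - q_c * scale_c                  # what the coarse step missed
    qmax_f = 2 ** (fine_bits - 1) - 1
    scale_f = max(np.abs(residual).max() / qmax_f, 1e-12)
    q_f = np.clip(np.round(residual / scale_f), -qmax_f, qmax_f)
    return q_c * scale_c + q_f * scale_f          # dequantized reconstruction

w = np.random.default_rng(1).normal(size=(64, 64))
print(np.abs(w - split_quantize(w)).mean())       # residual quantization error
```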
arXiv Detail & Related papers (2023-12-17T02:31:20Z)
- BiTAT: Neural Network Binarization with Task-dependent Aggregated Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation.
Extreme quantization (1-bit weights/1-bit activations) of compactly designed backbone architectures results in severe performance degradation.
This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate this degradation.
arXiv Detail & Related papers (2022-07-04T13:25:49Z)
- Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
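As a loose, hypothetical reading of "dropping bits instead of neurons", the sketch below randomly clears a few least-significant bits of already-quantized integer weights during training; the paper's actual DropBits formulation may differ.

```python
# Hypothetical "DropBits"-style mask: randomly zero out least-significant bits
# of quantized integer weights, analogous to dropout acting on bits.
import numpy as np

def dropbits(q, bits=8, drop_prob=0.2, rng=None):
    rng = rng or np.random.default_rng()
    # number of low-order bits to clear for each element
    n_drop = rng.binomial(bits - 1, drop_prob, size=q.shape)
    mask = ~((1 << n_drop) - 1)                   # keep only the higher bits
    return (q.astype(np.int64) & mask).astype(q.dtype)

q = np.random.default_rng(2).integers(0, 256, size=(4, 4), dtype=np.int32)
print(dropbits(q, bits=8, drop_prob=0.3, rng=np.random.default_rng(3)))
```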
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
- Adder Neural Networks [75.54239599016535]
We present adder networks (AdderNets) to trade the massive multiplications in deep neural networks for much cheaper additions.
In AdderNets, we take the $\ell_p$-norm distance between the filters and the input feature as the output response.
We show that the proposed AdderNets can achieve 75.7% Top-1 accuracy and 92.3% Top-5 accuracy using ResNet-50 on the ImageNet dataset.
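For concreteness, a minimal (unoptimized) sketch of an adder layer with the l1-norm response: the output is the negative l1 distance between each filter and each input patch. Padding, stride, bias, and batch handling are omitted.

```python
# Minimal sketch of an adder layer: the output is the negative l1 distance
# between each filter and each input patch (valid padding, stride 1).
import numpy as np

def adder_conv2d(x, filters):
    """x: (C, H, W); filters: (F, C, k, k)."""
    F, C, k, _ = filters.shape
    H, W = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.zeros((F, H, W))
    for f in range(F):
        for i in range(H):
            for j in range(W):
                patch = x[:, i:i + k, j:j + k]
                # l1-norm distance instead of a multiply-accumulate dot product
                out[f, i, j] = -np.abs(patch - filters[f]).sum()
    return out

x = np.random.default_rng(0).normal(size=(3, 8, 8))
w = np.random.default_rng(1).normal(size=(4, 3, 3, 3))
print(adder_conv2d(x, w).shape)                   # (4, 6, 6)
```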
arXiv Detail & Related papers (2021-05-29T04:02:51Z)
- Direct Quantization for Training Highly Accurate Low Bit-width Deep Neural Networks [73.29587731448345]
This paper proposes two novel techniques to train deep convolutional neural networks with low bit-width weights and activations.
First, to obtain low bit-width weights, most existing methods derive the quantized weights by quantizing the full-precision network weights.
Second, to obtain low bit-width activations, existing works consider all channels equally.
arXiv Detail & Related papers (2020-12-26T15:21:18Z)
- Searching for Low-Bit Weights in Quantized Neural Networks [129.8319019563356]
Quantized neural networks with low-bit weights and activations are attractive for developing AI accelerators.
We propose to regard the discrete weights in an arbitrary quantized neural network as searchable variables, and utilize a differentiable method to search for them accurately.
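One hedged way to picture "discrete weights as searchable variables" is to keep a logit vector per weight over a fixed set of low-bit levels, use the softmax-weighted expectation as a differentiable surrogate during search, and take the argmax level at the end; the levels and this exact relaxation are illustrative assumptions, not necessarily the paper's formulation.

```python
# Sketch of a differentiable search over discrete weight levels via softmax.
import numpy as np

levels = np.array([-0.5, -0.25, 0.0, 0.25, 0.5])  # candidate low-bit values (assumed)
rng = np.random.default_rng(0)
logits = rng.normal(size=(16, len(levels)))        # one logit vector per weight

def soft_weights(logits, levels, temperature=1.0):
    p = np.exp(logits / temperature)
    p /= p.sum(axis=-1, keepdims=True)
    return p @ levels                               # differentiable surrogate

def hard_weights(logits, levels):
    return levels[np.argmax(logits, axis=-1)]       # discrete weights after search

print(soft_weights(logits, levels)[:4], hard_weights(logits, levels)[:4])
```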
arXiv Detail & Related papers (2020-09-18T09:13:26Z)
- Weight Equalizing Shift Scaler-Coupled Post-training Quantization [0.5936318628878774]
Post-training, layer-wise quantization is preferable because it is free from retraining and is hardware-friendly.
However, accuracy degradation occurs when a neural network model has a large difference in per-output-channel weight ranges.
We propose a new weight equalizing shift scaler, i.e., rescaling the weight range per channel by a 4-bit binary shift, prior to a layer-wise quantization.
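A rough sketch of this idea, under the assumption that each output channel is pre-scaled by a power-of-two factor (a binary shift encoded in 4 bits) so that channel ranges line up before a single layer-wise scale is applied; compensating for these shifts in the subsequent computation is omitted here.

```python
# Sketch: per-channel power-of-two equalization before layer-wise quantization.
import numpy as np

def equalize_then_quantize(w, bits=8, shift_bits=4):
    # w: (out_channels, in_features)
    ch_max = np.abs(w).max(axis=1)
    layer_max = ch_max.max()
    # per-channel left shift that brings each channel's range close to the layer range
    shift = np.floor(np.log2(layer_max / np.maximum(ch_max, 1e-12)))
    shift = np.clip(shift, 0, 2 ** shift_bits - 1).astype(int)
    w_eq = w * (2.0 ** shift)[:, None]
    # ordinary layer-wise symmetric quantization on the equalized weights
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w_eq).max() / qmax
    q = np.clip(np.round(w_eq / scale), -qmax, qmax)
    return q, scale, shift

w = np.random.default_rng(0).normal(size=(8, 32)) * np.linspace(0.05, 2.0, 8)[:, None]
q, scale, shift = equalize_then_quantize(w)
print(shift)                                       # larger shifts for small-range channels
```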
arXiv Detail & Related papers (2020-08-13T09:19:57Z)
- Switchable Precision Neural Networks [35.2752928147013]
Switchable Precision Neural Networks (SP-Nets) are proposed to train a shared network capable of operating at multiple quantization levels.
At runtime, the network can adjust its precision on the fly according to instant memory, latency, power consumption and accuracy demands.
arXiv Detail & Related papers (2020-02-07T14:43:44Z)
- AdderNet: Do We Really Need Multiplications in Deep Learning? [159.174891462064]
We present adder networks (AdderNets) to trade massive multiplications in deep neural networks for much cheaper additions to reduce computation costs.
We develop a special back-propagation approach for AdderNets by investigating the full-precision gradient.
As a result, the proposed AdderNets can achieve 74.9% Top-1 accuracy and 91.7% Top-5 accuracy using ResNet-50 on the ImageNet dataset.
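A sketch of the gradient idea for a single output element y = -sum|x - f|: the exact derivative with respect to the filter involves sign(x - f), whereas a full-precision difference is used instead, and the input-side gradient is clipped. Treat the exact signs and clipping below as a simplified reading rather than the paper's precise rules.

```python
# Sketch of the full-precision gradient used for AdderNet training on one
# output element y = -sum(|x - f|); signs and clipping simplified.
import numpy as np

def adder_grads(x_patch, filt, grad_y):
    """x_patch, filt: arrays of the same shape; grad_y: dLoss/dy (scalar)."""
    diff = x_patch - filt
    d_filter = grad_y * diff                                 # replaces sign(x - f)
    d_input = grad_y * np.clip(filt - x_patch, -1.0, 1.0)    # clipped (HardTanh-like)
    return d_filter, d_input

rng = np.random.default_rng(0)
d_f, d_x = adder_grads(rng.normal(size=(3, 3)), rng.normal(size=(3, 3)), 0.1)
print(d_f.shape, d_x.shape)
```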
arXiv Detail & Related papers (2019-12-31T06:56:47Z)