Filter Pre-Pruning for Improved Fine-tuning of Quantized Deep Neural
Networks
- URL: http://arxiv.org/abs/2011.06751v2
- Date: Wed, 25 Nov 2020 05:22:16 GMT
- Title: Filter Pre-Pruning for Improved Fine-tuning of Quantized Deep Neural
Networks
- Authors: Jun Nishikawa, Ryoji Ikegaya
- Abstract summary: We propose a new pruning method called Pruning for Quantization (PfQ) which removes the filters that disturb the fine-tuning of the DNN.
Experiments using well-known models and datasets confirmed that the proposed method achieves higher performance with a similar model size.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Neural Networks(DNNs) have many parameters and activation data, and
these both are expensive to implement. One method to reduce the size of the DNN
is to quantize the pre-trained model by using a low-bit expression for weights
and activations, using fine-tuning to recover the drop in accuracy. However, it
is generally difficult to train neural networks which use low-bit expressions.
One reason is that the weights in the middle layer of the DNN have a wide
dynamic range and so when quantizing the wide dynamic range into a few bits,
the step size becomes large, which leads to a large quantization error and
finally a large degradation in accuracy. To solve this problem, this paper
makes the following three contributions without using any additional learning
parameters and hyper-parameters. First, we analyze how batch normalization,
which causes the aforementioned problem, disturbs the fine-tuning of the
quantized DNN. Second, based on these results, we propose a new pruning method
called Pruning for Quantization (PfQ) which removes the filters that disturb
the fine-tuning of the DNN while not affecting the inferred result as far as
possible. Third, we propose a workflow of fine-tuning for quantized DNNs using
the proposed pruning method(PfQ). Experiments using well-known models and
datasets confirmed that the proposed method achieves higher performance with a
similar model size than conventional quantization methods including
fine-tuning.
Related papers
- Low-bit Quantization for Deep Graph Neural Networks with
Smoothness-aware Message Propagation [3.9177379733188715]
We present an end-to-end solution that aims to address these challenges for efficient GNNs in resource constrained environments.
We introduce a quantization based approach for all stages of GNNs, from message passing in training to node classification.
The proposed quantizer learns quantization ranges and reduces the model size with comparable accuracy even under low-bit quantization.
arXiv Detail & Related papers (2023-08-29T00:25:02Z) - Designing strong baselines for ternary neural network quantization
through support and mass equalization [7.971065005161565]
Deep neural networks (DNNs) offer the highest performance in a wide range of applications in computer vision.
This computational burden can be dramatically reduced by quantizing floating point values to ternary values.
We show experimentally that our approach allows to significantly improve the performance of ternary quantization through a variety of scenarios.
arXiv Detail & Related papers (2023-06-30T07:35:07Z) - QEBVerif: Quantization Error Bound Verification of Neural Networks [6.327780998441913]
quantization is widely regarded as one promising technique for deploying deep neural networks (DNNs) on edge devices.
Existing verification methods focus on either individual neural networks (DNNs or QNNs) or quantization error bound for partial quantization.
We propose a quantization error bound verification method, named QEBVerif, where both weights and activation tensors are quantized.
arXiv Detail & Related papers (2022-12-06T06:34:38Z) - Mixed Precision Low-bit Quantization of Neural Network Language Models
for Speech Recognition [67.95996816744251]
State-of-the-art language models (LMs) represented by long-short term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming increasingly complex and expensive for practical applications.
Current quantization methods are based on uniform precision and fail to account for the varying performance sensitivity at different parts of LMs to quantization errors.
Novel mixed precision neural network LM quantization methods are proposed in this paper.
arXiv Detail & Related papers (2021-11-29T12:24:02Z) - Low-bit Quantization of Recurrent Neural Network Language Models Using
Alternating Direction Methods of Multipliers [67.688697838109]
This paper presents a novel method to train quantized RNNLMs from scratch using alternating direction methods of multipliers (ADMM)
Experiments on two tasks suggest the proposed ADMM quantization achieved a model size compression factor of up to 31 times over the full precision baseline RNNLMs.
arXiv Detail & Related papers (2021-11-29T09:30:06Z) - Direct Quantization for Training Highly Accurate Low Bit-width Deep
Neural Networks [73.29587731448345]
This paper proposes two novel techniques to train deep convolutional neural networks with low bit-width weights and activations.
First, to obtain low bit-width weights, most existing methods obtain the quantized weights by performing quantization on the full-precision network weights.
Second, to obtain low bit-width activations, existing works consider all channels equally.
arXiv Detail & Related papers (2020-12-26T15:21:18Z) - Where Should We Begin? A Low-Level Exploration of Weight Initialization
Impact on Quantized Behaviour of Deep Neural Networks [93.4221402881609]
We present an in-depth, fine-grained ablation study of the effect of different weights initialization on the final distributions of weights and activations of different CNN architectures.
To our best knowledge, we are the first to perform such a low-level, in-depth quantitative analysis of weights initialization and its effect on quantized behaviour.
arXiv Detail & Related papers (2020-11-30T06:54:28Z) - Holistic Filter Pruning for Efficient Deep Neural Networks [25.328005340524825]
"Holistic Filter Pruning" (HFP) is a novel approach for common DNN training that is easy to implement and enables to specify accurate pruning rates.
In various experiments, we give insights into the training and achieve state-of-the-art performance on CIFAR-10 and ImageNet.
arXiv Detail & Related papers (2020-09-17T09:23:36Z) - AQD: Towards Accurate Fully-Quantized Object Detection [94.06347866374927]
We propose an Accurate Quantized object Detection solution, termed AQD, to get rid of floating-point computation.
Our AQD achieves comparable or even better performance compared with the full-precision counterpart under extremely low-bit schemes.
arXiv Detail & Related papers (2020-07-14T09:07:29Z) - AUSN: Approximately Uniform Quantization by Adaptively Superimposing
Non-uniform Distribution for Deep Neural Networks [0.7378164273177589]
Existing uniform and non-uniform quantization methods exhibit an inherent conflict between the representing range and representing resolution.
We propose a novel quantization method to quantize the weight and activation.
The key idea is to Approximate the Uniform quantization by Adaptively Superposing multiple Non-uniform quantized values, namely AUSN.
arXiv Detail & Related papers (2020-07-08T05:10:53Z) - Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantization neural networks (QNNs) are very attractive to the industry because their extremely cheap calculation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most of existing methods aim to enhance performance of QNNs especially binary neural networks by exploiting more effective training techniques.
We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.