Post-training Quantization for Neural Networks with Provable Guarantees
- URL: http://arxiv.org/abs/2201.11113v1
- Date: Wed, 26 Jan 2022 18:47:38 GMT
- Title: Post-training Quantization for Neural Networks with Provable Guarantees
- Authors: Jinjie Zhang, Yixuan Zhou, Rayan Saab
- Abstract summary: We modify a post-training neural-network quantization method, GPFQ, that is based on a greedy path-following mechanism.
We prove that for quantizing a single-layer network, the relative square error essentially decays linearly in the number of weights.
- Score: 9.58246628652846
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While neural networks have been remarkably successful in a wide array of
applications, implementing them in resource-constrained hardware remains an
area of intense research. By replacing the weights of a neural network with
quantized (e.g., 4-bit, or binary) counterparts, massive savings in computation
cost, memory, and power consumption are attained. We modify a post-training
neural-network quantization method, GPFQ, that is based on a greedy
path-following mechanism, and rigorously analyze its error. We prove that for
quantizing a single-layer network, the relative square error essentially decays
linearly in the number of weights -- i.e., level of over-parametrization. Our
result holds across a range of input distributions and for both fully-connected
and convolutional architectures. To empirically evaluate the method, we
quantize several common architectures with few bits per weight, and test them
on ImageNet, showing only minor loss of accuracy. We also demonstrate that
standard modifications, such as bias correction and mixed precision
quantization, further improve accuracy.
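The greedy path-following mechanism can be illustrated with a short sketch. This is not the authors' implementation: the function name, the projection-then-round step, and the uniform alphabet are illustrative assumptions chosen to match the high-level description above, in which weights are quantized one at a time and each choice compensates for the error accumulated so far.

```python
import numpy as np

def greedy_quantize_neuron(w, X, alphabet):
    """Greedily quantize one neuron's weights w (shape (d,)) against
    layer inputs X (shape (m, d)), tracking the running residual u
    between the true and quantized pre-activations."""
    alphabet = np.asarray(alphabet, dtype=float)
    d = w.shape[0]
    q = np.zeros(d)
    u = np.zeros(X.shape[0])  # residual X[:, :t] @ (w[:t] - q[:t])
    for t in range(d):
        xt = X[:, t]
        v = u + w[t] * xt  # what the quantized weight must absorb
        denom = xt @ xt
        c = (xt @ v) / denom if denom > 0 else w[t]
        q[t] = alphabet[np.argmin(np.abs(alphabet - c))]  # round projection to nearest level
        u = v - q[t] * xt  # carry the new residual forward
    return q, u
```

Because each step corrects the residual left by earlier roundings, the final error depends on the whole path of choices rather than on independent per-weight rounding errors, which is what makes the linear decay in the number of weights plausible.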
Related papers
- Post-Training Quantization for Re-parameterization via Coarse & Fine Weight Splitting [13.270381125055275]
We propose a coarse & fine weight splitting (CFWS) method to reduce quantization error of weight.
We develop an improved KL metric to determine optimal quantization scales for activation.
For example, the quantized RepVGG-A1 model exhibits a mere 0.3% accuracy loss.
arXiv Detail & Related papers (2023-12-17T02:31:20Z)
- SPFQ: A Stochastic Algorithm and Its Error Analysis for Neural Network Quantization [5.982922468400901]
We show that it is possible to achieve error bounds equivalent to those obtained for the order in which the weights of a neural layer are quantized.
We prove that it is possible to achieve full-network bounds under an infinite alphabet and minimal assumptions on the input data.
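A stochastic scalar quantizer of the kind such algorithms build on can be sketched as follows. The function name and uniform grid are assumptions, and this shows only the unbiased rounding primitive, not the paper's full algorithm or its error analysis.

```python
import numpy as np

def stochastic_round(w, step, rng):
    """Round each entry of w to a multiple of `step`, rounding up with
    probability equal to the fractional remainder, so E[q] = w."""
    w = np.asarray(w, dtype=float)
    lo = np.floor(w / step) * step   # nearest grid point below
    p = (w - lo) / step              # probability of rounding up
    return lo + step * (rng.random(w.shape) < p)
```

Unbiasedness (each rounded value equals the original in expectation) is the property that lets stochastic schemes control accumulated quantization error in expectation.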
arXiv Detail & Related papers (2023-09-20T00:35:16Z)
- QEBVerif: Quantization Error Bound Verification of Neural Networks [6.327780998441913]
Quantization is widely regarded as a promising technique for deploying deep neural networks (DNNs) on edge devices.
Existing verification methods focus either on individual networks (DNNs or QNNs) or on the quantization error bound for partial quantization.
We propose a quantization error bound verification method, named QEBVerif, where both weights and activation tensors are quantized.
arXiv Detail & Related papers (2022-12-06T06:34:38Z)
- Quantization-aware Interval Bound Propagation for Training Certifiably Robust Quantized Neural Networks [58.195261590442406]
We study the problem of training and certifying adversarially robust quantized neural networks (QNNs).
Recent work has shown that floating-point neural networks that have been verified to be robust can become vulnerable to adversarial attacks after quantization.
We present quantization-aware interval bound propagation (QA-IBP), a novel method for training robust QNNs.
arXiv Detail & Related papers (2022-11-29T13:32:38Z)
- A simple approach for quantizing neural networks [7.056222499095849]
We propose a new method for quantizing the weights of a fully trained neural network.
A simple deterministic pre-processing step allows us to quantize network layers via memoryless scalar quantization.
The developed method also readily allows the quantization of deep networks by consecutive application to single layers.
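Memoryless scalar quantization itself is just independent rounding of each weight to its nearest level; the following sketch is illustrative (the deterministic pre-processing step the summary mentions is not shown, and the function name is an assumption).

```python
import numpy as np

def msq(w, alphabet):
    """Memoryless scalar quantization: map each weight independently
    to the nearest element of the alphabet."""
    w = np.asarray(w, dtype=float)
    a = np.asarray(alphabet, dtype=float)
    # pairwise distances via broadcasting, then pick the closest level
    return a[np.argmin(np.abs(w[:, None] - a[None, :]), axis=1)]
```

Because each weight is rounded in isolation, the method needs no state across weights, which is what makes layer-by-layer application to deep networks straightforward.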
arXiv Detail & Related papers (2022-09-07T22:36:56Z)
- BiTAT: Neural Network Binarization with Task-dependent Aggregated Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation.
Extreme quantization (1-bit weight/1-bit activations) of compactly-designed backbone architectures results in severe performance degeneration.
This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate performance degeneration.
arXiv Detail & Related papers (2022-07-04T13:25:49Z)
- Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
- Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We reduce space occupancy to as little as 0.6% on fully-connected layers and 5.44% on the whole network, while performing at least as competitively as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
- Direct Quantization for Training Highly Accurate Low Bit-width Deep Neural Networks [73.29587731448345]
This paper proposes two novel techniques to train deep convolutional neural networks with low bit-width weights and activations.
First, to obtain low bit-width weights, most existing methods derive the quantized weights by quantizing the full-precision network weights.
Second, to obtain low bit-width activations, existing works consider all channels equally.
arXiv Detail & Related papers (2020-12-26T15:21:18Z)
- A Greedy Algorithm for Quantizing Neural Networks [4.683806391173103]
We propose a new computationally efficient method for quantizing the weights of pre-trained neural networks.
Our method deterministically quantizes layers in an iterative fashion with no complicated re-training required.
arXiv Detail & Related papers (2020-10-29T22:53:10Z)
- Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantized neural networks (QNNs) are very attractive to industry because of their extremely low computation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.