Class-based Quantization for Neural Networks
- URL: http://arxiv.org/abs/2211.14928v1
- Date: Sun, 27 Nov 2022 20:25:46 GMT
- Title: Class-based Quantization for Neural Networks
- Authors: Wenhao Sun, Grace Li Zhang, Huaxi Gu, Bing Li, Ulf Schlichtmann
- Abstract summary: In deep neural networks (DNNs), there are a huge number of weights and multiply-and-accumulate (MAC) operations.
We propose a class-based quantization method to determine the minimum number of quantization bits for each filter or neuron in DNNs individually.
Experimental results demonstrate that the proposed method maintains inference accuracy under low bit-width quantization.
- Score: 6.6707634590249265
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In deep neural networks (DNNs), there are a huge number of weights and
multiply-and-accumulate (MAC) operations. Accordingly, it is challenging to
apply DNNs on resource-constrained platforms, e.g., mobile phones. Quantization
is a method to reduce the size and the computational complexity of DNNs.
Existing quantization methods either require hardware overhead to achieve
non-uniform quantization or focus on model-wise and layer-wise uniform
quantization, neither of which is as fine-grained as filter-wise quantization. In
this paper, we propose a class-based quantization method to determine the
minimum number of quantization bits for each filter or neuron in DNNs
individually. In the proposed method, the importance score of each filter or
neuron with respect to the number of classes in the dataset is evaluated first.
The larger the score, the more important the filter or neuron, and thus the
more quantization bits it should receive. A search algorithm then exploits
these importance differences to determine the number of quantization bits for
each filter or neuron.
Experimental results demonstrate that the proposed method maintains inference
accuracy under low bit-width quantization. Given the same number of
quantization bits, it also achieves better inference accuracy than existing
methods.
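As a companion to the abstract, here is a minimal Python/NumPy sketch of the two-step procedure. Both details are assumptions for illustration: the importance score is approximated by the variance of per-class mean activations, and a simple greedy budget loop stands in for the authors' search algorithm; neither is specified in this summary.

import numpy as np

def importance_scores(acts, labels, num_classes):
    """acts: (num_samples, num_filters) activations of one layer.
    Hypothetical score: how strongly a filter's mean response varies
    across classes; filters that separate classes score higher."""
    class_means = np.stack([acts[labels == c].mean(axis=0)
                            for c in range(num_classes)])
    return class_means.var(axis=0)

def allocate_bits(scores, min_bits=2, max_bits=8, avg_bits=4):
    """Greedy stand-in for the paper's search algorithm: start every
    filter at min_bits, then spend the leftover budget on high scores."""
    bits = np.full(len(scores), min_bits)
    spare = avg_bits * len(scores) - bits.sum()
    for idx in np.argsort(scores)[::-1]:      # most important filters first
        extra = min(max_bits - min_bits, spare)
        if extra <= 0:
            break
        bits[idx] += extra
        spare -= extra
    return bits

def quantize_filter(w, n_bits):
    """Uniform symmetric quantization of one filter's weights."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = max(np.abs(w).max() / qmax, 1e-12)
    return np.round(w / scale) * scale

In the actual method, the search would also consult inference accuracy while assigning bit-widths; the loop above only shows how importance differences turn into per-filter precision.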
Related papers
- SQUAT: Stateful Quantization-Aware Training in Recurrent Spiking Neural Networks [1.0923877073891446]
Spiking neural networks (SNNs) share the goal of enhancing efficiency, but adopt an 'event-driven' approach to reduce the power consumption of neural network inference.
This paper introduces two QAT schemes for stateful neurons: (i) a uniform quantization strategy, an established method for weight quantization, and (ii) threshold-centered quantization.
Our results show that increasing the density of quantization levels around the firing threshold improves accuracy across several benchmark datasets.
arXiv Detail & Related papers (2024-04-15T03:07:16Z)
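For the threshold-centered quantization in the SQUAT entry above, a hedged sketch in Python (the power-law warp and the membrane-potential range are assumptions for illustration; the paper's actual level placement may differ):

import numpy as np

def threshold_centered_levels(theta, lo, hi, n_bits, warp=3.0):
    """Place 2**n_bits levels in [lo, hi], denser near threshold theta."""
    u = np.linspace(-1.0, 1.0, 2 ** n_bits)
    warped = np.sign(u) * np.abs(u) ** warp    # compresses levels toward 0
    half = max(theta - lo, hi - theta)
    return np.clip(theta + warped * half, lo, hi)

def quantize(v, levels):
    """Snap each value to its nearest quantization level."""
    idx = np.abs(v[:, None] - levels[None, :]).argmin(axis=1)
    return levels[idx]

levels = threshold_centered_levels(theta=1.0, lo=0.0, hi=2.0, n_bits=3)
print(quantize(np.array([0.1, 0.95, 1.05, 1.9]), levels))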
- Towards Neural Variational Monte Carlo That Scales Linearly with System Size [67.09349921751341]
Quantum many-body problems are central to demystifying some exotic quantum phenomena, e.g., high-temperature superconductors.
The combination of neural networks (NN) for representing quantum states, and the Variational Monte Carlo (VMC) algorithm, has been shown to be a promising method for solving such problems.
We propose a NN architecture called Vector-Quantized Neural Quantum States (VQ-NQS) that utilizes vector-quantization techniques to leverage redundancies in the local-energy calculations of the VMC algorithm.
arXiv Detail & Related papers (2022-12-21T19:00:04Z)
- A Comprehensive Survey on Model Quantization for Deep Neural Networks in Image Classification [0.0]
A promising approach is quantization, in which full-precision values are stored at low bit-width.
We present a comprehensive survey of quantization concepts and methods, with a focus on image classification.
We explain how floating-point operations are replaced with low-cost bitwise operations in a quantized DNN, and how sensitive different layers are to quantization.
arXiv Detail & Related papers (2022-05-14T15:08:32Z)
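The float-to-bitwise replacement mentioned in the survey entry above, as a hedged toy example (symmetric per-tensor quantization assumed; real kernels fold the rescale into the accumulator):

import numpy as np

def sym_quantize(x, n_bits=8):
    """Map floats to signed integers plus one scale factor."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = max(np.abs(x).max() / qmax, 1e-12)
    return np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32), scale

w, a = np.random.randn(64), np.random.randn(64)
qw, sw = sym_quantize(w)
qa, sa = sym_quantize(a)
int_dot = int(np.dot(qw, qa)) * sw * sa   # integer MACs, one float rescale
print(int_dot, np.dot(w, a))              # close to the float dot product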
- Low-bit Quantization of Recurrent Neural Network Language Models Using Alternating Direction Methods of Multipliers [67.688697838109]
This paper presents a novel method to train quantized RNNLMs from scratch using the alternating direction method of multipliers (ADMM).
Experiments on two tasks suggest the proposed ADMM quantization achieved a model size compression factor of up to 31 times over the full-precision baseline RNNLMs.
arXiv Detail & Related papers (2021-11-29T09:30:06Z)
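A rough sketch of the ADMM alternation described in the entry above, under assumed details (a quadratic penalty couples full-precision weights W to a quantized copy Q via a scaled dual variable U; the paper's exact splitting and training setup are not given in this summary):

import numpy as np

def project_to_levels(x, levels):
    """Nearest quantized value for each entry of a 1-D weight vector."""
    return levels[np.abs(x[:, None] - levels[None, :]).argmin(axis=1)]

def admm_quantize(W, loss_grad, levels, rho=0.1, lr=0.01, steps=100):
    Q = project_to_levels(W, levels)   # quantized copy of the weights
    U = np.zeros_like(W)               # scaled dual variable
    for _ in range(steps):
        W = W - lr * (loss_grad(W) + rho * (W - Q + U))  # primal step on W
        Q = project_to_levels(W + U, levels)             # primal step on Q
        U = U + W - Q                                    # dual ascent
    return Q

W0 = np.random.randn(32)
grad = lambda W: W - 1.0   # hypothetical quadratic loss pulling weights to 1
print(admm_quantize(W0, grad, levels=np.array([-0.5, 0.0, 0.5, 1.0])))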
- OMPQ: Orthogonal Mixed Precision Quantization [64.59700856607017]
Mixed precision quantization takes advantage of hardware's multiple bit-width arithmetic operations to unleash the full potential of network quantization.
We propose to optimize a proxy metric, the concept of network orthogonality, which is highly correlated with the loss of the integer programming but easy to optimize with linear programming.
This approach reduces the search time and required data amount by orders of magnitude, with little compromise on quantization accuracy.
arXiv Detail & Related papers (2021-09-16T10:59:33Z)
- Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
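One hedged reading of the DropBits idea above, assuming that "dropping a bit" means re-quantizing a random subset of weights at one bit less during each training step; the paper's actual mechanism may differ:

import numpy as np

def uniform_quantize(w, n_bits):
    qmax = 2 ** (n_bits - 1) - 1
    scale = max(np.abs(w).max() / qmax, 1e-12)
    return np.round(w / scale) * scale

def drop_bits(w, base_bits=8, drop_prob=0.1, seed=0):
    """Quantize most weights at base_bits, a random subset one bit lower."""
    rng = np.random.default_rng(seed)
    dropped = rng.random(w.shape) < drop_prob
    out = uniform_quantize(w, base_bits)
    out[dropped] = uniform_quantize(w, base_bits - 1)[dropped]
    return out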
- A Greedy Algorithm for Quantizing Neural Networks [4.683806391173103]
We propose a new computationally efficient method for quantizing the weights of pre-trained neural networks.
Our method deterministically quantizes layers in an iterative fashion with no complicated re-training required.
arXiv Detail & Related papers (2020-10-29T22:53:10Z)
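A hedged simplification of the greedy scheme above: weights are rounded one at a time while an error vector carries the accumulated approximation error over calibration inputs. The paper's algorithm and analysis are richer than this toy version:

import numpy as np

def greedy_quantize_neuron(w, X, levels):
    """w: (d,) weights, X: (n, d) calibration inputs, levels: allowed values."""
    q = np.zeros_like(w)
    u = np.zeros(X.shape[0])           # running error of the dot product
    for t in range(len(w)):
        target = u + w[t] * X[:, t]    # what this weight should contribute
        errs = [np.linalg.norm(target - lv * X[:, t]) for lv in levels]
        q[t] = levels[int(np.argmin(errs))]   # level that best cancels error
        u = target - q[t] * X[:, t]
    return q

w, X = np.random.randn(16), np.random.randn(128, 16)
q = greedy_quantize_neuron(w, X, levels=np.array([-1.0, 0.0, 1.0]))
print(np.linalg.norm(X @ w - X @ q) / np.linalg.norm(X @ w))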
- Searching for Low-Bit Weights in Quantized Neural Networks [129.8319019563356]
Quantized neural networks with low-bit weights and activations are attractive for developing AI accelerators.
We propose to regard the discrete weights in an arbitrary quantized neural network as searchable variables, and use a differentiable method to search for them accurately.
arXiv Detail & Related papers (2020-09-18T09:13:26Z)
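A hedged sketch of treating discrete weights as searchable variables, assuming a softmax-over-levels relaxation (one logit per weight per candidate level, trained by gradient descent, committed by argmax afterwards); the paper's exact relaxation is not given in this summary:

import numpy as np

levels = np.array([-1.0, 0.0, 1.0])        # candidate low-bit weight values

def soft_weights(logits, temperature=1.0):
    """Continuous surrogate: expected level under a softmax distribution."""
    p = np.exp(logits / temperature)
    p /= p.sum(axis=-1, keepdims=True)
    return p @ levels                       # one soft weight per row of logits

def hard_weights(logits):
    """After the search, commit to the most probable discrete level."""
    return levels[logits.argmax(axis=-1)]

logits = np.random.randn(16, len(levels))  # searchable variables
w_train = soft_weights(logits, 0.5)        # used in forward/backward passes
w_final = hard_weights(logits)             # deployed low-bit weights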
- Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantized neural networks (QNNs) are very attractive to the industry because of their extremely cheap computation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features of the original full-precision network into high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)
- Post-Training Piecewise Linear Quantization for Deep Neural Networks [13.717228230596167]
Quantization plays an important role in the energy-efficient deployment of deep neural networks on resource-limited devices.
We propose a piecewise linear quantization scheme to enable accurate approximation for tensor values that have bell-shaped distributions with long tails.
Compared to state-of-the-art post-training quantization methods, our proposed method achieves superior performance on image classification, semantic segmentation, and object detection with minor overhead.
arXiv Detail & Related papers (2020-01-31T23:47:00Z)
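A hedged sketch of the piecewise linear scheme above, assuming one breakpoint that splits values into a dense center region and sparser tails, each on its own uniform grid; the paper selects the breakpoint by optimization, which is omitted here:

import numpy as np

def uniform_q(x, lo, hi, n_levels):
    """Uniform grid of n_levels points on [lo, hi]."""
    step = max(hi - lo, 1e-12) / max(n_levels - 1, 1)
    return lo + np.round((np.clip(x, lo, hi) - lo) / step) * step

def piecewise_quantize(x, bkpt, n_bits=4):
    half = 2 ** (n_bits - 1)               # levels per region
    center = np.abs(x) <= bkpt
    m = np.abs(x).max()
    out = np.empty_like(x)
    out[center] = uniform_q(x[center], -bkpt, bkpt, 2 * half)
    out[~center] = np.where(x[~center] > 0,
                            uniform_q(x[~center], bkpt, m, half),
                            uniform_q(x[~center], -m, -bkpt, half))
    return out

x = np.random.randn(10000)                 # bell-shaped values with tails
print(np.mean((x - piecewise_quantize(x, bkpt=1.0)) ** 2))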
This list is automatically generated from the titles and abstracts of the papers on this site.