PTQ-SL: Exploring the Sub-layerwise Post-training Quantization
- URL: http://arxiv.org/abs/2110.07809v2
- Date: Mon, 18 Oct 2021 00:42:16 GMT
- Title: PTQ-SL: Exploring the Sub-layerwise Post-training Quantization
- Authors: Zhihang Yuan, Yiqi Chen, Chenhao Xue, Chenguang Zhang, Qiankun Wang,
Guangyu Sun
- Abstract summary: Network quantization is a powerful technique to compress convolutional neural networks.
The quantization granularity determines how to share the scaling factors in weights.
We propose an efficient post-training quantization method in sub-layerwise granularity (PTQ-SL).
- Score: 6.0070278366995105
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Network quantization is a powerful technique to compress convolutional neural
networks. The quantization granularity determines how to share the scaling
factors in weights, which affects the performance of network quantization. Most
existing approaches share the scaling factors layerwise or channelwise for the
quantization of convolutional layers. Channelwise quantization and layerwise
quantization have been widely used in various applications. However, other
quantization granularities are rarely explored. In this paper, we will explore
the sub-layerwise granularity that shares the scaling factor across multiple
input and output channels. We propose an efficient post-training quantization
method in sub-layerwise granularity (PTQ-SL). Then we systematically experiment
on various granularities and observe that the prediction accuracy of the
quantized neural network has a strong correlation with the granularity.
Moreover, we find that adjusting the position of the channels can improve the
performance of sub-layerwise quantization. Therefore, we propose a method to
reorder the channels for sub-layerwise quantization. The experiments
demonstrate that the sub-layerwise quantization with appropriate channel
reordering can outperform the channelwise quantization.
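To make the granularity concrete, the following is a minimal NumPy sketch of the idea, not the paper's exact algorithm: the weight tensor of a convolutional layer is partitioned into blocks spanning g_out output channels and g_in input channels, each block gets its own scaling factor, and a toy heuristic reorders output channels so that channels with similar dynamic ranges share a scale. The function names, the max-abs scale rule, the 4-bit default, and the assumption that channel counts divide evenly into blocks are illustrative choices rather than details taken from the paper.

```python
import numpy as np

def quantize_sublayerwise(weight, g_out=8, g_in=8, n_bits=4):
    """Fake-quantize a conv weight of shape (C_out, C_in, kH, kW) with one
    scaling factor per block of g_out output channels x g_in input channels.
    g_out=C_out, g_in=C_in gives layerwise sharing; g_out=1, g_in=C_in gives
    (output-)channelwise sharing; anything in between is sub-layerwise."""
    c_out, c_in = weight.shape[:2]
    assert c_out % g_out == 0 and c_in % g_in == 0, "sketch assumes even blocks"
    q_max = 2 ** (n_bits - 1) - 1
    deq = np.empty_like(weight)
    for i in range(0, c_out, g_out):
        for j in range(0, c_in, g_in):
            block = weight[i:i + g_out, j:j + g_in]
            scale = max(np.abs(block).max() / q_max, 1e-8)  # one scale per block
            q = np.clip(np.round(block / scale), -q_max - 1, q_max)
            deq[i:i + g_out, j:j + g_in] = q * scale
    return deq

def reorder_output_channels(weight):
    """Toy reordering heuristic (not the paper's method): sort output channels
    by their largest absolute weight so channels with similar ranges fall into
    the same block. Returns the permuted weight and the permutation, which must
    be undone downstream (e.g. by permuting the next layer's input channels)."""
    order = np.argsort(np.abs(weight).reshape(weight.shape[0], -1).max(axis=1))
    return weight[order], order

# Example: 4-bit sub-layerwise quantization of a random 64x32x3x3 kernel.
w = np.random.randn(64, 32, 3, 3).astype(np.float32)
w_perm, order = reorder_output_channels(w)
w_q = quantize_sublayerwise(w_perm, g_out=8, g_in=8, n_bits=4)
print("mean squared quantization error:", float(np.mean((w_perm - w_q) ** 2)))
```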
Related papers
- Ternary Quantization: A Survey [12.90416661059601]
Inference time, model size, and accuracy are critical for deploying deep neural network models.
We review the evolution of ternary quantization and investigate the relationships among existing ternary quantization methods.
arXiv Detail & Related papers (2023-03-02T03:38:51Z)
- Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
- An unsupervised feature learning for quantum-classical convolutional network with applications to fault detection [5.609958919699706]
We present a simple unsupervised method for quantum-classical convolutional networks to learn a hierarchy of quantum feature extractors.
The main contribution of the proposed approach is to use the $K$-means clustering to maximize the difference of quantum properties in quantum circuit ansatz.
arXiv Detail & Related papers (2021-07-17T03:16:59Z)
- Direct Quantization for Training Highly Accurate Low Bit-width Deep Neural Networks [73.29587731448345]
This paper proposes two novel techniques to train deep convolutional neural networks with low bit-width weights and activations.
First, to obtain low bit-width weights, most existing methods obtain the quantized weights by performing quantization on the full-precision network weights.
Second, to obtain low bit-width activations, existing works consider all channels equally.
arXiv Detail & Related papers (2020-12-26T15:21:18Z)
- Where Should We Begin? A Low-Level Exploration of Weight Initialization Impact on Quantized Behaviour of Deep Neural Networks [93.4221402881609]
We present an in-depth, fine-grained ablation study of the effect of different weights initialization on the final distributions of weights and activations of different CNN architectures.
To our best knowledge, we are the first to perform such a low-level, in-depth quantitative analysis of weights initialization and its effect on quantized behaviour.
arXiv Detail & Related papers (2020-11-30T06:54:28Z)
- Toward Trainability of Quantum Neural Networks [87.04438831673063]
Quantum Neural Networks (QNNs) have been proposed as generalizations of classical neural networks to achieve the quantum speed-up.
Serious bottlenecks exist for training QNNs because gradients vanish at a rate exponential in the number of input qubits.
We consider QNNs with tree tensor and step controlled structures for binary classification. Simulations show faster convergence rates and better accuracy compared to QNNs with random structures.
arXiv Detail & Related papers (2020-11-12T08:32:04Z)
- A Greedy Algorithm for Quantizing Neural Networks [4.683806391173103]
We propose a new computationally efficient method for quantizing the weights of pre-trained neural networks.
Our method deterministically quantizes layers in an iterative fashion with no complicated re-training required.
arXiv Detail & Related papers (2020-10-29T22:53:10Z)
- Channel-Level Variable Quantization Network for Deep Image Compression [50.3174629451739]
We propose a channel-level variable quantization network that dynamically allocates more convolutions to significant channels and withdraws them from negligible channels.
Our method achieves superior performance and can produce much better visual reconstructions.
arXiv Detail & Related papers (2020-07-15T07:20:39Z)
- Trainability of Dissipative Perceptron-Based Quantum Neural Networks [0.8258451067861933]
We analyze the gradient scaling (and hence the trainability) of a recently proposed architecture that we call dissipative QNNs (DQNNs).
We find that DQNNs can exhibit barren plateaus, i.e., gradients that vanish exponentially in the number of qubits.
arXiv Detail & Related papers (2020-05-26T00:59:09Z)
- Post-Training Piecewise Linear Quantization for Deep Neural Networks [13.717228230596167]
Quantization plays an important role in the energy-efficient deployment of deep neural networks on resource-limited devices.
We propose a piecewise linear quantization scheme to enable accurate approximation for tensor values that have bell-shaped distributions with long tails (a toy sketch of the region-wise idea follows after this list).
Compared to state-of-the-art post-training quantization methods, our proposed method achieves superior performance on image classification, semantic segmentation, and object detection with minor overhead.
arXiv Detail & Related papers (2020-01-31T23:47:00Z)
- Entanglement Classification via Neural Network Quantum States [58.720142291102135]
In this paper we combine machine-learning tools and the theory of quantum entanglement to perform entanglement classification for multipartite qubit systems in pure states.
We use a parameterisation of quantum systems using artificial neural networks in a restricted Boltzmann machine (RBM) architecture, known as Neural Network Quantum States (NNS).
arXiv Detail & Related papers (2019-12-31T07:40:23Z)
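For the piecewise linear quantization entry above, here is a rough NumPy sketch of the general idea of region-wise step sizes, not the cited paper's actual scheme, breakpoint selection, or bit allocation: the value range is split at an arbitrary breakpoint so that the dense center of a bell-shaped distribution gets a finer quantization step than the long tails. The breakpoint fraction and the equal per-region level count are illustrative assumptions.

```python
import numpy as np

def piecewise_linear_quantize(x, n_bits=4, breakpoint_frac=0.25):
    """Two-region piecewise uniform quantizer (illustrative only): values in
    the center region [-p, p] and values in the tails each get their own step
    size, so the bell-shaped center is resolved more finely than the tails."""
    m = np.abs(x).max()
    p = breakpoint_frac * m                 # arbitrary breakpoint for the sketch
    levels = 2 ** (n_bits - 1) - 1          # levels per region in this toy split
    center = np.clip(x, -p, p)              # part of x inside [-p, p]
    tail = x - center                       # remainder, nonzero only in the tails
    s_center = max(p / levels, 1e-8)
    s_tail = max((m - p) / levels, 1e-8)
    return np.round(center / s_center) * s_center + np.round(tail / s_tail) * s_tail

# Example on a heavy-tailed, bell-shaped sample.
x = np.random.laplace(scale=0.05, size=10000)
err = np.mean((x - piecewise_linear_quantize(x)) ** 2)
print("mean squared quantization error:", float(err))
```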
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.