Cluster-Promoting Quantization with Bit-Drop for Minimizing Network
Quantization Loss
- URL: http://arxiv.org/abs/2109.02100v1
- Date: Sun, 5 Sep 2021 15:15:07 GMT
- Title: Cluster-Promoting Quantization with Bit-Drop for Minimizing Network
Quantization Loss
- Authors: Jung Hyun Lee, Jihun Yun, Sung Ju Hwang, Eunho Yang
- Abstract summary: Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
- Score: 61.26793005355441
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Network quantization, which aims to reduce the bit-lengths of network
weights and activations, has emerged as a key technique for deploying networks
on resource-limited devices. Although recent studies have successfully
discretized full-precision networks, they still incur large quantization errors
after training, giving rise to a significant performance gap between a
full-precision network and its quantized counterpart. In this work, we propose a
novel quantization method for neural networks, Cluster-Promoting Quantization
(CPQ), which finds the optimal quantization grids while naturally encouraging
the underlying full-precision weights to gather cohesively around those grids
during training. This property of CPQ stems from two main ingredients that
enable differentiable quantization: i) a categorical distribution over the grid
points, defined by a specific probabilistic parametrization, in the forward
pass, and ii) our proposed multi-class straight-through estimator (STE) in the
backward pass. Since the second component, the multi-class STE, is intrinsically
biased, we additionally propose a new bit-drop technique, DropBits, which
revises standard dropout regularization to randomly drop bits instead of
neurons. As a natural extension of DropBits, we further introduce a way of
learning heterogeneous quantization levels, finding the proper bit-length for
each layer by imposing an additional regularization on DropBits. We
experimentally validate our method on various benchmark datasets and network
architectures, and also support a new hypothesis for quantization: learning
heterogeneous quantization levels outperforms training from scratch with the
same but fixed quantization levels.
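
To make the recipe above concrete, here is a minimal PyTorch sketch of the general idea: a categorical distribution over grid points in the forward pass, a straight-through expected-value surrogate in the backward pass, and a DropBits-style stochastic reduction of the bit-width. The softmax-over-negative-distance parametrization, the temperature `tau`, the function names, and the specific way bits are dropped are illustrative assumptions, not the authors' implementation; the regularizer for learning per-layer bit-lengths is omitted.

```python
import torch

def quantization_grid(bits: int, w_max: float) -> torch.Tensor:
    """Uniform symmetric grid with 2**bits levels spanning [-w_max, w_max]."""
    return torch.linspace(-w_max, w_max, 2 ** bits)

def cpq_forward(w: torch.Tensor, grid: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Categorical forward pass with a multi-class straight-through backward.

    Each weight gets a categorical distribution over the grid points whose
    logits are scaled negative distances to the grid; the forward output is the
    nearest grid point, while gradients flow through the softmax expectation.
    """
    d = (w.reshape(-1, 1) - grid.reshape(1, -1)).abs()   # (num_weights, num_levels)
    probs = torch.softmax(-d / tau, dim=-1)              # probabilistic parametrization
    soft = probs @ grid                                  # differentiable expectation
    hard = grid[d.argmin(dim=-1)]                        # discrete grid point used forward
    # straight-through: the value is `hard`, the gradient is that of `soft`
    return ((hard - soft).detach() + soft).reshape(w.shape)

def dropbits(bits: int, drop_prob: float = 0.2, min_bits: int = 2) -> int:
    """DropBits-style stochastic bit-width: occasionally train on a coarser
    grid by dropping low-order bits, analogous to dropout on neurons."""
    while bits > min_bits and torch.rand(()).item() < drop_prob:
        bits -= 1
    return bits

# Usage: quantize a weight tensor with a stochastically reduced bit-width.
w = torch.randn(64, requires_grad=True)
b = dropbits(bits=4)
grid = quantization_grid(b, w_max=w.detach().abs().max().item())
w_q = cpq_forward(w, grid)
w_q.sum().backward()          # gradients reach `w` through the soft expectation
print(b, w.grad.abs().mean().item())
```

The `(hard - soft).detach() + soft` pattern is the standard straight-through trick; the bias it introduces in the backward pass is what motivates the additional DropBits regularization described above.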
Related papers
- SQUAT: Stateful Quantization-Aware Training in Recurrent Spiking Neural Networks [1.0923877073891446]
Spiking neural networks (SNNs) share the goal of enhancing efficiency, but adopt an 'event-driven' approach to reduce the power consumption of neural network inference.
This paper introduces two QAT schemes for stateful neurons: (i) a uniform quantization strategy, an established method for weight quantization, and (ii) threshold-centered quantization.
Our results show that increasing the density of quantization levels around the firing threshold improves accuracy across several benchmark datasets.
arXiv Detail & Related papers (2024-04-15T03:07:16Z) - MBQuant: A Novel Multi-Branch Topology Method for Arbitrary Bit-width Network Quantization [51.85834744835766]
We propose MBQuant, a novel method for arbitrary bit-width quantization.
We show that MBQuant achieves significant performance gains compared to existing arbitrary bit-width quantization methods.
arXiv Detail & Related papers (2023-05-14T10:17:09Z) - Post-training Quantization for Neural Networks with Provable Guarantees [9.58246628652846]
We modify a post-training neural-network quantization method, GPFQ, that is based on a greedy path-following mechanism (a simplified sketch of the mechanism appears after this list).
We prove that for quantizing a single-layer network, the relative square error essentially decays linearly in the number of weights.
arXiv Detail & Related papers (2022-01-26T18:47:38Z) - One Model for All Quantization: A Quantized Network Supporting Hot-Swap
Bit-Width Adjustment [36.75157407486302]
We propose a method to train one model for all quantization, i.e., a single quantized network that supports diverse bit-widths.
We use wavelet decomposition and reconstruction to increase the diversity of weights.
Our method can achieve accuracy comparable to dedicated models trained at the same precision.
arXiv Detail & Related papers (2021-05-04T08:10:50Z) - Adaptive Quantization of Model Updates for Communication-Efficient
Federated Learning [75.45968495410047]
Communication of model updates between client nodes and the central aggregating server is a major bottleneck in federated learning.
Gradient quantization is an effective way of reducing the number of bits required to communicate each model update.
We propose an adaptive quantization strategy called AdaFL that aims to achieve communication efficiency as well as a low error floor.
arXiv Detail & Related papers (2021-02-08T19:14:21Z) - Direct Quantization for Training Highly Accurate Low Bit-width Deep
Neural Networks [73.29587731448345]
This paper proposes two novel techniques to train deep convolutional neural networks with low bit-width weights and activations.
First, to obtain low bit-width weights, most existing methods obtain the quantized weights by performing quantization on the full-precision network weights.
Second, to obtain low bit-width activations, existing works consider all channels equally.
arXiv Detail & Related papers (2020-12-26T15:21:18Z) - Recurrence of Optimum for Training Weight and Activation Quantized
Networks [4.103701929881022]
Training deep learning models with low-precision weights and activations involves a demanding optimization task.
We show how to overcome the discrete, non-differentiable nature of network quantization during training.
We also show numerical evidence of the recurrence phenomenon of weight evolution in training quantized deep networks.
arXiv Detail & Related papers (2020-12-10T09:14:43Z) - Searching for Low-Bit Weights in Quantized Neural Networks [129.8319019563356]
Quantized neural networks with low-bit weights and activations are attractive for developing AI accelerators.
We propose to regard the discrete weights in an arbitrary quantized neural network as searchable variables, and utilize a differentiable method to search for them accurately.
arXiv Detail & Related papers (2020-09-18T09:13:26Z) - Gradient $\ell_1$ Regularization for Quantization Robustness [70.39776106458858]
We derive a simple regularization scheme that improves robustness against post-training quantization.
By training quantization-ready networks, our approach enables storing a single set of weights that can be quantized on demand to different bit-widths (a minimal sketch of the gradient-penalty idea appears after this list).
arXiv Detail & Related papers (2020-02-18T12:31:34Z) - Post-Training Piecewise Linear Quantization for Deep Neural Networks [13.717228230596167]
Quantization plays an important role in the energy-efficient deployment of deep neural networks on resource-limited devices.
We propose a piecewise linear quantization scheme to enable accurate approximation for tensor values that have bell-shaped distributions with long tails (a toy sketch of such a piecewise quantizer appears after this list).
Compared to state-of-the-art post-training quantization methods, our proposed method achieves superior performance on image classification, semantic segmentation, and object detection with minor overhead.
arXiv Detail & Related papers (2020-01-31T23:47:00Z)
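
A few of the entries above describe mechanisms concrete enough to sketch. First, the GPFQ entry: on one reading of the greedy path-following idea, a layer's weights are quantized one at a time, and each quantized value is chosen from a fixed alphabet so as to best cancel the output error accumulated on a batch of layer inputs. The single-neuron sketch below, including the alphabet construction, is an illustrative assumption rather than the paper's exact algorithm.

```python
import torch

def gpfq_single_neuron(w: torch.Tensor, X: torch.Tensor, alphabet: torch.Tensor) -> torch.Tensor:
    """Greedily quantize one neuron's weights w (shape [d]) given a batch of
    layer inputs X (shape [n, d]): pick each q_t from the alphabet so that the
    running output-space error u stays as small as possible."""
    q = torch.zeros_like(w)
    u = torch.zeros(X.shape[0])                    # accumulated output error
    for t in range(w.numel()):
        x_t = X[:, t]
        target = u + w[t] * x_t                    # error we would have if q_t were 0
        c = torch.dot(x_t, target) / (x_t.dot(x_t) + 1e-12)   # best real-valued step
        q[t] = alphabet[(alphabet - c).abs().argmin()]        # snap it to the alphabet
        u = target - q[t] * x_t                    # update the running error
    return q

# Usage: a 3-bit symmetric alphabet scaled to the weight range (an assumption).
d, n = 64, 256
w, X = torch.randn(d), torch.randn(n, d)
alphabet = torch.linspace(-1, 1, 2 ** 3) * w.abs().max()
q = gpfq_single_neuron(w, X, alphabet)
print((torch.norm(X @ (w - q)) / torch.norm(X @ w)).item())  # relative output error
```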
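
Second, the gradient ℓ1 regularization entry: to first order, the loss change caused by a small quantization perturbation of the weights is bounded by the ℓ1 norm of the weight gradient times the largest per-weight perturbation, so penalizing that norm during training makes the network more robust to post-training quantization. A minimal sketch, assuming a single penalty weight `lam` applied to weight gradients only (activation gradients are left out):

```python
import torch
import torch.nn as nn

def l1_grad_penalty(loss: torch.Tensor, params, lam: float = 1e-3) -> torch.Tensor:
    """Return loss + lam * sum_i ||dL/dw_i||_1, built with create_graph=True so
    the penalty itself can be backpropagated (double backprop)."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    return loss + lam * sum(g.abs().sum() for g in grads)

# Toy usage on a small regression model.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
x, y = torch.randn(32, 8), torch.randn(32, 1)

for _ in range(10):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    total = l1_grad_penalty(loss, list(model.parameters()))
    total.backward()       # flatter loss surface -> smaller post-quantization drop
    opt.step()
```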
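
Finally, the post-training piecewise linear quantization entry: splitting the value range at a breakpoint lets the dense central region and the long tails each use their own uniform grid, spending levels where bell-shaped tensors actually concentrate. A toy sketch with a single symmetric breakpoint `p` chosen by hand (the paper selects the breakpoint to minimize quantization error; that search is omitted here):

```python
import torch

def uniform_quantize(x: torch.Tensor, lo: float, hi: float, bits: int) -> torch.Tensor:
    """Round x onto 2**bits evenly spaced levels spanning [lo, hi]."""
    steps = 2 ** bits - 1
    scale = (hi - lo) / steps
    return torch.clamp(torch.round((x - lo) / scale), 0, steps) * scale + lo

def piecewise_linear_quantize(x: torch.Tensor, p: float, bits: int = 4) -> torch.Tensor:
    """Quantize the central region [-p, p] and the tails with separate uniform
    grids, so bell-shaped tensors keep finer resolution where most values lie."""
    m = x.abs().max().item()
    center = x.abs() <= p
    out = torch.empty_like(x)
    out[center] = uniform_quantize(x[center], -p, p, bits)
    out[~center] = torch.sign(x[~center]) * uniform_quantize(x[~center].abs(), p, m, bits)
    return out

# Usage: a heavy-tailed, bell-shaped tensor keeps more fidelity near zero.
w = torch.distributions.StudentT(df=3.0).sample((10000,))
w_q = piecewise_linear_quantize(w, p=2.0, bits=4)
print((w - w_q).abs().mean().item())      # mean absolute quantization error
```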