Training Multi-bit Quantized and Binarized Networks with A Learnable
Symmetric Quantizer
- URL: http://arxiv.org/abs/2104.00210v1
- Date: Thu, 1 Apr 2021 02:33:31 GMT
- Title: Training Multi-bit Quantized and Binarized Networks with A Learnable
Symmetric Quantizer
- Authors: Phuoc Pham, Jacob Abraham, Jaeyong Chung
- Abstract summary: Quantizing weights and activations of deep neural networks is essential for deploying them on resource-constrained devices or cloud platforms.
While binarization is a special case of quantization, this extreme case often leads to several training difficulties.
We develop a unified quantization framework, denoted as UniQ, to overcome binarization difficulties.
- Score: 1.9659095632676098
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Quantizing weights and activations of deep neural networks is essential for
deploying them on resource-constrained devices or on cloud platforms for at-scale
services. While binarization is a special case of quantization, this extreme
case often leads to several training difficulties, and necessitates specialized
models and training methods. As a result, recent quantization methods do not
provide binarization, thus losing the most resource-efficient option, and
quantized and binarized networks have been distinct research areas. We examine
binarization difficulties in a quantization framework and find that all we need
to enable binary training is a symmetric quantizer, good initialization,
and careful hyperparameter selection. These techniques also lead to substantial
improvements in multi-bit quantization. We demonstrate our unified quantization
framework, denoted as UniQ, on the ImageNet dataset with various architectures
such as ResNet-18, ResNet-34, and MobileNetV2. For multi-bit quantization, UniQ
outperforms existing methods and achieves state-of-the-art accuracy. In
binarization, the achieved accuracy is comparable to that of existing state-of-the-art
methods, even without modifying the original architectures.
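The abstract names the three ingredients for binary training (a symmetric quantizer, good initialization, and careful hyperparameter selection) but not their exact form. As a rough illustration, a minimal learnable symmetric uniform quantizer trained with a straight-through estimator, the standard construction that methods in this family build on, could be sketched as below; the class name, the learnable step-size parameter, and its default initialization are assumptions made for illustration rather than UniQ's actual implementation.

```python
import torch
import torch.nn as nn


class SymmetricQuantizer(nn.Module):
    """Illustrative learnable symmetric uniform quantizer (a sketch, not UniQ's code).

    For b > 1 bits, inputs are scaled by a learnable step size, clipped to the
    symmetric integer range [-(2**(b-1) - 1), 2**(b-1) - 1], and rounded; for
    b = 1 the quantizer degenerates to a scaled sign function, i.e. binarization.
    """

    def __init__(self, bits: int = 4, init_step: float = 0.1):
        super().__init__()
        self.bits = bits
        # Learnable step size; the abstract stresses that good initialization
        # of the quantizer matters (the 0.1 default here is arbitrary).
        self.step = nn.Parameter(torch.tensor(init_step))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.bits == 1:
            # Binarization: two symmetric levels {-step, +step}.
            hard = torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))
            soft = torch.clamp(x / self.step, -1.0, 1.0)
        else:
            q_max = float(2 ** (self.bits - 1) - 1)
            soft = torch.clamp(x / self.step, -q_max, q_max)
            hard = torch.round(soft)
        # Straight-through estimator: use the hard levels in the forward pass,
        # but let gradients flow through the clipped (soft) value.
        q = soft + (hard - soft).detach()
        return q * self.step


# Example: 2-bit symmetric quantization (ternary levels {-1, 0, +1} times step)
w = torch.randn(64, 64)
w_q = SymmetricQuantizer(bits=2)(w)
```

Setting bits=1 collapses the quantizer to a scaled sign function, which is the sense in which binarization is a special case of multi-bit quantization; the step-size initialization and hyperparameter choices that the abstract emphasizes sit outside this snippet.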
Related papers
- Vertical Layering of Quantized Neural Networks for Heterogeneous Inference [57.42762335081385]
We study a new vertical-layered representation of neural network weights for encapsulating all quantized models into a single one.
In theory, a network of any precision can be served on demand while only one model needs to be trained and maintained.
arXiv Detail & Related papers (2022-12-10T15:57:38Z) - Quantization-aware Interval Bound Propagation for Training Certifiably
Robust Quantized Neural Networks [58.195261590442406]
We study the problem of training and certifying adversarially robust quantized neural networks (QNNs).
Recent work has shown that floating-point neural networks that have been verified to be robust can become vulnerable to adversarial attacks after quantization.
We present quantization-aware interval bound propagation (QA-IBP), a novel method for training robust QNNs.
arXiv Detail & Related papers (2022-11-29T13:32:38Z) - A Comprehensive Survey on Model Quantization for Deep Neural Networks in
Image Classification [0.0]
A promising approach is quantization, in which the full-precision values are stored in low bit-width precision.
We present a comprehensive survey of quantization concepts and methods, with a focus on image classification.
We explain the replacement of floating-point operations with low-cost bitwise operations in a quantized DNN and the sensitivity of different layers to quantization.
arXiv Detail & Related papers (2022-05-14T15:08:32Z) - Post-training Quantization for Neural Networks with Provable Guarantees [9.58246628652846]
We modify a post-training neural-network quantization method, GPFQ, that is based on a greedy path-following mechanism.
We prove that for quantizing a single-layer network, the relative square error essentially decays linearly in the number of weights.
arXiv Detail & Related papers (2022-01-26T18:47:38Z) - HPTQ: Hardware-Friendly Post Training Quantization [6.515659231669797]
We introduce a hardware-friendly post training quantization (HPTQ) framework.
We perform a large-scale study on four tasks: classification, object detection, semantic segmentation and pose estimation.
Our experiments show that competitive results can be obtained under hardware-friendly constraints.
arXiv Detail & Related papers (2021-09-19T12:45:01Z) - Cluster-Promoting Quantization with Bit-Drop for Minimizing Network
Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z) - Post-Training Quantization for Vision Transformer [85.57953732941101]
We present an effective post-training quantization algorithm for reducing the memory storage and computational costs of vision transformers.
We can obtain an 81.29% top-1 accuracy using DeiT-B model on ImageNet dataset with about 8-bit quantization.
arXiv Detail & Related papers (2021-06-27T06:27:22Z) - One Model for All Quantization: A Quantized Network Supporting Hot-Swap
Bit-Width Adjustment [36.75157407486302]
We propose a method to train one model for all quantization settings, supporting diverse bit-widths.
We use wavelet decomposition and reconstruction to increase the diversity of weights.
Our method can achieve accuracy comparable to dedicated models trained at the same precision.
arXiv Detail & Related papers (2021-05-04T08:10:50Z) - Recurrence of Optimum for Training Weight and Activation Quantized
Networks [4.103701929881022]
Training deep learning models with low-precision weights and activations involves a demanding optimization task.
We show how to overcome the difficulties posed by the discrete nature of network quantization.
We also show numerical evidence of the recurrence phenomenon of weight evolution in training quantized deep networks.
arXiv Detail & Related papers (2020-12-10T09:14:43Z) - Once Quantization-Aware Training: High Performance Extremely Low-bit
Architecture Search [112.05977301976613]
We propose to combine Network Architecture Search methods with quantization to enjoy the merits of both.
We first propose the joint training of architecture and quantization with a shared step size to acquire a large number of quantized models.
Then a bit-inheritance scheme is introduced to transfer the quantized models to lower bit-widths, which further reduces the time cost and improves the quantization accuracy.
arXiv Detail & Related papers (2020-10-09T03:52:16Z) - Optimal Gradient Quantization Condition for Communication-Efficient
Distributed Training [99.42912552638168]
Communication of gradients is costly for training deep neural networks with multiple devices in computer vision applications.
In this work, we deduce the optimal condition of both the binary and multi-level gradient quantization for any gradient distribution.
Based on the optimal condition, we develop two novel quantization schemes: biased BinGrad and unbiased ORQ for binary and multi-level gradient quantization, respectively. (A generic scaled-sign baseline is sketched after this list.)
arXiv Detail & Related papers (2020-02-25T18:28:39Z)