CTMQ: Cyclic Training of Convolutional Neural Networks with Multiple Quantization Steps
- URL: http://arxiv.org/abs/2206.12794v1
- Date: Sun, 26 Jun 2022 05:54:12 GMT
- Title: CTMQ: Cyclic Training of Convolutional Neural Networks with Multiple Quantization Steps
- Authors: HyunJin Kim, Jungwoo Shin, Alberto A. Del Barrio
- Abstract summary: This paper proposes a training method with multiple cycles of training for achieving enhanced performance in low-bit quantized convolutional neural networks (CNNs).
By exploiting the better training ability of the accurate model in an iterative manner, the proposed method produces enhanced trained weights for the low-bit quantized model in each cycle.
Notably, the training method improves the Top-1 and Top-5 accuracies of binarized ResNet-18 on the ImageNet dataset by 5.80% and 6.85%, respectively.
- Score: 1.3106063755117399
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes a training method that uses multiple cycles of
training to achieve enhanced performance in low-bit quantized convolutional
neural networks (CNNs). Quantization is a popular method for obtaining
lightweight CNNs, and initialization with a pretrained model is widely used to
overcome the performance degradation caused by low-resolution quantization.
However, large quantization errors between real values and their low-bit
quantized counterparts make it difficult to achieve acceptable performance on
complex networks and large datasets. The proposed training method softly
delivers the knowledge of pretrained models to low-bit quantized models through
multiple quantization steps. In each quantization step, the trained weights of
one model are used to initialize the weights of the next model, whose
quantization bit depth is reduced by one. Because the bit depth changes only
slightly at each step, the performance gap can be bridged, providing better
weight initialization. In cyclic training, after a low-bit quantized model is
trained, its trained weights are used to initialize its accurate
(full-precision) counterpart, which is then retrained. By exploiting the better
training ability of the accurate model in an iterative manner, the proposed
method produces enhanced trained weights for the low-bit quantized model in
each cycle. Notably, the training method improves the Top-1 and Top-5
accuracies of binarized ResNet-18 on the ImageNet dataset by 5.80% and 6.85%,
respectively.
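The abstract describes two mechanisms concretely enough to sketch: quantization steps that reduce the bit depth by one at a time, each initialized from the weights trained at the previous step, and cycles that feed the trained low-bit weights back into the accurate model before retraining it. Below is a minimal PyTorch-style sketch of that schedule, not the authors' implementation; the uniform/binary weight quantizer, the starting bit depth of 8, the epoch counts, the number of cycles, and the `train_at_bit_depth` placeholder are illustrative assumptions.

```python
import copy
import torch
import torch.nn as nn

def quantize_weights(model: nn.Module, bits: int) -> nn.Module:
    """Return a copy of `model` with weight tensors quantized to `bits` bits.
    (Illustrative symmetric uniform quantizer; the paper's exact quantizer may differ.)"""
    qmodel = copy.deepcopy(model)
    if bits >= 32:                              # treat 32 bits as full precision
        return qmodel
    with torch.no_grad():
        for p in qmodel.parameters():
            if p.dim() < 2:                     # skip biases / BatchNorm parameters in this sketch
                continue
            if bits == 1:                       # binarization: sign with a per-tensor scale
                p.copy_(p.sign() * p.abs().mean())
            else:
                qmax = 2 ** (bits - 1) - 1
                scale = p.abs().max() / qmax + 1e-12
                p.copy_((p / scale).round().clamp(-qmax - 1, qmax) * scale)
    return qmodel

def train_at_bit_depth(model, loader, epochs, bits):
    """Placeholder for ordinary quantization-aware training at the given bit depth."""
    ...                                         # standard QAT loop (e.g., with an STE) goes here
    return model

def ctmq(pretrained_fp_model, loader, start_bits=8, target_bits=1, cycles=3, epochs=30):
    """Cyclic training with multiple quantization steps (sketch).

    Inner loop: the bit depth is reduced by one per step, and each model is
    initialized from the weights trained at the previous (one-bit-higher) step.
    Outer loop: the trained low-bit weights re-initialize the accurate
    (full-precision) model, which is retrained before the next cycle.
    """
    accurate = pretrained_fp_model
    low_bit = None
    for _ in range(cycles):
        weights = copy.deepcopy(accurate.state_dict())
        for bits in range(start_bits, target_bits - 1, -1):   # quantization steps: b, b-1, ..., 1
            step_model = copy.deepcopy(accurate)
            step_model.load_state_dict(weights)               # initialize from the previous step
            step_model = quantize_weights(step_model, bits)
            step_model = train_at_bit_depth(step_model, loader, epochs, bits)
            weights = copy.deepcopy(step_model.state_dict())
        low_bit = step_model
        accurate.load_state_dict(weights)                     # cyclic step: low-bit weights re-initialize
        accurate = train_at_bit_depth(accurate, loader, epochs, bits=32)  # retrain the accurate model
    return low_bit
```

Under this reading, the binarized ResNet-18 result quoted above would correspond to `target_bits=1` after the final cycle; the per-step epoch counts, quantizer, and number of cycles are details the abstract does not specify.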
Related papers
- LLM-QAT: Data-Free Quantization Aware Training for Large Language Models [38.76165207636793]
We propose a data-free distillation method that leverages generations produced by the pre-trained model.
In addition to quantizing weights and activations, we also quantize the KV cache, which is critical for increasing throughput.
We experiment with LLaMA models of sizes 7B, 13B, and 30B, at quantization levels down to 4 bits.
arXiv Detail & Related papers (2023-05-29T05:22:11Z)
- Vertical Layering of Quantized Neural Networks for Heterogeneous Inference [57.42762335081385]
We study a new vertical-layered representation of neural network weights for encapsulating all quantized models into a single one.
This theoretically enables serving a network at any precision on demand while training and maintaining only one model.
arXiv Detail & Related papers (2022-12-10T15:57:38Z)
- BiTAT: Neural Network Binarization with Task-dependent Aggregated Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation.
Extreme quantization (1-bit weight/1-bit activations) of compactly-designed backbone architectures results in severe performance degeneration.
This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate performance degeneration.
arXiv Detail & Related papers (2022-07-04T13:25:49Z)
- Low-bit Quantization of Recurrent Neural Network Language Models Using Alternating Direction Methods of Multipliers [67.688697838109]
This paper presents a novel method to train quantized RNNLMs from scratch using the alternating direction method of multipliers (ADMM).
Experiments on two tasks suggest the proposed ADMM quantization achieved a model size compression factor of up to 31 times over the full-precision baseline RNNLMs.
arXiv Detail & Related papers (2021-11-29T09:30:06Z)
- Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
- One Model for All Quantization: A Quantized Network Supporting Hot-Swap Bit-Width Adjustment [36.75157407486302]
We propose a method to train a single model for all quantization levels, supporting diverse bit-widths.
We use wavelet decomposition and reconstruction to increase the diversity of weights.
Our method can achieve accuracy comparable to dedicated models trained at the same precision.
arXiv Detail & Related papers (2021-05-04T08:10:50Z)
- Learnable Companding Quantization for Accurate Low-bit Neural Networks [3.655021726150368]
Quantizing deep neural networks is an effective method for reducing memory consumption and improving inference speed.
It is still hard for extremely low-bit models to achieve accuracy comparable with that of full-precision models.
We propose learnable companding quantization (LCQ) as a novel non-uniform quantization method for 2-, 3-, and 4-bit models.
arXiv Detail & Related papers (2021-03-12T09:06:52Z)
- Direct Quantization for Training Highly Accurate Low Bit-width Deep Neural Networks [73.29587731448345]
This paper proposes two novel techniques to train deep convolutional neural networks with low bit-width weights and activations.
First, to obtain low bit-width weights, most existing methods quantize the full-precision network weights.
Second, to obtain low bit-width activations, existing works consider all channels equally.
arXiv Detail & Related papers (2020-12-26T15:21:18Z)
- Recurrence of Optimum for Training Weight and Activation Quantized Networks [4.103701929881022]
Training deep learning models with low-precision weights and activations involves a demanding optimization task.
We show how to overcome the challenging nature of network quantization during training.
We also show numerical evidence of the recurrence phenomenon of weight evolution in training quantized deep networks.
arXiv Detail & Related papers (2020-12-10T09:14:43Z)
- Gradient $\ell_1$ Regularization for Quantization Robustness [70.39776106458858]
We derive a simple regularization scheme that improves robustness against post-training quantization (a minimal sketch of such a gradient penalty appears after this list).
By training quantization-ready networks, our approach enables storing a single set of weights that can be quantized on demand to different bit-widths.
arXiv Detail & Related papers (2020-02-18T12:31:34Z)
- SQWA: Stochastic Quantized Weight Averaging for Improving the Generalization Capability of Low-Precision Deep Neural Networks [29.187848543158992]
We present a new quantized neural network optimization approach, stochastic quantized weight averaging (SQWA).
The proposed approach includes floating-point model training, direct quantization of weights, capturing multiple low-precision models, averaging the captured models, and fine-tuning the averaged model with low learning rates (a sketch of this recipe follows this list).
With SQWA training, we achieved state-of-the-art results for 2-bit QDNNs on the CIFAR-100 and ImageNet datasets.
arXiv Detail & Related papers (2020-02-02T07:02:51Z)
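Two of the related papers above describe mechanisms concretely enough to illustrate. First, the gradient $\ell_1$ regularization entry: one common way to realize such a scheme is to penalize the $\ell_1$-norm of the gradient of the task loss with respect to the weights, which flattens the loss locally so that post-training quantization perturbs it less. The sketch below is a generic double-backpropagation training step under that reading; `lam` and the exact penalty form are assumptions, not values reported by the paper.

```python
import torch
import torch.nn as nn

def l1_gradient_penalty_step(model: nn.Module, criterion, optimizer, x, y, lam=1e-4):
    """One training step that adds an l1 penalty on the gradient of the task loss
    with respect to the weights (double backpropagation). `lam` is an illustrative
    assumption, not a value from the paper."""
    optimizer.zero_grad()
    task_loss = criterion(model(x), y)
    params = [p for p in model.parameters() if p.requires_grad]
    # create_graph=True keeps the graph so the gradient penalty itself is differentiable
    grads = torch.autograd.grad(task_loss, params, create_graph=True)
    penalty = sum(g.abs().sum() for g in grads)
    (task_loss + lam * penalty).backward()
    optimizer.step()
    return float(task_loss.detach())
```

Second, the SQWA entry spells out its recipe (floating-point training, direct quantization of weights, capturing multiple low-precision models, averaging them, fine-tuning with a low learning rate). The following sketch covers only the capture/average/fine-tune portion; the capture epochs, learning rates, and the caller-supplied `train_one_epoch` QAT routine are assumptions.

```python
import copy
import torch
import torch.nn as nn

def average_state_dicts(state_dicts):
    """Element-wise average of several checkpoints of the same architecture (SWA-style)."""
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        if torch.is_floating_point(avg[key]):
            avg[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return avg

def sqwa_average_and_finetune(low_precision_model: nn.Module, train_one_epoch, loader,
                              capture_epochs=(40, 45, 50, 55, 60), finetune_epochs=5):
    """Capture several low-precision checkpoints, average them, and fine-tune with a
    low learning rate. `low_precision_model` is assumed to be the directly quantized
    copy of a trained floating-point model, and `train_one_epoch(model, loader, lr)`
    is a caller-supplied QAT epoch; all schedules here are illustrative assumptions."""
    checkpoints = []
    for epoch in range(1, max(capture_epochs) + 1):
        train_one_epoch(low_precision_model, loader, lr=1e-2)   # the paper uses a cyclical LR
        if epoch in capture_epochs:                             # capture low-precision models
            checkpoints.append(copy.deepcopy(low_precision_model.state_dict()))
    low_precision_model.load_state_dict(average_state_dicts(checkpoints))
    for _ in range(finetune_epochs):                            # fine-tune with a low LR
        train_one_epoch(low_precision_model, loader, lr=1e-4)
    return low_precision_model
```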