SQWA: Stochastic Quantized Weight Averaging for Improving the
Generalization Capability of Low-Precision Deep Neural Networks
- URL: http://arxiv.org/abs/2002.00343v1
- Date: Sun, 2 Feb 2020 07:02:51 GMT
- Title: SQWA: Stochastic Quantized Weight Averaging for Improving the
Generalization Capability of Low-Precision Deep Neural Networks
- Authors: Sungho Shin, Yoonho Boo, Wonyong Sung
- Abstract summary: We present a new quantized neural network optimization approach, stochastic quantized weight averaging (SQWA).
The proposed approach includes floating-point model training, direct quantization of weights, capturing multiple low-precision models during retraining with cyclical learning rates, averaging the captured models, and re-quantizing and fine-tuning the averaged model with a low learning rate.
With SQWA training, we achieved state-of-the-art results for 2-bit QDNNs on CIFAR-100 and ImageNet datasets.
- Score: 29.187848543158992
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Designing a deep neural network (DNN) with good generalization capability is
a complex process especially when the weights are severely quantized. Model
averaging is a promising approach for achieving the good generalization
capability of DNNs, especially when the loss surface for training contains many
sharp minima. We present a new quantized neural network optimization approach,
stochastic quantized weight averaging (SQWA), to design low-precision DNNs with
good generalization capability using model averaging. The proposed approach
includes (1) floating-point model training, (2) direct quantization of weights,
(3) capturing multiple low-precision models during retraining with cyclical
learning rates, (4) averaging the captured models, and (5) re-quantizing the
averaged model and fine-tuning it with low learning rates. Additionally, we
present a loss-visualization technique on the quantized weight domain to
clearly elucidate the behavior of the proposed method. Visualization results
indicate that a quantized DNN (QDNN) optimized with the proposed approach is
located near the center of the flat minimum in the loss surface. With SQWA
training, we achieved state-of-the-art results for 2-bit QDNNs on CIFAR-100 and
ImageNet datasets. Although we only employed a uniform quantization scheme for
the sake of implementation in VLSI or low-precision neural processing units,
the performance achieved exceeded that of previous studies employing
non-uniform quantization.
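A minimal PyTorch-style sketch of the five-step pipeline above, assuming a symmetric per-tensor uniform quantizer and a linearly decaying cyclical learning rate; `model`, `train_loader`, `loss_fn`, and all hyperparameters are illustrative placeholders rather than the authors' settings, and the retraining loop uses plain SGD instead of a full quantization-aware training setup.
```python
import copy
import torch

def uniform_quantize(w, num_bits=2):
    # Symmetric per-tensor uniform quantizer (an assumption, not necessarily
    # the paper's exact scheme): 2^b evenly spaced levels over [-max|w|, max|w|].
    n = 2 ** num_bits - 1
    scale = w.abs().max().clamp(min=1e-8)
    return (torch.round((w / scale + 1) / 2 * n) / n * 2 - 1) * scale

def sqwa(model, train_loader, loss_fn, num_bits=2,
         cycles=5, steps_per_cycle=1000, lr_max=1e-2, lr_min=1e-4):
    # (1) `model` is assumed to be already trained in floating point.
    # (2) Direct quantization of the floating-point weights.
    with torch.no_grad():
        for p in model.parameters():
            p.copy_(uniform_quantize(p, num_bits))

    # (3) Retrain with a cyclical learning rate; capture one model per cycle.
    #     (A full QAT loop would quantize in the forward pass through a
    #     straight-through estimator; plain SGD keeps the sketch short.)
    captured = []
    opt = torch.optim.SGD(model.parameters(), lr=lr_max)
    for _ in range(cycles):
        for step, (x, y) in zip(range(steps_per_cycle), train_loader):
            lr = lr_max - (lr_max - lr_min) * step / steps_per_cycle
            for group in opt.param_groups:
                group["lr"] = lr              # decay within the cycle, then restart
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        captured.append(copy.deepcopy(model.state_dict()))

    # (4) Average the captured models.
    averaged = {k: torch.stack([sd[k].float() for sd in captured]).mean(0)
                for k in captured[0]}
    model.load_state_dict(averaged)

    # (5) Re-quantize the averaged model; fine-tuning with a low learning rate
    #     would follow (it mirrors step 3 with lr close to lr_min).
    with torch.no_grad():
        for p in model.parameters():
            p.copy_(uniform_quantize(p, num_bits))
    return model
```
In the paper's actual procedure the models captured in step (3) are low-precision models; reproducing that here would require quantizing in the forward pass with a straight-through estimator, which the sketch omits for brevity.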
Related papers
- SQUAT: Stateful Quantization-Aware Training in Recurrent Spiking Neural Networks [1.0923877073891446]
Like weight quantization, spiking neural networks (SNNs) aim to enhance efficiency, but they adopt an 'event-driven' approach to reduce the power consumption of neural network inference.
This paper introduces two QAT schemes for stateful neurons: (i) a uniform quantization strategy, an established method for weight quantization, and (ii) threshold-centered quantization.
Our results show that increasing the density of quantization levels around the firing threshold improves accuracy across several benchmark datasets.
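The NumPy sketch below only illustrates the contrast drawn above between a uniform grid and levels concentrated around the firing threshold; the odd-power warping rule, the bit-width, and the threshold value are assumptions, not the level-placement scheme actually used in SQUAT.
```python
import numpy as np

def uniform_levels(lo, hi, num_bits):
    # baseline: 2^b evenly spaced quantization levels over [lo, hi]
    return np.linspace(lo, hi, 2 ** num_bits)

def threshold_centered_levels(lo, hi, num_bits, threshold, power=3):
    # Warp an even grid through an odd-power curve, which clusters points
    # near zero, then shift/scale so the dense region sits at `threshold`.
    u = np.linspace(-1.0, 1.0, 2 ** num_bits)
    warped = np.sign(u) * np.abs(u) ** power
    left = threshold + warped[u < 0] * (threshold - lo)
    right = threshold + warped[u >= 0] * (hi - threshold)
    return np.concatenate([left, right])

def quantize(x, levels):
    # snap each value to its nearest quantization level
    x = np.asarray(x)
    return levels[np.abs(x[..., None] - levels).argmin(axis=-1)]

# e.g. membrane potentials in [0, 2] with a hypothetical firing threshold of 1.0
v = np.random.uniform(0.0, 2.0, size=8)
print(quantize(v, uniform_levels(0.0, 2.0, num_bits=3)))
print(quantize(v, threshold_centered_levels(0.0, 2.0, num_bits=3, threshold=1.0)))
```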
arXiv Detail & Related papers (2024-04-15T03:07:16Z) - BiTAT: Neural Network Binarization with Task-dependent Aggregated
Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation.
Extreme quantization (1-bit weights/1-bit activations) of compactly designed backbone architectures results in severe performance degradation.
This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate this degradation.
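To make "extreme quantization (1-bit weight/1-bit activation)" concrete, here is a generic 1-bit weight binarizer with a straight-through estimator; it is a standard QAT building block (XNOR-Net-style mean-magnitude scaling), not BiTAT's task-dependent aggregated transformation.
```python
import torch

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        # forward: sign of each weight, rescaled by the mean magnitude
        return w.sign() * w.abs().mean()

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # straight-through estimator: pass the gradient through unchanged,
        # but zero it where |w| > 1 so latent weights cannot drift unboundedly
        return grad_out * (w.abs() <= 1).float()

class BinaryLinear(torch.nn.Linear):
    # binarized weights in the forward pass; full-precision latent weights
    # receive the straight-through gradients
    def forward(self, x):
        return torch.nn.functional.linear(x, BinarizeSTE.apply(self.weight), self.bias)

layer = BinaryLinear(16, 8)
layer(torch.randn(4, 16)).sum().backward()   # gradients reach layer.weight via the STE
```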
arXiv Detail & Related papers (2022-07-04T13:25:49Z) - CTMQ: Cyclic Training of Convolutional Neural Networks with Multiple
Quantization Steps [1.3106063755117399]
This paper proposes a training method that applies cyclic training multiple times to achieve enhanced performance in low-bit quantized convolutional neural networks (CNNs).
By iteratively exploiting the better training ability of the accurate model, the proposed method produces enhanced trained weights for the low-bit quantized model in each cycle, as sketched below.
Notably, the training method improves the Top-1 and Top-5 accuracies of the binarized ResNet-18 on the ImageNet dataset by 5.80% and 6.85%, respectively.
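A heavily simplified sketch of the cyclic idea summarized above: each cycle first updates the accurate full-precision model and then re-derives the low-bit model from it. The transfer step here is plain weight quantization; the paper's actual multi-step procedure may differ.
```python
import copy
import torch

def quantize_weights(model, num_bits):
    # derive a low-bit copy of the model (symmetric uniform grid, an assumption)
    q_model = copy.deepcopy(model)
    with torch.no_grad():
        for p in q_model.parameters():
            n = 2 ** num_bits - 1
            s = p.abs().max().clamp(min=1e-8)
            p.copy_((torch.round((p / s + 1) / 2 * n) / n * 2 - 1) * s)
    return q_model

def cyclic_train(fp_model, train_loader, loss_fn, num_bits=1, cycles=3, steps=500):
    low_bit = None
    opt = torch.optim.SGD(fp_model.parameters(), lr=1e-2)
    for _ in range(cycles):
        for step, (x, y) in zip(range(steps), train_loader):
            opt.zero_grad()
            loss_fn(fp_model(x), y).backward()
            opt.step()
        # this cycle's low-bit model inherits the improved accurate weights
        low_bit = quantize_weights(fp_model, num_bits)
    return low_bit
```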
arXiv Detail & Related papers (2022-06-26T05:54:12Z) - Low-bit Quantization of Recurrent Neural Network Language Models Using
Alternating Direction Methods of Multipliers [67.688697838109]
This paper presents a novel method to train quantized RNNLMs from scratch using alternating direction methods of multipliers (ADMM).
Experiments on two tasks suggest the proposed ADMM quantization achieved a model size compression factor of up to 31 times over the full precision baseline RNNLMs.
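A toy NumPy sketch of ADMM-style weight quantization on a least-squares problem, not the paper's RNNLM setup: alternate a gradient step on the augmented Lagrangian, a projection onto a fixed low-bit grid, and a dual update. The grid, penalty weight, and learning rate are assumptions.
```python
import numpy as np

rng = np.random.default_rng(0)
A, y = rng.standard_normal((200, 16)), rng.standard_normal(200)
levels = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])       # assumed quantization grid

def project(w):
    # project each weight onto its nearest quantization level
    return levels[np.abs(w[:, None] - levels).argmin(axis=1)]

w = np.zeros(16)      # continuous (primal) weights
z = project(w)        # quantized auxiliary copy
u = np.zeros(16)      # scaled dual variable
rho, lr = 1.0, 1e-3

for _ in range(2000):
    # W-update: gradient step on loss + (rho/2) * ||w - z + u||^2
    grad = A.T @ (A @ w - y) / len(y) + rho * (w - z + u)
    w -= lr * grad
    # Z-update: projection of w + u onto the quantized set
    z = project(w + u)
    # dual update
    u += w - z

print("quantized weights:", z)
print("max |w - z|:", np.abs(w - z).max())
```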
arXiv Detail & Related papers (2021-11-29T09:30:06Z) - Learnable Companding Quantization for Accurate Low-bit Neural Networks [3.655021726150368]
Quantizing deep neural networks is an effective method for reducing memory consumption and improving inference speed.
However, it is still hard for extremely low-bit models to achieve accuracy comparable with that of full-precision models.
We propose learnable companding quantization (LCQ) as a novel non-uniform quantization method for 2-, 3-, and 4-bit models.
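Companding quantization compresses values with a nonlinear function, quantizes them uniformly, and expands them back, which yields non-uniform levels. The sketch below uses a fixed mu-law compressor purely for illustration; LCQ learns the companding function, which this sketch does not do.
```python
import numpy as np

def mu_law_quantize(x, num_bits=2, mu=8.0):
    s = np.max(np.abs(x)) + 1e-8
    xn = x / s                                                       # normalize to [-1, 1]
    comp = np.sign(xn) * np.log1p(mu * np.abs(xn)) / np.log1p(mu)    # compress
    n = 2 ** num_bits - 1
    q = np.round((comp + 1) / 2 * n) / n * 2 - 1                     # uniform quantization
    exp = np.sign(q) * np.expm1(np.abs(q) * np.log1p(mu)) / mu       # expand
    return exp * s

w = np.random.randn(1000) * 0.3
print(np.unique(mu_law_quantize(w, num_bits=2)))   # 4 non-uniformly spaced levels
```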
arXiv Detail & Related papers (2021-03-12T09:06:52Z) - Recurrence of Optimum for Training Weight and Activation Quantized
Networks [4.103701929881022]
Training deep learning models with low-precision weights and activations involves a demanding optimization task.
We show how to overcome the discrete, non-differentiable nature of network quantization during training.
We also show numerical evidence of the recurrence phenomenon of weight evolution in training quantized deep networks.
arXiv Detail & Related papers (2020-12-10T09:14:43Z) - Where Should We Begin? A Low-Level Exploration of Weight Initialization
Impact on Quantized Behaviour of Deep Neural Networks [93.4221402881609]
We present an in-depth, fine-grained ablation study of the effect of different weight initializations on the final distributions of the weights and activations of different CNN architectures.
To the best of our knowledge, we are the first to perform such a low-level, in-depth quantitative analysis of weight initialization and its effect on quantized behaviour.
arXiv Detail & Related papers (2020-11-30T06:54:28Z) - Once Quantization-Aware Training: High Performance Extremely Low-bit
Architecture Search [112.05977301976613]
We propose to combine Network Architecture Search methods with quantization to enjoy the merits of both.
We first propose the joint training of architecture and quantization with a shared step size to acquire a large number of quantized models.
Then a bit-inheritance scheme is introduced to transfer the quantized models to the lower bit, which further reduces the time cost and improves the quantization accuracy.
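The sketch below illustrates, under assumptions, the two ingredients named above: a uniform quantizer with a learnable step size that candidate sub-networks can share, and a "bit-inheritance" step that initializes a lower-bit quantizer from a higher-bit one by preserving the covered range. The exact rules in the paper may differ.
```python
import torch

class SharedStepQuantizer(torch.nn.Module):
    # one instance can be attached to several candidate sub-networks so that
    # they share a single learnable step size
    def __init__(self, num_bits, init_step=0.05):
        super().__init__()
        self.num_bits = num_bits
        self.step = torch.nn.Parameter(torch.tensor(init_step))

    def forward(self, w):
        qmax = 2 ** (self.num_bits - 1) - 1
        q = torch.clamp(torch.round(w / self.step), -qmax - 1, qmax)
        # value is q * step; gradients flow straight through to the latent
        # weights (identity) and to the step size (via the integer code q)
        return q.detach() * self.step + (w - w.detach())

def inherit_to_lower_bit(high_q, low_bits):
    # keep roughly the same representable range: fewer levels, larger step
    low_q = SharedStepQuantizer(low_bits)
    ratio = 2 ** (high_q.num_bits - 1) / 2 ** (low_bits - 1)
    with torch.no_grad():
        low_q.step.copy_(high_q.step * ratio)
    return low_q

q4 = SharedStepQuantizer(num_bits=4)
q2 = inherit_to_lower_bit(q4, low_bits=2)
print(q2.step.item())   # the 2-bit grid now spans roughly the 4-bit range
```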
arXiv Detail & Related papers (2020-10-09T03:52:16Z) - Quantized Neural Networks: Characterization and Holistic Optimization [25.970152258542672]
Quantized deep neural networks (QDNNs) are necessary for low-power, high throughput, and embedded applications.
This study proposes a holistic approach for the optimization of QDNNs, which contains QDNN training methods and quantization-friendly architecture design.
The results indicate that deeper models are more sensitive to activation quantization, while wider models are more resilient to both weight and activation quantization.
arXiv Detail & Related papers (2020-05-31T14:20:27Z) - Optimal Gradient Quantization Condition for Communication-Efficient
Distributed Training [99.42912552638168]
Communication of gradients is costly for training deep neural networks with multiple devices in computer vision applications.
In this work, we deduce the optimal condition for both binary and multi-level gradient quantization for any gradient distribution.
Based on the optimal condition, we develop two novel quantization schemes: biased BinGrad and unbiased ORQ for binary and multi-level gradient quantization respectively.
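The snippet below shows a generic unbiased stochastic binary gradient quantizer, included only to make binary/multi-level gradient quantization for communication-efficient training concrete; the per-tensor scale and the level choice are assumptions, and the paper's BinGrad and ORQ schemes instead pick their levels from the derived optimal condition.
```python
import numpy as np

def binary_quantize(grad, rng):
    # quantize every gradient entry to +s or -s, choosing the sign randomly
    # so that the expectation equals the original gradient (unbiased)
    s = np.max(np.abs(grad)) + 1e-12          # per-tensor scale (assumption)
    p_plus = 0.5 * (1.0 + grad / s)           # P(level = +s) makes E[q] = grad
    signs = np.where(rng.random(grad.shape) < p_plus, 1.0, -1.0)
    return s * signs                          # one bit per entry plus one scale

rng = np.random.default_rng(0)
g = rng.standard_normal(1000) * 0.1
avg = np.mean([binary_quantize(g, rng) for _ in range(5000)], axis=0)
print("mean abs error of the averaged estimate:", np.abs(avg - g).mean())
# the error shrinks as more samples are averaged, consistent with unbiasedness
```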
arXiv Detail & Related papers (2020-02-25T18:28:39Z) - Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantized neural networks (QNNs) are very attractive to industry because of their extremely cheap computation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)