Quantized Neural Networks: Characterization and Holistic Optimization
- URL: http://arxiv.org/abs/2006.00530v1
- Date: Sun, 31 May 2020 14:20:27 GMT
- Title: Quantized Neural Networks: Characterization and Holistic Optimization
- Authors: Yoonho Boo, Sungho Shin, and Wonyong Sung
- Abstract summary: Quantized deep neural networks (QDNNs) are necessary for low-power, high-throughput, and embedded applications.
This study proposes a holistic approach for the optimization of QDNNs, which contains QDNN training methods and quantization-friendly architecture design.
The results indicate that deeper models are more prone to activation quantization, while wider models improve the resiliency to both weight and activation quantization.
- Score: 25.970152258542672
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Quantized deep neural networks (QDNNs) are necessary for low-power,
high-throughput, and embedded applications. Previous studies mostly focused on
developing optimization methods for the quantization of given models. However,
quantization sensitivity depends on the model architecture. Therefore, the
model selection needs to be a part of the QDNN design process. Also, the
characteristics of weight and activation quantization are quite different. This
study proposes a holistic approach for the optimization of QDNNs, which
contains QDNN training methods as well as quantization-friendly architecture
design. Synthesized data is used to visualize the effects of weight and
activation quantization. The results indicate that deeper models are more prone
to activation quantization, while wider models improve the resiliency to both
weight and activation quantization. This study can provide insight into better
optimization of QDNNs.
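The distinction the abstract draws between weight and activation quantization can be made concrete with a minimal sketch. The uniform symmetric weight quantizer and clipped activation quantizer below are common illustrative choices, not necessarily the exact schemes used in the paper, and the helper names are hypothetical:

```python
import numpy as np

def quantize_weights(w, n_bits=2):
    """Uniform symmetric weight quantization (n_bits >= 2).
    A common scheme; the paper's exact quantizer may differ."""
    levels = 2 ** (n_bits - 1) - 1          # 2 bits -> integer levels {-1, 0, 1}
    scale = np.max(np.abs(w)) / levels      # map largest magnitude to the top level
    return np.round(w / scale) * scale

def quantize_activations(a, n_bits=2, clip=1.0):
    """Uniform quantization of non-negative (post-ReLU) activations on [0, clip]."""
    levels = 2 ** n_bits - 1                # 2 bits -> 4 levels on [0, clip]
    a = np.clip(a, 0.0, clip)
    return np.round(a / clip * levels) / levels * clip

w = np.array([-0.8, -0.1, 0.3, 0.9])
wq = quantize_weights(w, n_bits=2)          # snapped to multiples of max|w| = 0.9
```

Note the asymmetry: weight quantization is a one-time rounding of learned parameters, while activation quantization is applied to every intermediate signal at inference time, which is one intuition for why depth compounds activation-quantization error.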
Related papers
- SQUAT: Stateful Quantization-Aware Training in Recurrent Spiking Neural Networks [1.0923877073891446]
Spiking neural networks (SNNs) share the goal of enhancing efficiency, but adopt an 'event-driven' approach to reduce the power consumption of neural network inference.
This paper introduces two QAT schemes for stateful neurons: (i) a uniform quantization strategy, an established method for weight quantization, and (ii) threshold-centered quantization.
Our results show that increasing the density of quantization levels around the firing threshold improves accuracy across several benchmark datasets.
arXiv Detail & Related papers (2024-04-15T03:07:16Z)
- Practical application of quantum neural network to materials informatics: prediction of the melting points of metal oxides [0.0]
Quantum neural network (QNN) models have received increasing attention owing to their strong expressibility and resistance to overfitting.
This study aims to construct a QNN model to predict the melting points of metal oxides.
Various architectures (encoding methods and entangler arrangements) are explored to create an effective QNN model.
arXiv Detail & Related papers (2023-10-27T07:21:36Z)
- BiTAT: Neural Network Binarization with Task-dependent Aggregated Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation.
Extreme quantization (1-bit weight/1-bit activations) of compactly-designed backbone architectures results in severe performance degeneration.
This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate performance degeneration.
arXiv Detail & Related papers (2022-07-04T13:25:49Z)
- A Comprehensive Survey on Model Quantization for Deep Neural Networks in Image Classification [0.0]
A promising approach is quantization, in which the full-precision values are stored in low bit-width precision.
We present a comprehensive survey of quantization concepts and methods, with a focus on image classification.
We explain the replacement of floating-point operations with low-cost bitwise operations in a quantized DNN and the sensitivity of different layers in quantization.
arXiv Detail & Related papers (2022-05-14T15:08:32Z)
- Low-bit Quantization of Recurrent Neural Network Language Models Using Alternating Direction Methods of Multipliers [67.688697838109]
This paper presents a novel method to train quantized RNNLMs from scratch using alternating direction methods of multipliers (ADMM).
Experiments on two tasks suggest the proposed ADMM quantization achieved a model size compression factor of up to 31 times over the full precision baseline RNNLMs.
arXiv Detail & Related papers (2021-11-29T09:30:06Z)
- ECQ$^{\text{x}}$: Explainability-Driven Quantization for Low-Bit and Sparse DNNs [13.446502051609036]
We develop and describe a novel quantization paradigm for deep neural networks (DNNs).
Our method leverages concepts of explainable AI (XAI) and concepts of information theory.
The ultimate goal is to preserve the most relevant weights in quantization clusters of highest information content.
arXiv Detail & Related papers (2021-09-09T12:57:06Z)
- Where Should We Begin? A Low-Level Exploration of Weight Initialization Impact on Quantized Behaviour of Deep Neural Networks [93.4221402881609]
We present an in-depth, fine-grained ablation study of the effect of different weight initializations on the final distributions of weights and activations of different CNN architectures.
To the best of our knowledge, we are the first to perform such a low-level, in-depth quantitative analysis of weight initialization and its effect on quantized behaviour.
arXiv Detail & Related papers (2020-11-30T06:54:28Z)
- Once Quantization-Aware Training: High Performance Extremely Low-bit Architecture Search [112.05977301976613]
We propose to combine Network Architecture Search methods with quantization to enjoy the merits of the two sides.
We first propose the joint training of architecture and quantization with a shared step size to acquire a large number of quantized models.
Then a bit-inheritance scheme is introduced to transfer the quantized models to the lower bit, which further reduces the time cost and improves the quantization accuracy.
arXiv Detail & Related papers (2020-10-09T03:52:16Z)
- Optimal Gradient Quantization Condition for Communication-Efficient Distributed Training [99.42912552638168]
Communication of gradients is costly for training deep neural networks with multiple devices in computer vision applications.
In this work, we deduce the optimal condition of both the binary and multi-level gradient quantization for any gradient distribution.
Based on the optimal condition, we develop two novel quantization schemes: biased BinGrad and unbiased ORQ for binary and multi-level gradient quantization respectively.
arXiv Detail & Related papers (2020-02-25T18:28:39Z)
- Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantized neural networks (QNNs) are very attractive to industry because of their extremely low computation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)
- SQWA: Stochastic Quantized Weight Averaging for Improving the Generalization Capability of Low-Precision Deep Neural Networks [29.187848543158992]
We present a new quantized neural network optimization approach, stochastic quantized weight averaging (SQWA).
The proposed approach includes floating-point model training, direct quantization of weights, capturing multiple low-precision models, averaging the captured models, and fine-tuning the average with low learning rates.
With SQWA training, we achieved state-of-the-art results for 2-bit QDNNs on CIFAR-100 and ImageNet datasets.
arXiv Detail & Related papers (2020-02-02T07:02:51Z)
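The SQWA recipe summarized above (capture several low-precision models, average them, re-quantize, then fine-tune) can be sketched in a few lines. The uniform symmetric quantizer and the function names below are illustrative assumptions, not the authors' exact implementation:

```python
import numpy as np

def quantize(w, n_bits=2):
    """Uniform symmetric quantization (n_bits >= 2); a common choice,
    possibly differing in detail from the SQWA paper's quantizer."""
    levels = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(w)) / levels
    return np.round(w / scale) * scale

def sqwa_average(checkpoints, n_bits=2):
    """Average several captured low-precision models, then re-quantize the mean.
    In the full recipe the result would then be fine-tuned with a low learning rate."""
    mean_w = np.mean([quantize(w, n_bits) for w in checkpoints], axis=0)
    return quantize(mean_w, n_bits)

# Two hypothetical captured checkpoints of the same (toy) weight tensor:
ckpts = [np.array([0.9, -0.1]), np.array([0.7, 0.1])]
avg = sqwa_average(ckpts, n_bits=2)
```

The averaging step is what distinguishes this from plain quantization-aware training: each captured model sits in a different low-precision minimum, and averaging them tends to land in a flatter region before the final low-learning-rate fine-tuning.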
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.