Edge Inference with Fully Differentiable Quantized Mixed Precision Neural Networks
- URL: http://arxiv.org/abs/2206.07741v2
- Date: Tue, 29 Aug 2023 21:33:12 GMT
- Title: Edge Inference with Fully Differentiable Quantized Mixed Precision Neural Networks
- Authors: Clemens JS Schaefer, Siddharth Joshi, Shan Li, Raul Blazquez
- Abstract summary: Quantizing parameters and operations to lower bit-precision offers substantial memory and energy savings for neural network inference.
This paper proposes a new quantization approach for mixed precision convolutional neural networks (CNNs) targeting edge-computing.
- Score: 1.131071436917293
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The large computing and memory cost of deep neural networks (DNNs) often
precludes their use in resource-constrained devices. Quantizing the parameters
and operations to lower bit-precision offers substantial memory and energy
savings for neural network inference, facilitating the use of DNNs on edge
computing platforms. Recent efforts at quantizing DNNs have employed a range of
techniques encompassing progressive quantization, step-size adaptation, and
gradient scaling. This paper proposes a new quantization approach for mixed
precision convolutional neural networks (CNNs) targeting edge-computing. Our
method establishes a new Pareto frontier in model accuracy and memory
footprint, demonstrating a range of quantized models that deliver
best-in-class accuracy below 4.3 MB of weights (wgts.) and activations
(acts.). Our main contributions
are: (i) hardware-aware heterogeneous differentiable quantization with
tensor-sliced learned precision, (ii) targeted gradient modification for wgts.
and acts. to mitigate quantization errors, and (iii) a multi-phase learning
schedule to address instability in learning arising from updates to the learned
quantizer and model parameters. We demonstrate the effectiveness of our
techniques on the ImageNet dataset across a range of models including
EfficientNet-Lite0 (e.g., 4.14 MB of wgts. and acts. at 67.66% accuracy) and
MobileNetV2 (e.g., 3.51 MB of wgts. and acts. at 65.39% accuracy).
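The abstract describes the approach only at a high level. As a rough, minimal sketch of contribution (i), the code below shows a uniform quantizer with a learned step size per tensor slice and a straight-through estimator, so that both the weights and the quantization parameters receive gradients; the class name, slice granularity, and initialization are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn as nn

class LearnedStepQuantizer(nn.Module):
    """Uniform quantizer with one learned step size per tensor slice.

    Illustrative only: the paper's hardware-aware, tensor-sliced scheme
    is more involved; here every slice (e.g. output channel) gets its
    own learnable scale, and the bit-width is fixed.
    """
    def __init__(self, num_slices, num_bits=4):
        super().__init__()
        self.qmin = -(2 ** (num_bits - 1))
        self.qmax = 2 ** (num_bits - 1) - 1
        # One learnable step size (scale) per slice.
        self.step = nn.Parameter(torch.full((num_slices, 1), 0.05))

    def forward(self, w):
        # w: (num_slices, elements_per_slice)
        q = torch.clamp(w / self.step, self.qmin, self.qmax)
        q_rounded = torch.round(q)
        # Straight-through estimator: forward uses the rounded value,
        # backward treats rounding as the identity, so gradients reach
        # both the weights and the learned step sizes.
        q_ste = q + (q_rounded - q).detach()
        return q_ste * self.step

quant = LearnedStepQuantizer(num_slices=8, num_bits=4)
w = torch.randn(8, 64, requires_grad=True)
loss = quant(w).pow(2).sum()
loss.backward()  # gradients flow into both w and quant.step
```

In the paper's mixed-precision setting the per-slice bit-width is itself learned and hardware cost enters the objective; the sketch fixes the bit-width for brevity.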
Related papers
- Low-bit Quantization for Deep Graph Neural Networks with Smoothness-aware Message Propagation [3.9177379733188715]
We present an end-to-end solution that aims to address the challenges of deploying efficient GNNs in resource-constrained environments.
We introduce a quantization-based approach for all stages of GNNs, from message passing in training to node classification.
The proposed quantizer learns quantization ranges and reduces the model size with comparable accuracy even under low-bit quantization.
arXiv Detail & Related papers (2023-08-29T00:25:02Z)
- Low Precision Quantization-aware Training in Spiking Neural Networks with Differentiable Quantization Function [0.5046831208137847]
This work aims to bridge the gap between recent progress in quantized neural networks and spiking neural networks.
It presents an extensive study on the performance of the quantization function, represented as a linear combination of sigmoid functions.
The presented quantization function demonstrates state-of-the-art performance on four popular benchmarks.
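As a rough illustration of such a quantization function, the sketch below composes a smooth staircase from a linear combination of shifted sigmoids; the uniform level spacing, number of levels, and temperature are assumptions for illustration rather than the exact form studied in the paper.

```python
import torch

def sigmoid_staircase(x, num_levels=4, step=1.0, temperature=20.0):
    """Smooth surrogate of a uniform quantizer built from shifted sigmoids.

    Each sigmoid contributes one step of height `step`; larger
    `temperature` values approach a hard staircase while keeping the
    function differentiable everywhere (illustrative form only).
    """
    y = torch.zeros_like(x)
    for k in range(1, num_levels):
        # Transition centred between level k-1 and level k.
        y = y + step * torch.sigmoid(temperature * (x - (k - 0.5) * step))
    return y

x = torch.linspace(-1.0, 4.0, steps=11, requires_grad=True)
y = sigmoid_staircase(x)
y.sum().backward()  # non-zero gradients, unlike a hard round()
```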
arXiv Detail & Related papers (2023-05-30T09:42:05Z)
- A Comprehensive Survey on Model Quantization for Deep Neural Networks in Image Classification [0.0]
A promising approach is quantization, in which the full-precision values are stored in low bit-width precision.
We present a comprehensive survey of quantization concepts and methods, with a focus on image classification.
We explain the replacement of floating-point operations with low-cost bitwise operations in a quantized DNN, as well as the sensitivity of different layers to quantization.
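To make the bitwise-operation point concrete, here is a textbook-style sketch (not taken from the survey itself) of how a binarized dot product replaces floating-point multiply-accumulates with XNOR and popcount.

```python
def binary_dot(a_bits, b_bits, n):
    """Dot product of two {-1, +1} vectors packed as n-bit integers.

    Bit i set means +1, bit i clear means -1. Matching bits multiply
    to +1, so dot = (#matches) - (#mismatches) = 2 * popcount(XNOR) - n.
    """
    mask = (1 << n) - 1
    xnor = ~(a_bits ^ b_bits) & mask
    return 2 * bin(xnor).count("1") - n

# (+1, -1, +1, +1) . (+1, +1, -1, +1) packed LSB-first as 0b1101 and 0b1011
print(binary_dot(0b1101, 0b1011, 4))  # -> 0
```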
arXiv Detail & Related papers (2022-05-14T15:08:32Z)
- Mixed Precision Low-bit Quantization of Neural Network Language Models for Speech Recognition [67.95996816744251]
State-of-the-art language models (LMs) represented by long-short term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming increasingly complex and expensive for practical applications.
Current quantization methods are based on uniform precision and fail to account for the varying performance sensitivity of different parts of LMs to quantization errors.
Novel mixed precision neural network LM quantization methods are proposed in this paper.
arXiv Detail & Related papers (2021-11-29T12:24:02Z)
- Low-bit Quantization of Recurrent Neural Network Language Models Using Alternating Direction Methods of Multipliers [67.688697838109]
This paper presents a novel method to train quantized RNNLMs from scratch using alternating direction methods of multipliers (ADMM).
Experiments on two tasks suggest the proposed ADMM quantization achieved a model size compression factor of up to 31 times over the full precision baseline RNNLMs.
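A heavily simplified sketch of ADMM-based quantization is given below: full-precision weights, an auxiliary quantized copy, and a scaled dual variable are updated in alternation. The single gradient step used for the weight update and the fixed-step uniform grid are simplifying assumptions, not necessarily the paper's exact procedure.

```python
import numpy as np

def quantize(w, step=0.05, qmax=3):
    """Project onto a uniform low-bit grid (here roughly 3-bit signed)."""
    return step * np.clip(np.round(w / step), -qmax - 1, qmax)

def admm_quantize(w, grad_fn, rho=0.1, lr=0.01, iters=200):
    """Toy ADMM loop: minimize loss(w) subject to w lying on the grid."""
    q = quantize(w)
    u = np.zeros_like(w)  # scaled dual variable
    for _ in range(iters):
        # w-update: one gradient step on loss(w) + (rho/2)*||w - q + u||^2
        w = w - lr * (grad_fn(w) + rho * (w - q + u))
        # q-update: project (w + u) onto the quantized grid
        q = quantize(w + u)
        # dual update
        u = u + w - q
    return q

# Example with a simple quadratic loss around a random target vector.
target = np.random.randn(16)
w_q = admm_quantize(np.zeros(16), grad_fn=lambda w: w - target)
```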
arXiv Detail & Related papers (2021-11-29T09:30:06Z)
- ECQ$^{\text{x}}$: Explainability-Driven Quantization for Low-Bit and Sparse DNNs [13.446502051609036]
We develop and describe a novel quantization paradigm for deep neural networks (DNNs).
Our method leverages concepts from explainable AI (XAI) and information theory.
The ultimate goal is to preserve the most relevant weights in quantization clusters of highest information content.
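Purely as an illustration of that idea, and not the ECQ$^{\text{x}}$ algorithm itself, the sketch below biases a cluster-based quantizer with a per-weight relevance score so that highly relevant weights are less likely to be assigned to the zero cluster.

```python
import numpy as np

def relevance_weighted_assign(weights, relevance, centroids, alpha=0.5):
    """Assign weights to quantization centroids, penalizing the zero
    centroid for weights with high relevance (illustrative rule only)."""
    dist = np.abs(weights[:, None] - centroids[None, :])  # |w_i - c_k|
    zero_k = int(np.argmin(np.abs(centroids)))
    dist[:, zero_k] += alpha * relevance  # protect relevant weights from pruning
    return centroids[np.argmin(dist, axis=1)]

w = np.random.randn(10)
rel = np.abs(w * np.random.randn(10))   # stand-in for an XAI relevance score
cents = np.array([-0.5, 0.0, 0.5])
w_q = relevance_weighted_assign(w, rel, cents)
```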
arXiv Detail & Related papers (2021-09-09T12:57:06Z)
- A High-Performance Adaptive Quantization Approach for Edge CNN Applications [0.225596179391365]
Recent convolutional neural network (CNN) development continues to advance the state-of-the-art model accuracy for various applications.
The enhanced accuracy comes at the cost of substantial memory bandwidth and storage requirements.
In this paper, we introduce an adaptive high-performance quantization method to resolve the issue of biased activation.
arXiv Detail & Related papers (2021-07-18T07:49:18Z)
- Low-Precision Training in Logarithmic Number System using Multiplicative Weight Update [49.948082497688404]
Training large-scale deep neural networks (DNNs) currently requires a significant amount of energy, leading to serious environmental impacts.
One promising approach to reduce the energy costs is representing DNNs with low-precision numbers.
We jointly design a low-precision training framework involving a logarithmic number system (LNS) and a multiplicative weight update training method, termed LNS-Madam.
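The following is a condensed sketch of a multiplicative weight update, which adjusts the magnitude of each weight in the log domain and therefore composes naturally with a logarithmic number representation; the gradient normalization and learning-rate choice are assumptions rather than the exact LNS-Madam rule.

```python
import numpy as np

def multiplicative_update(w, grad, lr=0.01, eps=1e-8):
    """Multiplicative update: w <- w * exp(-lr * sign(w) * g_hat).

    Because only log|w| changes, weights stored as (sign, log2|w|) in a
    logarithmic number system are updated by shifting the stored exponent.
    """
    g_hat = grad / (np.sqrt(np.mean(grad ** 2)) + eps)  # normalized gradient
    return w * np.exp(-lr * np.sign(w) * g_hat)

w = np.array([0.5, -0.25, 0.125])
grad = np.array([0.2, -0.1, 0.05])
w_new = multiplicative_update(w, grad)  # each |w| grows or shrinks multiplicatively
```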
arXiv Detail & Related papers (2021-06-26T00:32:17Z)
- ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training [68.63354877166756]
ActNN is a memory-efficient training framework that stores randomly quantized activations for backpropagation.
ActNN reduces the memory footprint of activations by 12x and enables training with a 6.6x to 14x larger batch size.
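A minimal sketch of the underlying idea, saving activations for the backward pass with 2-bit stochastic (unbiased in expectation) rounding, is shown below; the per-tensor scaling and the lack of bit packing are simplifications relative to ActNN.

```python
import numpy as np

def compress_activation(x, bits=2):
    """Stochastically round activations to 2**bits levels (unbiased)."""
    levels = 2 ** bits - 1
    lo = x.min()
    scale = (x.max() - lo) / levels + 1e-12
    y = (x - lo) / scale                        # in [0, levels]
    q = np.floor(y + np.random.rand(*y.shape))  # stochastic rounding
    return q.astype(np.uint8), lo, scale        # ~2 bits of payload per element

def decompress_activation(q, lo, scale):
    # Reconstruct activations only when the backward pass needs them.
    return q.astype(np.float32) * scale + lo

x = np.random.randn(4, 8).astype(np.float32)
q, lo, scale = compress_activation(x)
x_hat = decompress_activation(q, lo, scale)     # equals x in expectation
```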
arXiv Detail & Related papers (2021-04-29T05:50:54Z)
- Direct Quantization for Training Highly Accurate Low Bit-width Deep Neural Networks [73.29587731448345]
This paper proposes two novel techniques to train deep convolutional neural networks with low bit-width weights and activations.
First, to obtain low bit-width weights, most existing methods derive the quantized weights by quantizing the full-precision network weights.
Second, to obtain low bit-width activations, existing works consider all channels equally.
arXiv Detail & Related papers (2020-12-26T15:21:18Z)
- Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantization neural networks (QNNs) are very attractive to industry because of their extremely cheap computation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)