Integer-Only Neural Network Quantization Scheme Based on
Shift-Batch-Normalization
- URL: http://arxiv.org/abs/2106.00127v1
- Date: Fri, 28 May 2021 09:28:12 GMT
- Title: Integer-Only Neural Network Quantization Scheme Based on
Shift-Batch-Normalization
- Authors: Qingyu Guo, Yuan Wang, Xiaoxin Cui
- Abstract summary: In this paper, an integer-only quantization scheme is introduced.
This scheme uses shift-based batch normalization and uniform quantization to implement 4-bit integer-only inference.
- Score: 13.82935273026808
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Neural networks are popular in many areas, but their high computational
complexity makes them hard to run on devices with limited resources. To address
this problem, quantization methods are used to reduce model size and computation
cost, making it possible to run neural networks on embedded platforms or mobile
devices.
In this paper, an integer-only quantization scheme is introduced. The scheme uses
a single layer that combines shift-based batch normalization and uniform
quantization to implement 4-bit integer-only inference. Because it avoids the wide
integer multiplications used in previous integer-only quantization methods, the
scheme achieves good power and latency efficiency and is especially well suited
to co-designed hardware platforms. Experiments show that the scheme works well on
simple tasks; on harder tasks, the performance loss is acceptable given the gain
in inference efficiency. Our code is available on GitHub:
https://github.com/hguq/IntegerNet.
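To make the idea concrete, the sketch below shows how a batch-normalization scale restricted to a power of two can be folded into a single shift-and-clamp step that emits 4-bit activations. It is a minimal NumPy illustration of the general approach described in the abstract; the function name, the exact bias folding, and the scalar (rather than per-channel) parameters are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
import numpy as np

def shift_bn_quantize(x_int, shift, bias_int, bits=4):
    """Hypothetical fused shift-BN + uniform-quantization step (illustrative only).

    x_int    : integer accumulator from the previous conv/FC layer
    shift    : right-shift amount standing in for the BN scale (a power of two)
    bias_int : integer bias folded from the BN offset
    bits     : output bit-width (signed 4-bit by default)

    Only additions, arithmetic shifts, and clamping are used, so no wide
    integer multiplier is required.
    """
    y = (x_int + bias_int) >> shift                      # divide by 2**shift
    qmin, qmax = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return np.clip(y, qmin, qmax).astype(np.int8)        # clamp to the 4-bit range

# Toy example: four accumulator values from one output channel.
acc = np.array([1500, -320, 47, 9000])
print(shift_bn_quantize(acc, shift=8, bias_int=128))     # -> [ 6 -1  0  7]
```

In this formulation the batch-normalization multiply becomes a bit shift, which is the property the abstract refers to when it says no big integer multiplication is needed.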
Related papers
- NITRO-D: Native Integer-only Training of Deep Convolutional Neural Networks [2.6230959823681834]
This work introduces NITRO-D, a new framework for training arbitrarily deep integer-only Convolutional Neural Networks (CNNs).
NITRO-D is the first framework in the literature enabling the training of integer-only CNNs without the need to introduce a quantization scheme.
arXiv Detail & Related papers (2024-07-16T13:16:49Z)
- Training Integer-Only Deep Recurrent Neural Networks [3.1829446824051195]
We present a quantization-aware training method for obtaining a highly accurate integer-only recurrent neural network (iRNN).
Our approach supports layer normalization, attention, and an adaptive piecewise linear (PWL) approximation of activation functions.
The proposed method enables RNN-based language models to run on edge devices with a 2x improvement in runtime.
arXiv Detail & Related papers (2022-12-22T15:22:36Z)
- OMPQ: Orthogonal Mixed Precision Quantization [64.59700856607017]
Mixed precision quantization takes advantage of hardware's multiple bit-width arithmetic operations to unleash the full potential of network quantization.
We propose to optimize a proxy metric, the concept of network orthogonality, which is highly correlated with the loss of the integer programming problem.
This approach reduces the search time and the required amount of data by orders of magnitude, with little compromise on quantization accuracy.
arXiv Detail & Related papers (2021-09-16T10:59:33Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks (a minimal sketch of this decomposition appears after this list).
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- A Survey of Quantization Methods for Efficient Neural Network Inference [75.55159744950859]
Quantization is the problem of distributing continuous real-valued numbers over a fixed discrete set of numbers to minimize the number of bits required.
It has come to the forefront in recent years due to the remarkable performance of neural network models in computer vision, natural language processing, and related areas.
Moving from floating-point representations to low-precision fixed integer values represented in four bits or less holds the potential to reduce the memory footprint and latency by a factor of 16x.
arXiv Detail & Related papers (2021-03-25T06:57:11Z)
- On the quantization of recurrent neural networks [9.549757800469196]
Quantization of neural networks can be defined as approximating the high-precision computation of the canonical neural network formulation with lower-precision computation.
We present an integer-only quantization strategy for Long Short-Term Memory (LSTM) neural network topologies.
arXiv Detail & Related papers (2021-01-14T04:25:08Z)
- ShiftAddNet: A Hardware-Inspired Deep Network [87.18216601210763]
ShiftAddNet is an energy-efficient, multiplication-less deep neural network.
It enables both energy-efficient inference and training without compromising expressive capacity.
ShiftAddNet reduces the hardware-quantified energy cost of DNN training and inference by over 80%, while offering comparable or better accuracy.
arXiv Detail & Related papers (2020-10-24T05:09:14Z)
- AQD: Towards Accurate Fully-Quantized Object Detection [94.06347866374927]
We propose an Accurate Quantized object Detection solution, termed AQD, to eliminate floating-point computation.
AQD achieves comparable or even better performance than its full-precision counterpart under extremely low-bit schemes.
arXiv Detail & Related papers (2020-07-14T09:07:29Z)
- Efficient Integer-Arithmetic-Only Convolutional Neural Networks [87.01739569518513]
We find that the accuracy decline is due to activation quantization and replace the conventional ReLU with a Bounded ReLU.
Our integer networks achieve performance equivalent to the corresponding floating-point networks, but with only 1/4 of the memory cost, and run 2x faster on modern GPUs.
arXiv Detail & Related papers (2020-06-21T08:23:03Z)
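As noted above, here is a minimal sketch of the {-1, +1} encoding decomposition idea from "Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration". It assumes a k-bit quantizer whose levels are the odd integers in [-(2^k - 1), 2^k - 1], so that each quantized weight equals a power-of-two weighted sum of k binary {-1, +1} values; the helper name and the level convention are illustrative assumptions, not the paper's actual decomposition or acceleration kernels.

```python
import numpy as np

def decompose_pm1(w_int, bits):
    """Decompose odd-integer quantized weights into `bits` binary {-1, +1} tensors.

    Assumes quantization levels are the odd integers in [-(2**bits - 1), 2**bits - 1],
    e.g. {-3, -1, +1, +3} for bits=2.  Returns a list B such that
    w_int == sum_i 2**i * B[i].
    """
    u = (w_int + (2**bits - 1)) // 2                      # map levels to 0 .. 2**bits - 1
    return [2 * ((u >> i) & 1) - 1 for i in range(bits)]  # each binary digit becomes -1 or +1

rng = np.random.default_rng(0)
bits = 2
levels = np.arange(-(2**bits - 1), 2**bits, 2)            # [-3, -1, 1, 3]
W = rng.choice(levels, size=(4, 3))                       # quantized weight matrix
x = rng.integers(-8, 8, size=3)                           # integer activations

branches = decompose_pm1(W, bits)
# A quantized matmul equals a power-of-two weighted sum of binary {-1, +1} matmuls,
# which is what enables the multi-branch binary acceleration described above.
assert np.array_equal(W @ x, sum((2**i) * (branches[i] @ x) for i in range(bits)))
```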