Efficient Integer-Arithmetic-Only Convolutional Neural Networks
- URL: http://arxiv.org/abs/2006.11735v1
- Date: Sun, 21 Jun 2020 08:23:03 GMT
- Title: Efficient Integer-Arithmetic-Only Convolutional Neural Networks
- Authors: Hengrui Zhao and Dong Liu and Houqiang Li
- Abstract summary: We find that the decline is due to activation quantization and replace conventional ReLU with Bounded ReLU.
Our integer networks achieve performance equivalent to that of the corresponding FPN networks, but have only 1/4 the memory cost and run 2x faster on modern GPUs.
- Score: 87.01739569518513
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Integer-arithmetic-only networks have been demonstrated to be effective in reducing
computational cost and ensuring cross-platform consistency. However, previous
works usually report a decline in the inference accuracy when converting
well-trained floating-point-number (FPN) networks into integer networks. We
analyze this phenomenon and find that the decline is due to activation
quantization. Specifically, when we replace conventional ReLU with Bounded
ReLU, how to set the bound for each neuron is a key problem. Considering the
tradeoff between activation quantization error and network learning ability, we
set an empirical rule to tune the bound of each Bounded ReLU. We also design a
mechanism to handle the cases of feature map addition and feature map
concatenation. Based on the proposed method, our trained 8-bit integer ResNet
outperforms the 8-bit networks of Google's TensorFlow and NVIDIA's TensorRT for
image recognition. We also experiment on VDSR for image super-resolution and on
VRCNN for compression artifact reduction, both of which are regression tasks
that inherently require high inference accuracy. Our integer networks achieve
performance equivalent to that of the corresponding FPN networks, but have only
1/4 the memory cost and run 2x faster on modern GPUs. Our code and models can be
found at github.com/HengRuiZ/brelu.
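To make the activation-quantization step concrete, below is a minimal NumPy sketch of a Bounded ReLU whose per-layer bound fixes the 8-bit quantization scale. It is an illustration under assumed shapes and an assumed bound of 6.0, not the authors' released code (which lives in the repository above).

```python
import numpy as np

def bounded_relu(x, bound):
    """Bounded ReLU: clamp activations to [0, bound] so that each layer's
    quantization range is fixed and known ahead of inference."""
    return np.clip(x, 0.0, bound)

def quantize_activation(x, bound, n_bits=8):
    """Uniformly quantize a bounded activation to n_bits unsigned integers
    (n_bits <= 8 assumed, so the result fits in uint8). The scale maps the
    real interval [0, bound] onto the integer range [0, 2**n_bits - 1]."""
    q_max = 2 ** n_bits - 1
    scale = bound / q_max
    q = np.round(bounded_relu(x, bound) / scale).astype(np.uint8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate floating-point activation for error checks."""
    return q.astype(np.float32) * scale

x = np.random.randn(4, 8).astype(np.float32) * 3.0
q, scale = quantize_activation(x, bound=6.0)            # bound tuned per layer
err = np.abs(dequantize(q, scale) - bounded_relu(x, 6.0)).max()
print(err <= scale / 2 + 1e-6)                          # worst case: half a step
```

A smaller bound gives a finer quantization step (scale = bound / 255) but clips more of the activation range; this is one way to read the error-versus-learning-ability tradeoff that the paper's empirical bound-tuning rule addresses.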
Related papers
- NITRO-D: Native Integer-only Training of Deep Convolutional Neural Networks [2.6230959823681834]
This work introduces NITRO-D, a new framework for training arbitrarily deep integer-only Convolutional Neural Networks (CNNs).
NITRO-D is the first framework in the literature enabling the training of integer-only CNNs without the need to introduce a quantization scheme.
arXiv Detail & Related papers (2024-07-16T13:16:49Z)
- RedBit: An End-to-End Flexible Framework for Evaluating the Accuracy of Quantized CNNs [9.807687918954763]
Convolutional Neural Networks (CNNs) have become the standard class of deep neural network for image processing, classification and segmentation tasks.
RedBit is an open-source framework that provides a transparent, easy-to-use interface to evaluate the effectiveness of different algorithms on network accuracy.
arXiv Detail & Related papers (2023-01-15T21:27:35Z)
- DeltaCNN: End-to-End CNN Inference of Sparse Frame Differences in Videos [16.644938608211202]
Convolutional neural network inference on video data requires powerful hardware for real-time processing.
We present a sparse convolutional neural network framework that enables sparse frame-by-frame updates.
We are the first to significantly outperform the dense reference, cuDNN, in practical settings, achieving speedups of up to 7x with only marginal differences in accuracy.
arXiv Detail & Related papers (2022-03-08T10:54:00Z)
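As a rough illustration of the frame-difference idea in the DeltaCNN entry above, here is a NumPy/SciPy sketch that exploits the linearity of convolution; it is not DeltaCNN's GPU implementation, and the threshold, kernel, and shapes are made-up values.

```python
import numpy as np
from scipy.signal import convolve2d

def delta_conv_step(frame, prev_frame, prev_out, kernel, threshold=0.05):
    """Update a convolution output from the cached previous output by
    convolving only the thresholded frame difference. Small per-pixel
    changes are zeroed out, which is where the sparsity (and speedup)
    would come from in practice."""
    delta = frame - prev_frame
    delta[np.abs(delta) < threshold] = 0.0
    return prev_out + convolve2d(delta, kernel, mode="same")

kernel = np.random.randn(3, 3).astype(np.float32)
f0 = np.random.rand(32, 32).astype(np.float32)
f1 = f0.copy()
f1[10:14, 10:14] += 0.5                          # only a small region changes
out0 = convolve2d(f0, kernel, mode="same")       # dense pass on the first frame
out1 = delta_conv_step(f1, f0, out0, kernel, threshold=0.0)
print(np.allclose(out1, convolve2d(f1, kernel, mode="same"), atol=1e-4))
```

With threshold=0.0 the update is exact because convolution is linear; a positive threshold trades a small approximation error for sparsity in the update.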
- OMPQ: Orthogonal Mixed Precision Quantization [64.59700856607017]
Mixed precision quantization takes advantage of hardware's multiple bit-width arithmetic operations to unleash the full potential of network quantization.
We propose to optimize a proxy metric, the concept of network orthogonality, which is highly correlated with the loss of the integer programming.
This approach reduces the search time and required data amount by orders of magnitude, with little compromise on quantization accuracy.
arXiv Detail & Related papers (2021-09-16T10:59:33Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
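The multi-branch binary idea in the entry above can be illustrated with a small sketch. This is a generic decomposition for weights quantized to odd integer levels, assumed here purely for illustration rather than taken from that paper's code: every level in {-(2**K - 1), ..., 2**K - 1} (odd values) equals a weighted sum of K tensors with entries in {-1, +1}.

```python
import numpy as np

def decompose_pm1(w_int, n_bits):
    """Greedily decompose odd-integer weight levels in
    {-(2**n_bits - 1), ..., 2**n_bits - 1} into n_bits binary tensors
    b_i with entries in {-1, +1} such that w = sum_i (2**i) * b_i.
    Each branch could then run on fast binary (XNOR-style) kernels."""
    branches = []
    residual = w_int.astype(np.int32)
    for i in reversed(range(n_bits)):                 # most significant first
        b = np.where(residual >= 0, 1, -1).astype(np.int32)
        branches.append((i, b))
        residual = residual - (2 ** i) * b
    return branches

w = np.array([-7, -5, -3, -1, 1, 3, 5, 7])            # 3-bit odd levels
branches = decompose_pm1(w, n_bits=3)
recon = sum((2 ** i) * b for i, b in branches)
print(np.array_equal(recon, w))                       # exact reconstruction
```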
- Integer-Only Neural Network Quantization Scheme Based on Shift-Batch-Normalization [13.82935273026808]
In this paper, an integer-only-quantization scheme is introduced.
This scheme uses shift-based batch normalization and uniform quantization to implement 4-bit integer-only inference.
arXiv Detail & Related papers (2021-05-28T09:28:12Z)
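A minimal sketch of a shift-only batch normalization in the spirit of the entry above (a generic illustration, not that paper's exact scheme; the positive gamma, the pre-quantized integer bias, and the toy shapes are assumptions): the per-channel scale is rounded to the nearest power of two, so normalizing integer feature maps needs only bit shifts and additions.

```python
import numpy as np

def shift_batch_norm(x_int, gamma, running_var, beta_int, eps=1e-5):
    """Shift-only batch norm on integer feature maps of shape (N, C, H, W).
    The per-channel float scale gamma / sqrt(var + eps) is rounded to the
    nearest power of two, so the multiply becomes a left/right bit shift;
    beta_int is assumed to be a bias already quantized to integers."""
    scale = gamma / np.sqrt(running_var + eps)            # float BN scale (> 0)
    shift = np.round(np.log2(scale)).astype(np.int32)     # per-channel exponent
    out = np.empty_like(x_int)
    for c in range(x_int.shape[1]):
        if shift[c] >= 0:
            out[:, c] = x_int[:, c] << shift[c]           # multiply by 2**shift
        else:
            out[:, c] = x_int[:, c] >> (-shift[c])        # divide by 2**(-shift)
    return out + beta_int.reshape(1, -1, 1, 1)

x = np.random.randint(-128, 128, size=(1, 2, 4, 4), dtype=np.int32)
gamma = np.array([0.5, 2.1], dtype=np.float32)
var = np.ones(2, dtype=np.float32)
beta = np.array([3, -1], dtype=np.int32)
print(shift_batch_norm(x, gamma, var, beta).shape)        # (1, 2, 4, 4)
```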
- Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming [97.40955121478716]
We propose a first-order dual SDP algorithm that requires memory only linear in the total number of network activations.
We significantly improve L-inf verified robust accuracy from 1% to 88% and 6% to 40% respectively.
We also demonstrate tight verification of a quadratic stability specification for the decoder of a variational autoencoder.
arXiv Detail & Related papers (2020-10-22T12:32:29Z)
- NITI: Training Integer Neural Networks Using Integer-only Arithmetic [4.361357921751159]
We present NITI, an efficient deep neural network training framework that computes exclusively with integer arithmetic.
A proof-of-concept open-source software implementation of NITI that utilizes native 8-bit integer operations is presented.
NITI achieves negligible accuracy degradation on the MNIST and CIFAR10 datasets using 8-bit integer storage and computation.
arXiv Detail & Related papers (2020-09-28T07:41:36Z)
- AQD: Towards Accurate Fully-Quantized Object Detection [94.06347866374927]
We propose an Accurate Quantized object Detection solution, termed AQD, to get rid of floating-point computation.
Our AQD achieves comparable or even better performance compared with the full-precision counterpart under extremely low-bit schemes.
arXiv Detail & Related papers (2020-07-14T09:07:29Z)
- Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.