Efficient Integer-Arithmetic-Only Convolutional Neural Networks
- URL: http://arxiv.org/abs/2006.11735v1
- Date: Sun, 21 Jun 2020 08:23:03 GMT
- Title: Efficient Integer-Arithmetic-Only Convolutional Neural Networks
- Authors: Hengrui Zhao and Dong Liu and Houqiang Li
- Abstract summary: We find that the decline is due to activation quantization and replace conventional ReLU with Bounded ReLU.
Our integer networks achieve performance equivalent to that of the corresponding FPN networks, but have only 1/4 the memory cost and run 2x faster on modern GPUs.
- Score: 87.01739569518513
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Integer-arithmetic-only networks have been demonstrated to be effective in reducing
computational cost and ensuring cross-platform consistency. However, previous
works usually report a decline in the inference accuracy when converting
well-trained floating-point-number (FPN) networks into integer networks. We
analyze this phenomenon and find that the decline is due to activation
quantization. Specifically, when we replace conventional ReLU with Bounded
ReLU, how to set the bound for each neuron is a key problem. Considering the
tradeoff between activation quantization error and network learning ability, we
set an empirical rule to tune the bound of each Bounded ReLU. We also design a
mechanism to handle the cases of feature map addition and feature map
concatenation. Based on the proposed method, our trained 8-bit integer ResNet
outperforms the 8-bit networks of Google's TensorFlow and NVIDIA's TensorRT for
image recognition. We also experiment on VDSR for image super-resolution and on
VRCNN for compression artifact reduction, both of which are regression tasks
that inherently require high inference accuracy. Our integer networks achieve
performance equivalent to that of the corresponding FPN networks, but have only
1/4 the memory cost and run 2x faster on modern GPUs. Our code and models can be
found at github.com/HengRuiZ/brelu.
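To make the activation-quantization step concrete, below is a minimal NumPy sketch of a Bounded ReLU whose per-layer bound fixes the 8-bit quantization scale. It is an illustration under assumed shapes and an assumed bound of 6.0, not the authors' released code (which lives in the repository above).

```python
import numpy as np

def bounded_relu(x, bound):
    """Bounded ReLU: clamp activations to [0, bound] so that each layer's
    quantization range is fixed and known ahead of inference."""
    return np.clip(x, 0.0, bound)

def quantize_activation(x, bound, n_bits=8):
    """Uniformly quantize a bounded activation to n_bits unsigned integers
    (n_bits <= 8 assumed, so the result fits in uint8). The scale maps the
    real interval [0, bound] onto the integer range [0, 2**n_bits - 1]."""
    q_max = 2 ** n_bits - 1
    scale = bound / q_max
    q = np.round(bounded_relu(x, bound) / scale).astype(np.uint8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate floating-point activation for error checks."""
    return q.astype(np.float32) * scale

x = np.random.randn(4, 8).astype(np.float32) * 3.0
q, scale = quantize_activation(x, bound=6.0)            # bound tuned per layer
err = np.abs(dequantize(q, scale) - bounded_relu(x, 6.0)).max()
print(err <= scale / 2 + 1e-6)                          # worst case: half a step
```

A smaller bound gives a finer quantization step (scale = bound / 255) but clips more of the activation range; this is one way to read the error-versus-learning-ability tradeoff that the paper's empirical bound-tuning rule addresses.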
Related papers
- NITRO-D: Native Integer-only Training of Deep Convolutional Neural Networks [2.6230959823681834]
This work introduces NITRO-D, a new framework for training arbitrarily deep integer-only Convolutional Neural Networks (CNNs).
NITRO-D is the first framework in the literature enabling the training of integer-only CNNs without the need to introduce a quantization scheme.
arXiv Detail & Related papers (2024-07-16T13:16:49Z)
- RedBit: An End-to-End Flexible Framework for Evaluating the Accuracy of Quantized CNNs [9.807687918954763]
Convolutional Neural Networks (CNNs) have become the standard class of deep neural network for image processing, classification and segmentation tasks.
RedBit is an open-source framework that provides a transparent, easy-to-use interface to evaluate the effectiveness of different algorithms on network accuracy.
arXiv Detail & Related papers (2023-01-15T21:27:35Z)
- DeltaCNN: End-to-End CNN Inference of Sparse Frame Differences in Videos [16.644938608211202]
Convolutional neural network inference on video data requires powerful hardware for real-time processing.
We present a sparse convolutional neural network framework that enables sparse frame-by-frame updates.
We are the first to significantly outperform the dense reference, cuDNN, in practical settings, achieving speedups of up to 7x with only marginal differences in accuracy.
arXiv Detail & Related papers (2022-03-08T10:54:00Z)
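As a rough illustration of the frame-difference idea in the DeltaCNN entry above, here is a NumPy/SciPy sketch that exploits the linearity of convolution; it is not DeltaCNN's GPU implementation, and the threshold, kernel, and shapes are made-up values.

```python
import numpy as np
from scipy.signal import convolve2d

def delta_conv_step(frame, prev_frame, prev_out, kernel, threshold=0.05):
    """Update a convolution output from the cached previous output by
    convolving only the thresholded frame difference. Small per-pixel
    changes are zeroed out, which is where the sparsity (and speedup)
    would come from in practice."""
    delta = frame - prev_frame
    delta[np.abs(delta) < threshold] = 0.0
    return prev_out + convolve2d(delta, kernel, mode="same")

kernel = np.random.randn(3, 3).astype(np.float32)
f0 = np.random.rand(32, 32).astype(np.float32)
f1 = f0.copy()
f1[10:14, 10:14] += 0.5                          # only a small region changes
out0 = convolve2d(f0, kernel, mode="same")       # dense pass on the first frame
out1 = delta_conv_step(f1, f0, out0, kernel, threshold=0.0)
print(np.allclose(out1, convolve2d(f1, kernel, mode="same"), atol=1e-4))
```

With threshold=0.0 the update is exact because convolution is linear; a positive threshold trades a small approximation error for sparsity in the update.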
- OMPQ: Orthogonal Mixed Precision Quantization [64.59700856607017]
Mixed precision quantization takes advantage of hardware's multiple bit-width arithmetic operations to unleash the full potential of network quantization.
We propose to optimize a proxy metric, the concept of network orthogonality, which is highly correlated with the loss of the integer programming.
This approach reduces the search time and required data amount by orders of magnitude, with little compromise on quantization accuracy.
arXiv Detail & Related papers (2021-09-16T10:59:33Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
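The multi-branch binary idea in the entry above can be illustrated with a small sketch. This is a generic decomposition for weights quantized to odd integer levels, assumed here purely for illustration rather than taken from that paper's code: every level in {-(2**K - 1), ..., 2**K - 1} (odd values) equals a weighted sum of K tensors with entries in {-1, +1}.

```python
import numpy as np

def decompose_pm1(w_int, n_bits):
    """Greedily decompose odd-integer weight levels in
    {-(2**n_bits - 1), ..., 2**n_bits - 1} into n_bits binary tensors
    b_i with entries in {-1, +1} such that w = sum_i (2**i) * b_i.
    Each branch could then run on fast binary (XNOR-style) kernels."""
    branches = []
    residual = w_int.astype(np.int32)
    for i in reversed(range(n_bits)):                 # most significant first
        b = np.where(residual >= 0, 1, -1).astype(np.int32)
        branches.append((i, b))
        residual = residual - (2 ** i) * b
    return branches

w = np.array([-7, -5, -3, -1, 1, 3, 5, 7])            # 3-bit odd levels
branches = decompose_pm1(w, n_bits=3)
recon = sum((2 ** i) * b for i, b in branches)
print(np.array_equal(recon, w))                       # exact reconstruction
```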
- Integer-Only Neural Network Quantization Scheme Based on Shift-Batch-Normalization [13.82935273026808]
In this paper, an integer-only-quantization scheme is introduced.
This scheme uses shift-based batch normalization and uniform quantization to implement 4-bit integer-only inference.
arXiv Detail & Related papers (2021-05-28T09:28:12Z)
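A minimal sketch of a shift-only batch normalization in the spirit of the entry above (a generic illustration, not that paper's exact scheme; the positive gamma, the pre-quantized integer bias, and the toy shapes are assumptions): the per-channel scale is rounded to the nearest power of two, so normalizing integer feature maps needs only bit shifts and additions.

```python
import numpy as np

def shift_batch_norm(x_int, gamma, running_var, beta_int, eps=1e-5):
    """Shift-only batch norm on integer feature maps of shape (N, C, H, W).
    The per-channel float scale gamma / sqrt(var + eps) is rounded to the
    nearest power of two, so the multiply becomes a left/right bit shift;
    beta_int is assumed to be a bias already quantized to integers."""
    scale = gamma / np.sqrt(running_var + eps)            # float BN scale (> 0)
    shift = np.round(np.log2(scale)).astype(np.int32)     # per-channel exponent
    out = np.empty_like(x_int)
    for c in range(x_int.shape[1]):
        if shift[c] >= 0:
            out[:, c] = x_int[:, c] << shift[c]           # multiply by 2**shift
        else:
            out[:, c] = x_int[:, c] >> (-shift[c])        # divide by 2**(-shift)
    return out + beta_int.reshape(1, -1, 1, 1)

x = np.random.randint(-128, 128, size=(1, 2, 4, 4), dtype=np.int32)
gamma = np.array([0.5, 2.1], dtype=np.float32)
var = np.ones(2, dtype=np.float32)
beta = np.array([3, -1], dtype=np.int32)
print(shift_batch_norm(x, gamma, var, beta).shape)        # (1, 2, 4, 4)
```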
- Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming [97.40955121478716]
We propose a first-order dual SDP algorithm that requires memory only linear in the total number of network activations.
We significantly improve L-inf verified robust accuracy from 1% to 88% and 6% to 40% respectively.
We also demonstrate tight verification of a quadratic stability specification for the decoder of a variational autoencoder.
arXiv Detail & Related papers (2020-10-22T12:32:29Z)
- NITI: Training Integer Neural Networks Using Integer-only Arithmetic [4.361357921751159]
We present NITI, an efficient deep neural network training framework that computes exclusively with integer arithmetic.
A proof-of-concept open-source software implementation of NITI that utilizes native 8-bit integer operations is presented.
NITI achieves negligible accuracy degradation on the MNIST and CIFAR10 datasets using 8-bit integer storage and computation.
arXiv Detail & Related papers (2020-09-28T07:41:36Z)
- AQD: Towards Accurate Fully-Quantized Object Detection [94.06347866374927]
We propose an Accurate Quantized object Detection solution, termed AQD, to get rid of floating-point computation.
Our AQD achieves comparable or even better performance compared with the full-precision counterpart under extremely low-bit schemes.
arXiv Detail & Related papers (2020-07-14T09:07:29Z)
- Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.