Fixed-point Quantization of Convolutional Neural Networks for Quantized
Inference on Embedded Platforms
- URL: http://arxiv.org/abs/2102.02147v1
- Date: Wed, 3 Feb 2021 17:05:55 GMT
- Title: Fixed-point Quantization of Convolutional Neural Networks for Quantized
Inference on Embedded Platforms
- Authors: Rishabh Goyal, Joaquin Vanschoren, Victor van Acht, Stephan Nijssen
- Abstract summary: We propose a method to optimally quantize the weights, biases and activations of each layer of a pre-trained CNN.
We find that layer-wise quantization of parameters significantly helps in this process.
- Score: 0.9954382983583577
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Convolutional Neural Networks (CNNs) have proven to be a powerful
state-of-the-art method for image classification tasks. One drawback, however,
is their high computational complexity and memory consumption, which make them
infeasible to execute on embedded platforms that lack the physical resources
CNNs require. Quantization is often used to optimize CNNs for memory and
computational complexity at the cost of some prediction accuracy. We therefore
propose a method to optimally quantize the weights, biases and activations of
each layer of a pre-trained CNN while controlling the loss in inference
accuracy, enabling quantized inference. We quantize the 32-bit floating-point
parameters to low-bitwidth fixed-point representations, finding optimal
bitwidths and fractional offsets for the parameters of each layer of a given
CNN. Quantization is applied post-training, without re-training the network.
Our method quantizes each parameter while taking into account how the other
parameters are quantized, because ignoring the quantization errors introduced
by already-quantized parameters can produce a low-precision CNN with accuracy
losses of up to 50%, far beyond what is acceptable. Our final method therefore
yields a low-precision CNN with accuracy losses of less than 1%. Compared to
the approach used by commercial tools, which quantizes all parameters to 8
bits, our method produces quantized CNNs with, on average, 53% lower memory
consumption and 77.5% lower cost of executing multiplications for the two CNNs
trained on the four datasets we evaluated. We find that layer-wise quantization
of parameters significantly helps in this process.
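As a rough illustration of the fixed-point mapping described in the abstract, the sketch below quantizes one layer's weights to a signed low-bitwidth representation and searches for the fractional offset (number of fractional bits) that minimizes reconstruction error. The helper names, the MSE criterion and the search range are assumptions made for illustration; the paper's actual method additionally accounts for quantization errors introduced by other already-quantized parameters and for the overall accuracy constraint.

```python
# Minimal per-layer fixed-point quantization sketch (hypothetical helpers).
import numpy as np

def quantize_fixed_point(x, bitwidth, frac_bits):
    """Map a float array onto a signed fixed-point grid with `frac_bits` fractional bits."""
    scale = 2.0 ** frac_bits
    qmin, qmax = -(2 ** (bitwidth - 1)), 2 ** (bitwidth - 1) - 1
    q = np.clip(np.round(x * scale), qmin, qmax)
    return q / scale                      # dequantized values seen at inference

def best_frac_offset(x, bitwidth, search_range=range(-8, 24)):
    """Pick the fractional offset with the lowest reconstruction MSE (illustrative criterion)."""
    errors = {f: np.mean((x - quantize_fixed_point(x, bitwidth, f)) ** 2)
              for f in search_range}
    return min(errors, key=errors.get)

# Example: quantize one convolutional layer's weights to 6 bits.
weights = 0.1 * np.random.randn(64, 3, 3, 3).astype(np.float32)
f = best_frac_offset(weights, bitwidth=6)
w_q = quantize_fixed_point(weights, bitwidth=6, frac_bits=f)
```

In the paper, the bitwidth itself is also chosen per layer for the weights, biases and activations, subject to keeping the end-to-end accuracy loss within an acceptable bound.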
Related papers
- OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation [70.17681136234202]
We reexamine the design distinctions and test the limits of what a sparse CNN can achieve.
We propose two key components, i.e., adaptive receptive fields (spatially) and adaptive relation, to bridge the gap.
This exploration led to the creation of Omni-Adaptive 3D CNNs (OA-CNNs), a family of networks that integrates a lightweight module.
arXiv Detail & Related papers (2024-03-21T14:06:38Z)
- GHN-QAT: Training Graph Hypernetworks to Predict Quantization-Robust Parameters of Unseen Limited Precision Neural Networks [80.29667394618625]
Graph Hypernetworks (GHN) can predict the parameters of varying unseen CNN architectures with surprisingly good accuracy.
Preliminary research has explored the use of GHNs to predict quantization-robust parameters for 8-bit and 4-bit quantized CNNs.
We show that quantization-aware training can significantly improve quantized accuracy for GHN predicted parameters of 4-bit quantized CNNs.
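Below is a minimal sketch of the quantization-aware training ingredient referenced here, namely fake-quantization with a straight-through estimator; the GHN-specific part, in which a graph hypernetwork predicts the parameters being quantized, is not reproduced.

```python
# Generic fake-quantization for quantization-aware training (not GHN-QAT's exact setup).
import torch

def fake_quantize(w, bits=4):
    """Simulate low-precision weights in the forward pass while keeping float gradients."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax + 1e-12        # simple symmetric scaling (an assumption)
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    return w + (q - w).detach()                 # straight-through estimator

# During training, a layer uses fake_quantize(weight) in place of weight, so the loss
# reflects 4-bit behavior while gradient updates still flow to the float weights.
```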
arXiv Detail & Related papers (2023-09-24T23:01:00Z)
- A Proximal Algorithm for Network Slimming [2.8148957592979427]
A popular channel pruning method for convolutional neural networks (CNNs) uses subgradient descent to train CNNs.
We develop an alternative algorithm called proximal NS to train CNNs towards sparse, accurate structures.
Our experiments demonstrate that after one round of training, proximal NS yields a CNN with competitive accuracy and compression.
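As a point of reference for what a proximal step looks like in this setting, here is a soft-thresholding update on the batch-norm scaling factors that network slimming sparsifies; it illustrates the generic proximal operator for an l1 penalty, not the exact proximal NS algorithm.

```python
# Proximal operator of an l1 penalty (soft-thresholding) on BN scaling factors.
import torch

def prox_l1(gamma, step_size, reg_strength):
    """Return argmin_z 0.5*||z - gamma||^2 + step_size*reg_strength*||z||_1."""
    thresh = step_size * reg_strength
    return torch.sign(gamma) * torch.clamp(gamma.abs() - thresh, min=0.0)

# After the usual gradient step on the data loss, apply prox_l1 to each BatchNorm
# weight tensor; channels whose scaling factors hit exactly zero can be pruned.
```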
arXiv Detail & Related papers (2023-07-02T23:34:12Z)
- Compressing audio CNNs with graph centrality based filter pruning [20.028643659869573]
Convolutional neural networks (CNNs) are commonplace in high-performing solutions to many real-world problems.
CNNs have many parameters and filters, with some having a larger impact on the performance than others.
We propose a pruning framework that eliminates filters with the highest "commonality".
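A minimal sketch of the idea as summarized above: treat each filter as a node, connect filters by similarity, and remove those with the highest centrality, i.e. the most "common" ones. The cosine-similarity graph and degree centrality used here are illustrative choices, not necessarily the paper's.

```python
# Illustrative "commonality"-based filter selection for one convolutional layer.
import numpy as np

def filters_to_keep(conv_weights, num_to_prune):
    """conv_weights: array of shape (out_channels, in_channels, k, k)."""
    flat = conv_weights.reshape(conv_weights.shape[0], -1)
    unit = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-12)
    similarity = unit @ unit.T                   # cosine-similarity adjacency matrix
    centrality = similarity.sum(axis=1)          # degree centrality of each filter
    order = np.argsort(centrality)               # ascending: least "common" first
    return np.sort(order[: flat.shape[0] - num_to_prune])
```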
arXiv Detail & Related papers (2023-05-05T09:38:05Z)
- RedBit: An End-to-End Flexible Framework for Evaluating the Accuracy of Quantized CNNs [9.807687918954763]
Convolutional Neural Networks (CNNs) have become the standard class of deep neural network for image processing, classification and segmentation tasks.
RedBit is an open-source framework that provides a transparent, easy-to-use interface to evaluate the impact of different quantization algorithms on network accuracy.
arXiv Detail & Related papers (2023-01-15T21:27:35Z)
- Attention-based Feature Compression for CNN Inference Offloading in Edge Computing [93.67044879636093]
This paper studies the computational offloading of CNN inference in device-edge co-inference systems.
We propose a novel autoencoder-based CNN architecture (AECNN) for effective feature extraction at the end-device.
Experiments show that AECNN can compress the intermediate data by more than 256x with only about 4% accuracy loss.
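To make the offloading setup concrete, a toy device-edge split with a learned bottleneck on the intermediate feature map might look as follows; the channel counts and 1x1 convolutions are placeholders and do not reproduce AECNN's attention-based design.

```python
# Toy split-inference bottleneck (placeholder layers, not AECNN's actual architecture).
import torch
import torch.nn as nn

device_encoder = nn.Conv2d(256, 8, kernel_size=1)   # runs on the end-device: shrink channels 32x
edge_decoder = nn.Conv2d(8, 256, kernel_size=1)     # runs on the edge server: restore channels

feature = torch.randn(1, 256, 14, 14)               # intermediate feature map at the split point
compressed = device_encoder(feature)                # this much smaller tensor is transmitted
restored = edge_decoder(compressed)                 # the edge server continues the CNN from here
```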
arXiv Detail & Related papers (2022-11-24T18:10:01Z)
- GHN-Q: Parameter Prediction for Unseen Quantized Convolutional Architectures via Graph Hypernetworks [80.29667394618625]
We conduct the first-ever study exploring the use of graph hypernetworks for predicting parameters of unseen quantized CNN architectures.
We focus on a reduced CNN search space and find that GHN-Q can in fact predict quantization-robust parameters for various 8-bit quantized CNNs.
arXiv Detail & Related papers (2022-08-26T08:00:02Z)
- ACP: Automatic Channel Pruning via Clustering and Swarm Intelligence Optimization for CNN [6.662639002101124]
Convolutional neural networks (CNNs) have become deeper and wider in recent years.
Existing magnitude-based pruning methods are efficient, but the performance of the compressed network is unpredictable.
We propose a novel automatic channel pruning method (ACP).
ACP is evaluated against several state-of-the-art CNNs on three different classification datasets.
arXiv Detail & Related papers (2021-01-16T08:56:38Z)
- Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantized neural networks (QNNs) are very attractive to industry because of their extremely cheap computation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)
- Approximation and Non-parametric Estimation of ResNet-type Convolutional Neural Networks [52.972605601174955]
We show that a ResNet-type CNN can attain the minimax optimal error rates in important function classes.
We derive approximation and estimation error rates of the aforementioned type of CNNs for the Barron and Hölder classes.
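For context, the benchmark such results are measured against is the classical minimax rate for estimating a beta-Hölder function on [0,1]^d under squared loss (up to logarithmic factors):

```latex
% Classical minimax rate for \beta-Hölder regression in d dimensions;
% "minimax optimal" means the estimator's risk matches this order.
\inf_{\hat f}\;\sup_{f \in \mathcal{H}^{\beta}([0,1]^d)}
  \mathbb{E}\bigl\lVert \hat f - f \bigr\rVert_{L^2}^2
  \;\asymp\; n^{-\frac{2\beta}{2\beta+d}}
```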
arXiv Detail & Related papers (2019-03-24T19:42:39Z)