CoNLoCNN: Exploiting Correlation and Non-Uniform Quantization for
Energy-Efficient Low-precision Deep Convolutional Neural Networks
- URL: http://arxiv.org/abs/2208.00331v1
- Date: Sun, 31 Jul 2022 01:34:56 GMT
- Title: CoNLoCNN: Exploiting Correlation and Non-Uniform Quantization for
Energy-Efficient Low-precision Deep Convolutional Neural Networks
- Authors: Muhammad Abdullah Hanif, Giuseppe Maria Sarda, Alberto Marchisio,
Guido Masera, Maurizio Martina, Muhammad Shafique
- Abstract summary: We propose a framework to enable energy-efficient low-precision deep convolutional neural network inference by exploiting non-uniform quantization of weights.
We also propose a novel data representation format, Encoded Low-Precision Binary Signed Digit, to compress the bit-width of weights.
- Score: 13.520972975766313
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In today's era of smart cyber-physical systems, Deep Neural Networks (DNNs)
have become ubiquitous due to their state-of-the-art performance in complex
real-world applications. The high computational complexity of these networks,
which translates to increased energy consumption, is the foremost obstacle
towards deploying large DNNs in resource-constrained systems. Fixed-Point (FP)
implementations achieved through post-training quantization are commonly used
to curtail the energy consumption of these networks. However, the uniform
quantization intervals in FP restrict the bit-width of data structures to large
values due to the need to represent most of the numbers with sufficient
resolution and avoid high quantization errors. In this paper, we leverage the
key insight that (in most of the scenarios) DNN weights and activations are
mostly concentrated near zero and only a few of them have large magnitudes. We
propose CoNLoCNN, a framework to enable energy-efficient low-precision deep
convolutional neural network inference by exploiting: (1) non-uniform
quantization of weights enabling simplification of complex multiplication
operations; and (2) correlation between activation values enabling partial
compensation of quantization errors at low cost without any run-time overheads.
To significantly benefit from non-uniform quantization, we also propose a novel
data representation format, Encoded Low-Precision Binary Signed Digit, to
compress the bit-width of weights while ensuring direct use of the encoded
weight for processing using a novel multiply-and-accumulate (MAC) unit design.
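Since the abstract only outlines the scheme, the following is a minimal, hedged Python sketch of the underlying idea: approximate each weight by a few signed power-of-two digits so that a multiply-and-accumulate reduces to shifts and adds. The digit budget, exponent range, and greedy fitting below are illustrative assumptions; they do not reproduce the paper's actual Encoded Low-Precision Binary Signed Digit format, its MAC unit design, or the correlation-based error compensation.

```python
import numpy as np

def to_signed_digits(w, exps=range(0, 6), max_terms=2):
    """Greedily pick up to `max_terms` signed power-of-two terms (+/- 2^-e)
    whose sum approximates w. Most DNN weights are close to zero, so many
    weights need few digits (often none)."""
    terms, residual = [], float(w)
    for _ in range(max_terms):
        if residual == 0.0:
            break
        # nearest representable signed power of two to the current residual
        e = int(np.clip(np.round(-np.log2(abs(residual))), min(exps), max(exps)))
        sign = 1 if residual > 0 else -1
        terms.append((sign, e))
        residual -= sign * 2.0 ** (-e)
    return terms  # list of (sign, exponent) pairs

def mac_shift_add(acts_int, digit_lists):
    """Multiply-accumulate using only shifts and adds: each non-zero digit of
    a weight contributes sign * (activation >> exponent)."""
    acc = 0
    for a, digits in zip(acts_int, digit_lists):
        for sign, e in digits:
            acc += sign * (int(a) >> e)
    return acc

weights = [0.3, -0.08, 0.0, 0.55]            # toy weights, clustered near zero
digit_lists = [to_signed_digits(w) for w in weights]
print(digit_lists)                           # e.g. [(1, 2), (1, 4)] ~ 0.3125 for 0.3
print(mac_shift_add([64, 128, 32, 16], digit_lists))
```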
Related papers
- Accelerating Error Correction Code Transformers [56.75773430667148]
We introduce a novel acceleration method for transformer-based decoders.
We achieve a 90% compression ratio and reduce arithmetic operation energy consumption by at least 224 times on modern hardware.
arXiv Detail & Related papers (2024-10-08T11:07:55Z)
- Constraint Guided Model Quantization of Neural Networks [0.0]
Constraint Guided Model Quantization (CGMQ) is a quantization aware training algorithm that uses an upper bound on the computational resources and reduces the bit-widths of the parameters of the neural network.
It is shown on MNIST that the performance of CGMQ is competitive with state-of-the-art quantization aware training algorithms.
arXiv Detail & Related papers (2024-09-30T09:41:16Z)
- Energy Efficient Hardware Acceleration of Neural Networks with Power-of-Two Quantisation [0.0]
We show that a hardware neural network accelerator with PoT weights implemented on the Zynq UltraScale+ MPSoC ZCU104 FPGA can be at least 1.4x more energy efficient than the uniform quantisation version.
arXiv Detail & Related papers (2022-09-30T06:33:40Z)
- BiTAT: Neural Network Binarization with Task-dependent Aggregated Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation.
Extreme quantization (1-bit weights and 1-bit activations) of compactly designed backbone architectures results in severe performance degradation.
This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate this degradation.
arXiv Detail & Related papers (2022-07-04T13:25:49Z)
- Power-of-Two Quantization for Low Bitwidth and Hardware Compliant Neural Networks [1.398698203665363]
In this paper, we explore non-linear quantization techniques for exploiting lower bit precision.
We developed a Quantization-Aware Training (QAT) algorithm that allows training of low-bit-width Power-of-Two (PoT) networks.
At the same time, PoT quantization vastly reduces the computational complexity of the neural network; a minimal sketch of PoT quantization appears after this list.
arXiv Detail & Related papers (2022-03-09T19:57:14Z)
- On the Acceleration of Deep Neural Network Inference using Quantized Compressed Sensing [0.0]
We propose a novel binary quantization function based on quantized compressed sensing (QCS).
Our proposal preserves the practical benefits of standard methods, while reducing the quantization error and the resulting drop in accuracy.
arXiv Detail & Related papers (2021-08-23T12:03:24Z)
- Low-Precision Training in Logarithmic Number System using Multiplicative Weight Update [49.948082497688404]
Training large-scale deep neural networks (DNNs) currently requires a significant amount of energy, leading to serious environmental impacts.
One promising approach to reduce the energy costs is representing DNNs with low-precision numbers.
We jointly design a low-precision training framework involving a logarithmic number system (LNS) and a multiplicative weight update training method, termed LNS-Madam; a brief LNS sketch appears after this list.
arXiv Detail & Related papers (2021-06-26T00:32:17Z)
- PAMS: Quantized Super-Resolution via Parameterized Max Scale [84.55675222525608]
Deep convolutional neural networks (DCNNs) have shown dominant performance in the task of super-resolution (SR).
We propose a new quantization scheme termed PArameterized Max Scale (PAMS), which applies a trainable truncation parameter to adaptively explore the upper bound of the quantization range; a sketch of this trainable max-scale idea appears after this list.
Experiments demonstrate that the proposed PAMS scheme can well compress and accelerate the existing SR models such as EDSR and RDN.
arXiv Detail & Related papers (2020-11-09T06:16:05Z)
- ShiftAddNet: A Hardware-Inspired Deep Network [87.18216601210763]
ShiftAddNet is an energy-efficient multiplication-less deep neural network.
It leads to both energy-efficient inference and training, without compromising expressive capacity.
ShiftAddNet aggressively reduces the hardware-quantified energy cost of DNN training and inference by over 80%, while offering comparable or better accuracies.
arXiv Detail & Related papers (2020-10-24T05:09:14Z)
- AUSN: Approximately Uniform Quantization by Adaptively Superimposing Non-uniform Distribution for Deep Neural Networks [0.7378164273177589]
Existing uniform and non-uniform quantization methods exhibit an inherent conflict between the representable range and the representation resolution.
We propose a novel quantization method to quantize the weights and activations.
The key idea is to Approximate the Uniform quantization by Adaptively Superposing multiple Non-uniform quantized values, namely AUSN.
arXiv Detail & Related papers (2020-07-08T05:10:53Z)
- Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantized neural networks (QNNs) are very attractive to industry because of their extremely cheap computation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)
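For the power-of-two quantization entries above (and the shift-based arithmetic that also motivates ShiftAddNet), the following is a minimal numpy sketch under the assumption of per-tensor quantization with a fixed exponent range; the papers' actual training procedures, clipping rules, and hardware mappings are not reproduced.

```python
import numpy as np

def pot_quantize(w, min_exp=-6, max_exp=0):
    """Round each weight to the nearest signed power of two, 2^e with
    min_exp <= e <= max_exp (an illustrative per-tensor range)."""
    sign = np.sign(w)
    mag = np.clip(np.abs(w), 2.0 ** min_exp, 2.0 ** max_exp)
    exp = np.clip(np.round(np.log2(mag)), min_exp, max_exp).astype(int)
    return sign * 2.0 ** exp, exp

# Multiplying an integer activation by a PoT weight is just a bit shift,
# which is far cheaper in hardware than a full multiplier.
acts = np.array([97, -46, 210], dtype=np.int64)
w = np.array([0.45, -0.13, 0.031])
q, exp = pot_quantize(w)
products = [int(np.sign(wi)) * (int(a) >> int(-e)) for a, wi, e in zip(acts, q, exp)]
print(q)          # [ 0.5     -0.125    0.03125]
print(products)   # shift-based approximation of a * w
```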
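The LNS-Madam entry trains directly in a logarithmic number system. The toy sketch below only illustrates the LNS representation itself (a sign plus a fixed-point base-2 logarithm, so multiplication becomes exponent addition); the Madam multiplicative update rule and the full training framework are not reproduced, and the fractional log precision here is an arbitrary assumption.

```python
import numpy as np

LOG_FRAC_BITS = 3          # assumed fractional bits for the log-domain value

def to_lns(x):
    """Encode a non-zero value as (sign, quantized log2|x|)."""
    sign = 1 if x > 0 else -1
    log_fixed = np.round(np.log2(abs(x)) * 2 ** LOG_FRAC_BITS) / 2 ** LOG_FRAC_BITS
    return sign, log_fixed

def from_lns(sign, log_fixed):
    return sign * 2.0 ** log_fixed

def lns_multiply(a, b):
    """In LNS, multiplication is an addition of exponents plus a sign flip,
    which is the source of the energy savings."""
    (sa, la), (sb, lb) = to_lns(a), to_lns(b)
    return sa * sb, la + lb

s, l = lns_multiply(0.75, -3.0)
print(from_lns(s, l))      # close to 0.75 * -3.0 = -2.25, up to log-grid error
```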
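The PAMS entry above learns the upper bound of the quantization range. The sketch below shows the generic "trainable clipping scale" idea in plain numpy; the gradient step for the scale is omitted, and the symmetric b-bit grid is an assumption rather than the paper's exact formulation.

```python
import numpy as np

def quantize_with_max_scale(x, alpha, bits=4):
    """Symmetric uniform quantization whose range is set by a max scale alpha:
    values are clipped to [-alpha, alpha] and mapped onto 2^(bits-1) - 1
    positive levels. In PAMS-style training, alpha is a trainable parameter
    updated by backpropagation (not shown here)."""
    levels = 2 ** (bits - 1) - 1
    clipped = np.clip(x, -alpha, alpha)
    step = alpha / levels
    return np.round(clipped / step) * step

x = np.array([-2.1, -0.4, 0.05, 0.7, 3.3])
print(quantize_with_max_scale(x, alpha=1.0))   # outliers saturate at +/- 1.0
```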
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.