Energy Efficient Hardware Acceleration of Neural Networks with
Power-of-Two Quantisation
- URL: http://arxiv.org/abs/2209.15257v1
- Date: Fri, 30 Sep 2022 06:33:40 GMT
- Title: Energy Efficient Hardware Acceleration of Neural Networks with
Power-of-Two Quantisation
- Authors: Dominika Przewlocka-Rus, Tomasz Kryjak
- Abstract summary: We show that a hardware neural network accelerator with PoT weights implemented on the Zynq UltraScale+ MPSoC ZCU104 FPGA can be at least 1.4x more energy efficient than the uniform quantisation version.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks virtually dominate the domain of most modern vision
systems, providing high performance at a cost of increased computational
complexity. Since those systems often have to operate both in real-time and with
minimal energy consumption (e.g., for wearable devices or autonomous vehicles,
edge Internet of Things (IoT), sensor networks), various network optimisation
techniques are used, e.g., quantisation, pruning, or dedicated lightweight
architectures. Due to the logarithmic distribution of weights in neural network
layers, Power-of-Two (PoT) quantisation, whose levels are themselves
logarithmically distributed, provides high performance even at significantly
reduced computational precision (4-bit weights and below). This method also makes
it possible to replace the Multiply-and-Accumulate (MAC) units typical of neural
networks (performing, e.g., convolution operations) with more energy-efficient
Bitshift-and-Accumulate (BAC) units. In this paper, we show that a hardware
neural network accelerator with PoT weights implemented on the Zynq UltraScale+
MPSoC ZCU104 SoC FPGA can be at least 1.4x more energy efficient than the uniform
quantisation version. To further reduce the actual power requirement by omitting
part of the computation for zero weights, we also propose a new pruning method
adapted to logarithmic quantisation.
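As an illustration of why PoT weights allow this substitution, the following minimal NumPy sketch quantises weights to signed powers of two and evaluates a dot product with shifts instead of multiplies. The bit widths, the fixed-point format, and the helper names are assumptions made for the example, not the accelerator's actual design.

```python
import numpy as np

def quantise_pot(w, n_bits=4):
    """Round each weight to the nearest signed power of two.
    Returns (sign, exponent) so that w ~= sign * 2**(-exponent)."""
    sign = np.sign(w).astype(np.int64)
    mag = np.clip(np.abs(w), 2.0 ** -(2 ** (n_bits - 1) - 1), 1.0)  # avoid log2(0)
    exponent = np.clip(np.round(-np.log2(mag)), 0, 2 ** (n_bits - 1) - 1)
    return sign, exponent.astype(np.int64)

def bac_dot(x_int, sign, exponent, frac_bits=8):
    """Bitshift-and-Accumulate: x_int holds fixed-point activations with
    `frac_bits` fractional bits; multiplying by 2**(-e) becomes a right shift."""
    acc = 0
    for xi, s, e in zip(x_int, sign, exponent):
        acc += int(s) * (int(xi) >> int(e))   # shift replaces the multiply
    return acc / (1 << frac_bits)             # back to a real-valued output

rng = np.random.default_rng(0)
w = rng.uniform(-1.0, 1.0, size=16)
x = rng.uniform(0.0, 1.0, size=16)
s, e = quantise_pot(w)
x_int = np.round(x * (1 << 8)).astype(np.int64)
print("BAC:", bac_dot(x_int, s, e))
print("MAC:", float(x @ (s * 2.0 ** -e)))     # reference multiply-accumulate result
```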
Related papers
- Quantization of Deep Neural Networks to facilitate self-correction of
weights on Phase Change Memory-based analog hardware [0.0]
We develop an algorithm to approximate a set of multiplicative weights.
These weights aim to represent the original network's weights with minimal loss in performance.
Our results demonstrate that, when paired with an on-chip pulse generator, our self-correcting neural network performs comparably to those trained with analog-aware algorithms.
arXiv Detail & Related papers (2023-09-30T10:47:25Z)
- The Hardware Impact of Quantization and Pruning for Weights in Spiking
Neural Networks [0.368986335765876]
Quantization and pruning of parameters can both compress the model size, reduce memory footprints, and facilitate low-latency execution.
We study various combinations of pruning and quantization in isolation, cumulatively, and simultaneously to a state-of-the-art SNN targeting gesture recognition.
We show that this state-of-the-art model is amenable to aggressive parameter quantization, not suffering from any loss in accuracy down to ternary weights.
arXiv Detail & Related papers (2023-02-08T16:25:20Z)
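As a concrete illustration of the ternary weights mentioned in the entry above, here is a minimal threshold-based ternarisation sketch; the 0.7 * mean(|w|) threshold and the per-tensor scale are common heuristics assumed for the example, not necessarily the paper's exact scheme.

```python
import numpy as np

def ternarise(w):
    """Map weights to {-1, 0, +1} codes plus one per-tensor scale."""
    thr = 0.7 * np.mean(np.abs(w))            # heuristic dead zone around zero
    mask = np.abs(w) > thr
    scale = np.mean(np.abs(w[mask])) if mask.any() else 0.0
    return np.sign(w) * mask, scale

w = np.random.default_rng(1).normal(size=(4, 4))
codes, scale = ternarise(w)
print(codes)                                   # entries are in {-1, 0, +1}
print("dequantised:\n", codes * scale)
```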
- Vertical Layering of Quantized Neural Networks for Heterogeneous Inference [57.42762335081385]
We study a new vertical-layered representation of neural network weights for encapsulating all quantized models into a single one.
We can theoretically obtain a network of any precision for on-demand service while only needing to train and maintain one model.
arXiv Detail & Related papers (2022-12-10T15:57:38Z)
- CoNLoCNN: Exploiting Correlation and Non-Uniform Quantization for
Energy-Efficient Low-precision Deep Convolutional Neural Networks [13.520972975766313]
We propose a framework to enable energy-efficient low-precision deep convolutional neural network inference by exploiting non-uniform quantization of weights.
We also propose a novel data representation format, Encoded Low-Precision Binary Signed Digit, to compress the bit-width of weights.
arXiv Detail & Related papers (2022-07-31T01:34:56Z)
- Power-of-Two Quantization for Low Bitwidth and Hardware Compliant Neural
Networks [1.398698203665363]
In this paper, we explore non-linear quantization techniques for exploiting lower bit precision.
We developed a Quantization Aware Training (QAT) algorithm that allows training of low-bit-width Power-of-Two (PoT) networks.
At the same time, PoT quantization vastly reduces the computational complexity of the neural network.
arXiv Detail & Related papers (2022-03-09T19:57:14Z)
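Quantisation-aware training of PoT networks is commonly implemented with a straight-through estimator (STE): the forward pass uses power-of-two weights while gradients update a full-precision copy. The PyTorch sketch below illustrates that generic pattern; it is an assumption for illustration, not the specific algorithm developed in the paper above.

```python
import torch

def pot_quantise(w, n_levels=8):
    """Snap each weight to the nearest signed power of two."""
    mag = torch.clamp(w.abs(), min=2.0 ** -(n_levels - 1), max=1.0)
    return torch.sign(w) * torch.pow(2.0, torch.round(torch.log2(mag)))

class PoTLinear(torch.nn.Linear):
    def forward(self, x):
        w_q = pot_quantise(self.weight)
        # Straight-through estimator: quantised weights in the forward pass,
        # identity gradient to the full-precision weights in the backward pass.
        w_ste = self.weight + (w_q - self.weight).detach()
        return torch.nn.functional.linear(x, w_ste, self.bias)

layer = PoTLinear(8, 4)
layer(torch.randn(2, 8)).sum().backward()
print(layer.weight.grad.shape)                 # gradients reach the master weights
```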
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- Learning Frequency-aware Dynamic Network for Efficient Super-Resolution [56.98668484450857]
This paper explores a novel frequency-aware dynamic network for dividing the input into multiple parts according to its coefficients in the discrete cosine transform (DCT) domain.
In practice, the high-frequency part will be processed using expensive operations and the lower-frequency part is assigned with cheap operations to relieve the computation burden.
Experiments conducted on benchmark SISR models and datasets show that the frequency-aware dynamic network can be employed for various SISR neural architectures.
arXiv Detail & Related papers (2021-03-15T12:54:26Z)
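To make the frequency-aware routing idea concrete, the sketch below computes a 2D DCT-II per patch and sends patches with a large share of high-frequency energy to an "expensive" branch; the orthonormal DCT construction, the low-frequency mask size, and the threshold are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

def dct2_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k, m = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    basis = np.cos(np.pi * (2 * m + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    basis[0] /= np.sqrt(2.0)
    return basis

def route_patches(patches, thr=0.1):
    """Return True for patches whose high-frequency DCT energy exceeds `thr`."""
    n = patches.shape[-1]
    d = dct2_matrix(n)
    coeffs = d @ patches @ d.T                 # 2D DCT of every patch
    low = np.zeros((n, n), dtype=bool)
    low[:2, :2] = True                         # crude low-frequency region
    energy = coeffs ** 2
    hf_ratio = (energy * ~low).sum(axis=(-2, -1)) / energy.sum(axis=(-2, -1))
    return hf_ratio > thr                      # True -> expensive branch

patches = np.random.default_rng(2).normal(size=(10, 8, 8))
print(route_patches(patches))                  # per-patch routing decision
```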
- Ps and Qs: Quantization-aware pruning for efficient low latency neural network inference [56.24109486973292]
We study the interplay between pruning and quantization during the training of neural networks for ultra low latency applications.
We find that quantization-aware pruning yields more computationally efficient models than either pruning or quantization alone for our task.
arXiv Detail & Related papers (2021-02-22T19:00:05Z)
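A hypothetical sketch of one quantisation-aware pruning step in the spirit of the entry above: magnitude-based pruning combined with uniform fake-quantisation of the surviving weights. The 50% sparsity target and the 4-bit uniform grid are assumptions for the example, not the paper's recipe.

```python
import numpy as np

def prune_and_quantise(w, sparsity=0.5, n_bits=4):
    """Apply a magnitude pruning mask, then uniform quantisation to the survivors."""
    thr = np.quantile(np.abs(w), sparsity)     # magnitude threshold for pruning
    mask = np.abs(w) >= thr
    levels = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(w[mask])) / levels if mask.any() else 1.0
    w_q = np.clip(np.round(w / scale), -levels, levels) * scale
    return w_q * mask, mask

w = np.random.default_rng(3).normal(size=(6, 6))
w_q, mask = prune_and_quantise(w)
print("kept fraction:", mask.mean(), "| distinct levels:", np.unique(w_q).size)
```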
- AdderNet and its Minimalist Hardware Design for Energy-Efficient Artificial Intelligence [111.09105910265154]
We present a novel minimalist hardware architecture using the adder convolutional neural network (AdderNet).
The whole AdderNet can achieve a 16% enhancement in speed in practice.
We conclude that AdderNet is able to surpass all the other competitors.
arXiv Detail & Related papers (2021-01-25T11:31:52Z)
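The core operation behind AdderNet is an "adder convolution" that replaces the multiply-based cross-correlation with a negative L1 distance between the filter and each input patch. The single-channel NumPy sketch below illustrates that operation; the shapes and the naive loop are illustrative and unrelated to the referenced hardware design.

```python
import numpy as np

def adder_conv2d(x, w):
    """Single-channel adder convolution: out = -sum(|patch - w|), no multiplies."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = -np.abs(x[i:i + kh, j:j + kw] - w).sum()
    return out

x = np.random.default_rng(4).normal(size=(6, 6))
w = np.random.default_rng(5).normal(size=(3, 3))
print(adder_conv2d(x, w).shape)                # (4, 4) feature map
```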
- ShiftAddNet: A Hardware-Inspired Deep Network [87.18216601210763]
ShiftAddNet is an energy-efficient multiplication-less deep neural network.
It leads to both energy-efficient inference and training, without compromising expressive capacity.
ShiftAddNet aggressively reduces over 80% hardware-quantified energy cost of DNNs training and inference, while offering comparable or better accuracies.
arXiv Detail & Related papers (2020-10-24T05:09:14Z)
- A Spike in Performance: Training Hybrid-Spiking Neural Networks with
Quantized Activation Functions [6.574517227976925]
Spiking Neural Networks (SNNs) are a promising approach to energy-efficient computing.
We show how to maintain state-of-the-art accuracy when converting a non-spiking network into an SNN.
arXiv Detail & Related papers (2020-02-10T05:24:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.