EncodingNet: A Novel Encoding-based MAC Design for Efficient Neural
Network Acceleration
- URL: http://arxiv.org/abs/2402.18595v1
- Date: Sun, 25 Feb 2024 09:35:30 GMT
- Title: EncodingNet: A Novel Encoding-based MAC Design for Efficient Neural
Network Acceleration
- Authors: Bo Liu, Grace Li Zhang, Xunzhao Yin, Ulf Schlichtmann, Bing Li
- Abstract summary: We propose a novel digital multiply-accumulate (MAC) design based on encoding.
In this new design, the multipliers are replaced by simple logic gates to project the results onto a wide bit representation.
The experimental results confirm a reduction of circuit area by up to 79.63% and a reduction of the power consumed in executing DNNs by up to 70.18%.
- Score: 8.254523741863135
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Deep neural networks (DNNs) have achieved great breakthroughs in many fields
such as image classification and natural language processing. However,
executing DNNs requires massive numbers of multiply-accumulate (MAC)
operations in hardware and thus incurs large power consumption. To address
this challenge, we propose a novel digital MAC design based on encoding. In
this new design, the multipliers are replaced by simple logic gates that
project the results onto a wide bit representation. These bits carry
individual position weights, which can be trained for specific neural
networks to enhance inference accuracy. The outputs of the new multipliers
are added by bit-wise weighted accumulation, and the accumulation results are
compatible with existing computing platforms that accelerate neural networks
with either uniform or non-uniform quantization. Since the multiplication
function is replaced by simple logic projection, the critical paths in the
resulting circuits become much shorter. Correspondingly, pipelining stages in
the MAC array can be reduced, leading to a significantly smaller area as well
as better power efficiency. The proposed design has been synthesized and
verified with ResNet18 on CIFAR10, ResNet20 on CIFAR100, and ResNet50 on
ImageNet. The experimental results confirm a reduction of circuit area by up
to 79.63% and a reduction of the power consumed in executing DNNs by up to
70.18%, while the accuracy of the neural networks is well maintained.
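A minimal NumPy sketch of the general idea (an illustration under assumptions, not the authors' exact circuit or trained weights): each product is projected onto a wide bit vector, and accumulation becomes a bit-wise weighted sum whose per-position weights could then be trained.

    import numpy as np

    # Illustrative only: with pos_w fixed to powers of two the encoding is
    # exact; the paper's contribution is making these position weights
    # trainable for a specific network.
    BITS = 8

    def to_bits(v, bits=BITS):
        # unsigned integer -> bit vector, least-significant bit first
        return np.array([(v >> i) & 1 for i in range(bits)])

    x = np.array([3, 5, 7])
    w = np.array([2, 4, 6])
    reference = int(np.sum(x * w))                 # conventional MAC result

    # Project each product onto bits, accumulate counts per bit position,
    # then apply the position weights in a single weighted sum.
    bit_counts = np.sum([to_bits(a * b) for a, b in zip(x, w)], axis=0)
    pos_w = 2.0 ** np.arange(BITS)                 # trainable in the design
    encoded = float(bit_counts @ pos_w)

    print(reference, encoded)                      # 68 68.0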
Related papers
- Quality Scalable Quantization Methodology for Deep Learning on Edge [0.20718016474717196]
Deep learning architectures employ heavy computations, and the bulk of the computational energy is taken up by the convolution operations in convolutional neural networks.
The proposed work reduces the energy consumption and size of CNNs so that machine learning techniques can be used in edge computing on ubiquitous computing devices.
Experiments on LeNet and ConvNets show an increase of up to 6% in zeros and memory savings of up to 82.4919% while keeping the accuracy near the state of the art.
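A rough, hypothetical sketch of the effect being exploited (the paper's actual quality-scalable methodology is not reproduced here): coarser uniform quantization rounds more small weights to zero, which is what enables sparsity and memory savings.

    import numpy as np

    rng = np.random.default_rng(0)
    weights = rng.normal(0.0, 0.05, size=10_000)   # stand-in for CNN weights

    for bits in (8, 4, 2):
        # uniform quantization: fewer bits -> larger step -> more zeros
        step = (weights.max() - weights.min()) / (2**bits - 1)
        q = np.round(weights / step) * step
        print(bits, "bits -> fraction of zeros:", np.mean(q == 0.0))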
arXiv Detail & Related papers (2024-07-15T22:00:29Z)
- Logic Design of Neural Networks for High-Throughput and Low-Power Applications [4.964773661192363]
We propose to flatten and implement all the operations at neurons, e.g., MAC and ReLU, in a neural network with their corresponding logic circuits.
The weight values are embedded into the MAC units to simplify the logic, which can reduce the delay of the MAC units and the power consumption incurred by weight movement.
In addition, we propose a hardware-aware training method to reduce the area of logic designs of neural networks.
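A small sketch of why embedding a fixed weight simplifies the logic (my illustration, not the paper's synthesis flow): multiplication by a hard-wired constant collapses into a few shifts and adds, one per set bit of the weight.

    # Hypothetical helper: positions of set bits in the constant weight w.
    def shift_add_plan(w: int):
        return [i for i in range(w.bit_length()) if (w >> i) & 1]

    # x * w as shifts and adds only, e.g. w = 6 = 0b110 -> (x<<2) + (x<<1).
    def times_const(x: int, w: int) -> int:
        return sum(x << i for i in shift_add_plan(w))

    assert times_const(7, 6) == (7 << 2) + (7 << 1) == 42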
arXiv Detail & Related papers (2023-09-19T10:45:46Z)
- SteppingNet: A Stepping Neural Network with Incremental Accuracy Enhancement [10.20763050412309]
The increasing number of multiply-and-accumulate (MAC) operations in DNNs prevents their application on resource-constrained platforms.
We propose a design framework called SteppingNet to address these challenges.
We show SteppingNet provides an effective incremental accuracy improvement and its inference accuracy consistently outperforms state-of-the-art work.
arXiv Detail & Related papers (2022-11-27T20:20:33Z)
- Low-bit Shift Network for End-to-End Spoken Language Understanding [7.851607739211987]
We propose the use of power-of-two quantization, which quantizes continuous parameters into low-bit power-of-two values.
This reduces computational complexity by removing expensive multiplication operations and by using low-bit weights.
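A minimal sketch of power-of-two quantization, assuming a simple round-to-nearest-exponent rule (the paper's exact scheme may differ): each weight becomes a signed power of two, so multiplying a fixed-point activation by it is just a bit shift.

    import numpy as np

    def quantize_pow2(w, exp_min=-6, exp_max=0):
        # round |w| to the nearest power of two (in log space), keep the sign
        exp = np.clip(np.round(np.log2(np.abs(w) + 1e-12)), exp_min, exp_max)
        return np.sign(w) * 2.0 ** exp

    print(quantize_pow2(np.array([0.3, -0.12, 0.55])))  # [ 0.25 -0.125 0.5 ]

    # With fixed-point activations, multiplying by 2**-2 is a right shift:
    x_fixed = 96
    print(x_fixed >> 2)                                 # 96 * 0.25 = 24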
arXiv Detail & Related papers (2022-07-15T14:34:22Z)
- Saving RNN Computations with a Neuron-Level Fuzzy Memoization Scheme [0.0]
Recurrent Neural Networks (RNNs) are a key technology for applications such as automatic speech recognition or machine translation.
We build a neuron-level fuzzy memoization scheme, which dynamically caches each neuron's output and reuses it whenever it is predicted that the current output will be similar to a previously computed result.
We show that our technique avoids more than 26.7% of computations, resulting in 21% energy savings and 1.4x speedup on average.
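A toy sketch of neuron-level fuzzy memoization (the cache key and similarity test here are my own simplifications, not the paper's predictor): quantize the neuron's input to build a key, and on a hit reuse the stored output instead of recomputing.

    import numpy as np

    class FuzzyMemoNeuron:
        def __init__(self, w, b, key_step=0.25):
            self.w, self.b, self.key_step = w, b, key_step
            self.cache = {}

        def __call__(self, x):
            # inputs that quantize to the same key are treated as similar
            key = tuple(np.round(x / self.key_step).astype(int))
            if key in self.cache:
                return self.cache[key]        # reuse, skip the dot product
            y = float(np.tanh(self.w @ x + self.b))
            self.cache[key] = y
            return y

    n = FuzzyMemoNeuron(np.array([0.5, -0.3]), 0.2)
    print(n(np.array([1.00, 2.00])))          # computed
    print(n(np.array([1.05, 1.95])))          # similar input: cache hit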
arXiv Detail & Related papers (2022-02-14T09:02:03Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
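A short NumPy sketch of one way such a decomposition can work (assuming unsigned k-bit weights; the paper's exact scheme may differ): writing each bit plane as a {-1, +1} matrix E_i with B_i = (E_i + 1)/2 recovers the k-bit product from binary branches plus a shared sum of the input.

    import numpy as np

    rng = np.random.default_rng(1)
    k = 4
    W = rng.integers(0, 2**k, size=(3, 5))      # k-bit quantized weights
    x = rng.standard_normal(5)

    # Branch i needs only a {-1,+1} matrix product plus the shared sum(x):
    # ((E_i @ x) + sum(x)) / 2 equals the 0/1 bit-plane product B_i @ x.
    sum_x = x.sum()
    branches = [
        (2**i) * (((2 * ((W >> i) & 1) - 1) @ x + sum_x) / 2)
        for i in range(k)
    ]
    print(np.allclose(W @ x, sum(branches)))    # True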
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- ShiftAddNet: A Hardware-Inspired Deep Network [87.18216601210763]
ShiftAddNet is an energy-efficient multiplication-less deep neural network.
It leads to both energy-efficient inference and training, without compromising expressive capacity.
ShiftAddNet aggressively reduces over 80% hardware-quantified energy cost of DNNs training and inference, while offering comparable or better accuracies.
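A compact sketch of the two multiplication-free primitives as I read them (layer shapes and values are made up): a shift layer rescales by signed powers of two, and an add layer replaces the dot product with a negated L1 distance.

    import numpy as np

    def shift_layer(x, signs, exps):
        # sign flip plus exponent shift; no general multiplier in hardware
        return signs * np.ldexp(x, exps)

    def add_layer(x, W):
        # AdderNet-style response: -sum |w - x| per output filter
        return -np.abs(W - x).sum(axis=1)

    x = np.array([0.5, -1.0, 2.0])
    h = shift_layer(x, np.array([1, -1, 1]), np.array([1, 0, -1]))
    print(add_layer(h, np.array([[0.0, 1.0, 2.0], [1.0, 1.0, 1.0]])))  # [-2. -0.]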
arXiv Detail & Related papers (2020-10-24T05:09:14Z)
- Efficient Integer-Arithmetic-Only Convolutional Neural Networks [87.01739569518513]
We replace the conventional ReLU with a Bounded ReLU and find that the accuracy decline is due to activation quantization.
Our integer networks achieve equivalent performance as the corresponding FPN networks, but have only 1/4 memory cost and run 2x faster on modern GPU.
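A minimal sketch of the bounded-activation idea (the bound of 6 is an assumed hyperparameter, as in the common ReLU6): capping activations fixes the quantization scale, so integer activations lose little precision to outliers.

    import numpy as np

    def bounded_relu(x, bound=6.0):
        return np.clip(x, 0.0, bound)

    def quantize_uint8(x, bound=6.0):
        # fixed scale bound/255: no outlier can stretch the range
        return np.round(bounded_relu(x, bound) / bound * 255).astype(np.uint8)

    print(quantize_uint8(np.array([-1.0, 3.0, 9.0])))   # [  0 128 255]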
arXiv Detail & Related papers (2020-06-21T08:23:03Z)
- Computational optimization of convolutional neural networks using separated filters architecture [69.73393478582027]
Convolutional neural networks (CNNs) are the standard approach to image recognition, despite the fact that they can be too computationally demanding.
We consider a transformation of convolutional neural networks that reduces computational complexity and thus speeds up neural network processing.
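A small NumPy sketch of the separated-filters idea, assuming the rank-1 (SVD) factorization commonly used for this: a separable KxK filter factors into a Kx1 and a 1xK pass, cutting per-pixel multiplications from K*K to 2*K.

    import numpy as np

    K = 5
    g = np.exp(-0.5 * (np.arange(K) - K // 2) ** 2)
    f = np.outer(g, g)                        # a separable (Gaussian) filter

    U, s, Vt = np.linalg.svd(f)
    col = U[:, :1] * np.sqrt(s[0])            # Kx1 vertical filter
    row = Vt[:1, :] * np.sqrt(s[0])           # 1xK horizontal filter

    print(np.allclose(f, col @ row))          # True: exact for rank-1 filters
    print(K * K, "->", 2 * K)                 # 25 -> 10 multiplies per pixel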
arXiv Detail & Related papers (2020-02-18T17:42:13Z)
- Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantized neural networks (QNNs) are very attractive to industry because of their extremely cheap computation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features of the original full-precision networks onto high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)
- AdderNet: Do We Really Need Multiplications in Deep Learning? [159.174891462064]
We present adder networks (AdderNets) to trade massive multiplications in deep neural networks for much cheaper additions to reduce computation costs.
We develop a special back-propagation approach for AdderNets by investigating the full-precision gradient.
As a result, the proposed AdderNets can achieve 74.9% Top-1 accuracy and 91.7% Top-5 accuracy using ResNet-50 on the ImageNet dataset.
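A minimal sketch of the adder operation (layer shapes and values are illustrative): the dot-product similarity is replaced by a negated L1 distance, so a layer needs only additions and subtractions.

    import numpy as np

    def adder_layer(x, W):
        # x: (d,) input patch, W: (out, d) filters -> (out,) responses
        return -np.abs(x[None, :] - W).sum(axis=1)

    x = np.array([1.0, -2.0, 0.5])
    W = np.array([[1.0, -2.0, 0.5],           # identical filter -> best score
                  [0.0,  0.0,  0.0]])
    print(adder_layer(x, W))                  # [-0.  -3.5]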
arXiv Detail & Related papers (2019-12-31T06:56:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.