Logic Design of Neural Networks for High-Throughput and Low-Power
Applications
- URL: http://arxiv.org/abs/2309.10510v1
- Date: Tue, 19 Sep 2023 10:45:46 GMT
- Title: Logic Design of Neural Networks for High-Throughput and Low-Power
Applications
- Authors: Kangwei Xu, Grace Li Zhang, Ulf Schlichtmann, Bing Li
- Abstract summary: We propose to flatten and implement all the operations at neurons, e.g., MAC and ReLU, in a neural network with their corresponding logic circuits.
The weight values are embedded into the MAC units to simplify the logic, which can reduce the delay of the MAC units and the power consumption incurred by weight movement.
In addition, we propose a hardware-aware training method to reduce the area of logic designs of neural networks.
- Score: 4.964773661192363
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Neural networks (NNs) have been successfully deployed in various fields. In
NNs, a large number of multiply-accumulate (MAC) operations need to be
performed. Most existing digital hardware platforms rely on parallel MAC units
to accelerate these MAC operations. However, under a given area constraint, the
number of MAC units in such platforms is limited, so MAC units have to be
reused to perform MAC operations in a neural network. Accordingly, the
throughput in generating classification results is not high, which prevents the
application of traditional hardware platforms in extreme-throughput scenarios.
Moreover, the power consumption of such platforms is high, mainly due to
data movement. To overcome these challenges, in this paper, we propose to flatten
and implement all the operations at neurons, e.g., MAC and ReLU, in a neural
network with their corresponding logic circuits. To improve the throughput and
reduce the power consumption of such logic designs, the weight values are
embedded into the MAC units to simplify the logic, which can reduce the delay
of the MAC units and the power consumption incurred by weight movement. The
retiming technique is further used to improve the throughput of the logic
circuits for neural networks. In addition, we propose a hardware-aware training
method to reduce the area of logic designs of neural networks. Experimental
results demonstrate that the proposed logic designs can achieve high throughput
and low power consumption for several high-throughput applications.
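As a rough, self-contained illustration of the weight-embedding argument (a minimal Python sketch under our own assumptions, not code from the paper): once a weight is a hard-wired constant, its multiplier collapses into a handful of shift-and-add terms, which is what shortens the MAC delay.

```python
# Minimal sketch (not from the paper): a constant-weight multiply
# reduces to shift-and-add logic. Canonical signed digit (CSD) recoding
# makes the number of adder terms explicit, which is roughly what logic
# synthesis exploits when a weight value is embedded into its MAC unit.

def csd_digits(w: int) -> list[int]:
    """Canonical-signed-digit form of w: digits in {-1, 0, +1}, LSB first."""
    digits = []
    while w != 0:
        if w & 1:
            d = 2 - (w & 3)      # +1 if w % 4 == 1, -1 if w % 4 == 3
            w -= d
        else:
            d = 0
        digits.append(d)
        w >>= 1
    return digits

def const_mult(x: int, w: int) -> int:
    """Multiply x by the constant w using only shifts and adds/subtracts."""
    acc = 0
    for shift, d in enumerate(csd_digits(w)):
        acc += d * (x << shift)  # d is -1, 0, or +1: subtract, nothing, or add
    return acc

w = 115                          # example embedded weight
print(csd_digits(w))             # [-1, 0, 1, 0, -1, 0, 0, 1] -> 4 adder terms
assert const_mult(7, w) == 7 * w
```

Logic synthesis performs this collapse automatically for constant operands; the sketch only makes the resulting adder count visible.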
Related papers
- Energy-Aware FPGA Implementation of Spiking Neural Network with LIF Neurons [0.5243460995467893]
Spiking Neural Networks (SNNs) stand out as a cutting-edge solution for TinyML.
This paper presents a novel SNN architecture based on the 1st Order Leaky Integrate-and-Fire (LIF) neuron model.
A hardware-friendly LIF design is also proposed, and implemented on a Xilinx Artix-7 FPGA.
arXiv Detail & Related papers (2024-11-03T16:42:10Z)
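For orientation, a minimal discrete-time sketch of the 1st-order LIF update this entry refers to; the leak factor beta, threshold v_th, and hard reset are textbook choices assumed here, not details taken from the paper's hardware-friendly design.

```python
# Minimal 1st-order leaky integrate-and-fire (LIF) sketch; a real
# FPGA design would differ in reset behaviour and quantisation.

import numpy as np

def lif_step(v, i_in, beta=0.9, v_th=1.0):
    """One discrete-time LIF update: leak, integrate, fire, hard reset."""
    v = beta * v + i_in               # leaky integration of the input current
    spikes = v >= v_th                # fire where the membrane crosses threshold
    v = np.where(spikes, 0.0, v)      # hard reset of neurons that fired
    return v, spikes.astype(np.float32)

v = np.zeros(4)                       # membrane potentials of 4 neurons
rng = np.random.default_rng(0)
for _ in range(20):
    v, s = lif_step(v, rng.uniform(0.0, 0.3, size=4))
```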
- Efficient and accurate neural field reconstruction using resistive memory [52.68088466453264]
Traditional signal reconstruction methods on digital computers face both software and hardware challenges.
We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs.
This work advances the AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
arXiv Detail & Related papers (2024-04-15T09:33:09Z)
- EncodingNet: A Novel Encoding-based MAC Design for Efficient Neural Network Acceleration [7.694043781601237]
We propose a novel digital multiply-accumulate (MAC) design based on encoding.
In this new design, the multipliers are replaced by simple logic gates that represent the results in a wide bit representation.
Since the multiplication function is replaced by a simple logic representation, the critical paths in the resulting circuits become much shorter.
arXiv Detail & Related papers (2024-02-25T09:35:30Z)
- NEON: Enabling Efficient Support for Nonlinear Operations in Resistive RAM-based Neural Network Accelerators [12.045126404373868]
Resistive Random-Access Memory (RRAM) is well-suited to accelerate neural network (NN) workloads.
NEON is a novel compiler optimization to enable the end-to-end execution of the NN workload in RRAM.
arXiv Detail & Related papers (2022-11-10T17:57:35Z)
- Signal Detection in MIMO Systems with Hardware Imperfections: Message Passing on Neural Networks [101.59367762974371]
In this paper, we investigate signal detection in multiple-input-multiple-output (MIMO) communication systems with hardware impairments.
It is difficult to train a deep neural network (DNN) with limited pilot signals, hindering its practical applications.
We design an efficient message passing based Bayesian signal detector, leveraging the unitary approximate message passing (UAMP) algorithm.
arXiv Detail & Related papers (2022-10-08T04:32:58Z)
- Energy Efficient Hardware Acceleration of Neural Networks with Power-of-Two Quantisation [0.0]
We show that a hardware neural network accelerator with PoT weights implemented on the Zynq UltraScale+ MPSoC ZCU104 FPGA can be at least 1.4x more energy efficient than the uniform quantisation version.
arXiv Detail & Related papers (2022-09-30T06:33:40Z)
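To make the shift-versus-multiply argument concrete, here is a hedged sketch of power-of-two (PoT) weight quantisation (our own illustration, not the paper's exact scheme): every weight is snapped to a signed power of two, so a hardware multiply reduces to a sign flip plus a bit shift.

```python
# Illustrative PoT quantisation (not the paper's exact scheme): weights
# become +/- 2^k, so multiplication in hardware is a shift, not a multiply.

import numpy as np

def pot_quantise(w, k_min=-4, k_max=0):
    """Snap weights to the nearest signed power of two in [2^k_min, 2^k_max]."""
    sign = np.sign(w)                 # note: exact zeros stay zero
    mag = np.clip(np.abs(w), 2.0**k_min, 2.0**k_max)
    k = np.clip(np.round(np.log2(mag)), k_min, k_max)
    return sign * 2.0**k

w = np.array([0.3, -0.07, 0.9, -0.5])
print(pot_quantise(w))                # [ 0.25 -0.0625  1.  -0.5 ]
```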
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around the 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its software full-precision counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
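A hedged reconstruction of the encoding idea from the abstract alone (the function decompose and the greedy sign rule are our assumptions, not the authors' code): an M-bit quantised tensor over odd integer levels splits into M binary {-1, +1} tensors, each of which can then execute as a binary-network branch.

```python
# Hedged sketch of the {-1, +1} decomposition idea: odd integers q with
# |q| <= 2^m - 1 are written as sum_i 2^i * b_i with b_i in {-1, +1}.

import numpy as np

def decompose(q, m):
    """Split an odd-valued integer tensor into m binary {-1, +1} bit planes."""
    branches = []
    r = q.astype(np.int64)
    for i in reversed(range(m)):
        b = np.where(r >= 0, 1, -1)   # greedy sign choice per bit plane
        branches.append((2**i, b))
        r = r - (2**i) * b            # remainder stays odd and shrinks
    assert np.all(r == 0)
    return branches

q = np.array([-7, -1, 3, 7])          # 3-bit odd quantisation levels
print(sum(c * b for c, b in decompose(q, 3)))   # reconstructs q exactly
```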
- Ps and Qs: Quantization-aware pruning for efficient low latency neural network inference [56.24109486973292]
We study the interplay between pruning and quantization during the training of neural networks for ultra low latency applications.
We find that quantization-aware pruning yields more computationally efficient models than either pruning or quantization alone for our task.
arXiv Detail & Related papers (2021-02-22T19:00:05Z)
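As a toy illustration of quantization-aware pruning in general (not the paper's training recipe; sparsity and bit width below are assumed values), a magnitude mask and fake quantisation can be composed in the forward pass:

```python
# Toy sketch: combine magnitude pruning with fake quantisation in the
# forward pass, as quantisation-aware pruning setups generally do.

import numpy as np

def prune_mask(w, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of weights."""
    thresh = np.quantile(np.abs(w), sparsity)
    return (np.abs(w) >= thresh).astype(w.dtype)

def fake_quantise(w, bits=4):
    """Uniform symmetric fake quantisation applied in the forward pass."""
    scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale if scale > 0 else w

w = np.random.default_rng(0).normal(size=(8, 8))
w_eff = fake_quantise(w * prune_mask(w))   # weights the forward pass actually sees
# Training would update the dense w, with gradients passed through the
# mask and the rounding via a straight-through estimator.
```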
- AM-DCGAN: Analog Memristive Hardware Accelerator for Deep Convolutional Generative Adversarial Networks [3.4806267677524896]
We present a fully analog hardware design of Deep Convolutional GAN (DCGAN) based on CMOS-memristive convolutional and deconvolutional networks simulated using 180nm CMOS technology.
arXiv Detail & Related papers (2020-06-20T15:37:29Z)
- Training End-to-End Analog Neural Networks with Equilibrium Propagation [64.0476282000118]
We introduce a principled method to train end-to-end analog neural networks by gradient descent.
We show mathematically that a class of analog neural networks (called nonlinear resistive networks) are energy-based models.
Our work can guide the development of a new generation of ultra-fast, compact and low-power neural networks supporting on-chip learning.
arXiv Detail & Related papers (2020-06-02T23:38:35Z)
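For orientation, a compact equilibrium-propagation sketch on a tiny layered Hopfield-style network, following Scellier and Bengio's two-phase rule; the dynamics are deliberately simplified and every size and constant below is our assumption, not the paper's analog resistive-network formulation.

```python
# Simplified equilibrium propagation (EP): relax to a free fixed point,
# then to a weakly output-nudged one, and update weights from the
# contrast between the two (a local, Hebbian-style rule).

import numpy as np

rng = np.random.default_rng(0)
rho = lambda s: np.clip(s, 0.0, 1.0)            # hard-sigmoid activation

nx, nh, ny = 4, 8, 2                            # assumed toy layer sizes
W1 = rng.normal(0.0, 0.3, (nx, nh))
W2 = rng.normal(0.0, 0.3, (nh, ny))

def relax(x, y_target, beta, steps=100, dt=0.1):
    """Settle the state to a fixed point of (simplified) energy dynamics."""
    h, y = np.zeros(nh), np.zeros(ny)
    for _ in range(steps):
        dh = -h + rho(x) @ W1 + rho(y) @ W2.T
        dy = -y + rho(h) @ W2 + beta * (y_target - y)   # beta=0: free phase
        h, y = h + dt * dh, y + dt * dy
    return h, y

def ep_grads(x, y_target, beta=0.5):
    """EP learning rule: contrast free and weakly nudged fixed points."""
    h0, y0 = relax(x, y_target, beta=0.0)
    h1, y1 = relax(x, y_target, beta=beta)
    dW1 = (np.outer(rho(x), rho(h1)) - np.outer(rho(x), rho(h0))) / beta
    dW2 = (np.outer(rho(h1), rho(y1)) - np.outer(rho(h0), rho(y0))) / beta
    return dW1, dW2

x, y_t = rng.uniform(0, 1, nx), np.array([1.0, 0.0])
dW1, dW2 = ep_grads(x, y_t)
W1 += 0.1 * dW1                                 # one learning step
W2 += 0.1 * dW2
```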
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.