DSLOT-NN: Digit-Serial Left-to-Right Neural Network Accelerator
- URL: http://arxiv.org/abs/2309.06019v2
- Date: Fri, 22 Sep 2023 02:44:28 GMT
- Title: DSLOT-NN: Digit-Serial Left-to-Right Neural Network Accelerator
- Authors: Muhammad Sohail Ibrahim, Muhammad Usman, Malik Zohaib Nisar, Jeong-A Lee
- Abstract summary: We propose a Digit-Serial Left-tO-righT arithmetic based processing technique called DSLOT-NN.
The proposed work has the ability to assess and terminate the ineffective convolutions which results in massive power and energy savings.
- Score: 0.6435156676256051
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a Digit-Serial Left-tO-righT (DSLOT) arithmetic-based processing
technique called DSLOT-NN, with the aim of accelerating inference of the convolution
operation in deep neural networks (DNNs). The proposed work can assess and
terminate ineffective convolutions, which results in massive
power and energy savings. The processing engine comprises low-latency
most-significant-digit-first (MSDF) (also called online) multipliers and adders
that process data from left to right, allowing subsequent operations to
execute in a digit-pipelined manner. The use of online operators eliminates the
need for a complex mechanism to identify negative activations: the output
digit with the highest weight is generated first, and the
sign of the result can be identified as soon as the first non-zero digit is
generated. The precision of the online operators can be tuned at run-time,
making them extremely useful in situations where accuracy can be compromised
for power and energy savings. The proposed design has been implemented on a
Xilinx Virtex-7 FPGA and is compared with the state-of-the-art Stripes on various
performance metrics. The results show that the proposed design saves power,
has a shorter cycle time, and achieves approximately 50% higher OPS per watt.
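The early-termination idea in the abstract can be illustrated with a minimal software sketch. It is not the authors' hardware design. It assumes a radix-2 signed-digit result with digits in {-1, 0, 1} and magnitude below 1, and it streams the digits of an already computed sum of products, whereas a real online operator would produce them directly from its input digits. The function names to_signed_digits and relu_msdf are hypothetical.

```python
from fractions import Fraction


def to_signed_digits(x, n_digits):
    """Return radix-2 signed digits (-1, 0, 1) of x, most significant first.

    Assumes |x| < 1. Digit i (1-based) carries weight 2**-i.
    This stands in for the digit stream an online operator would emit.
    """
    digits = []
    residual = Fraction(x)
    for i in range(1, n_digits + 1):
        weight = Fraction(1, 2 ** i)
        if residual >= weight:
            d = 1
        elif residual <= -weight:
            d = -1
        else:
            d = 0
        digits.append(d)
        residual -= d * weight
    return digits


def relu_msdf(digits):
    """Apply ReLU to an MSDF digit stream, stopping early on a negative sign.

    Returns (value, digits_consumed). If the first non-zero digit is -1 the
    result is negative, so ReLU yields 0 and the remaining digits are skipped.
    """
    value = Fraction(0)
    sign_known = False
    for i, d in enumerate(digits, start=1):
        if not sign_known and d != 0:
            if d < 0:
                return Fraction(0), i  # negative activation: terminate here
            sign_known = True
        value += Fraction(d, 2 ** i)
    return value, len(digits)


# Hypothetical sums of products, assumed to be scaled into (-1, 1).
negative_sop = Fraction(1, 4) * Fraction(1, 2) - Fraction(1, 2) * Fraction(1, 2)
positive_sop = Fraction(1, 2) * Fraction(1, 2) + Fraction(1, 4) * Fraction(1, 2)

neg_result = relu_msdf(to_signed_digits(negative_sop, 8))
pos_result = relu_msdf(to_signed_digits(positive_sop, 8))
```

In this sketch the negative sum is abandoned after three of its eight digits, while the positive sum is processed in full. Passing fewer digits to to_signed_digits mimics the run-time precision tuning mentioned in the abstract, at the cost of a truncated result.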
Related papers
- DCP: Learning Accelerator Dataflow for Neural Network via Propagation [52.06154296196845]
This work proposes an efficient data-centric approach, named Dataflow Code Propagation (DCP), to automatically find the optimal dataflow for DNN layers in seconds without human effort.
DCP learns a neural predictor to efficiently update the dataflow codes towards the desired gradient directions to minimize various optimization objectives.
For example, without using additional training data, DCP surpasses the GAMMA method that performs a full search using thousands of samples.
arXiv Detail & Related papers (2024-10-09T05:16:44Z) - Low-Latency Online Multiplier with Reduced Activities and Minimized
Interconnect for Inner Product Arrays [0.8078491757252693]
This paper proposes a low latency multiplier based on online or left-to-right arithmetic.
Online arithmetic enables overlapping successive operations regardless of data dependency.
The serial nature of the online algorithm and the gradual increment/decrement of active slices minimize the interconnects and signal activities.
arXiv Detail & Related papers (2023-04-06T01:22:27Z) - Online Transformers with Spiking Neurons for Fast Prosthetic Hand
Control [1.6114012813668934]
In this paper, instead of the self-attention mechanism, we use a sliding window attention mechanism.
We show that this mechanism is more efficient for continuous signals with finite-range dependencies between input and target.
Our results hold great promise for accurate and fast online processing of sEMG signals for smooth prosthetic hand control.
arXiv Detail & Related papers (2023-03-21T13:59:35Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - RF-Photonic Deep Learning Processor with Shannon-Limited Data Movement [0.0]
Optical neural networks (ONNs) are promising accelerators with ultra-low latency and energy consumption.
We introduce our multiplicative analog frequency transform ONN (MAFT-ONN) that encodes the data in the frequency domain.
We experimentally demonstrate the first hardware accelerator that computes fully-analog deep learning on raw RF signals.
arXiv Detail & Related papers (2022-07-08T16:37:13Z) - DNN Training Acceleration via Exploring GPGPU Friendly Sparsity [16.406482603838157]
We propose Approximate Random Dropout, which replaces the conventional random dropout of neurons and synapses with regular, online-generated row-based or tile-based dropout patterns.
We then develop a SGD-based Search Algorithm that produces the distribution of row-based or tile-based dropout patterns to compensate for the potential accuracy loss.
We also propose the sensitivity-aware dropout method to dynamically drop the input feature maps based on their sensitivity so as to achieve greater forward and backward training acceleration.
arXiv Detail & Related papers (2022-03-11T01:32:03Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the exit point, exit point, and compressing bits by soft policy iterations.
Based on the latency and accuracy aware reward design, such a computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - Two-Timescale End-to-End Learning for Channel Acquisition and Hybrid
Precoding [94.40747235081466]
We propose an end-to-end deep learning-based joint transceiver design algorithm for millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) systems.
We develop a DNN architecture that maps the received pilots into feedback bits at the receiver, and then further maps the feedback bits into the hybrid precoder at the transmitter.
arXiv Detail & Related papers (2021-10-22T20:49:02Z) - Quantized Neural Networks via {-1, +1} Encoding Decomposition and
Acceleration [83.84684675841167]
We propose a novel encoding scheme using -1, +1 to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z) - NullaNet Tiny: Ultra-low-latency DNN Inference Through Fixed-function
Combinational Logic [4.119948826527649]
Field-programmable gate array (FPGA)-based accelerators are gaining traction as a serious contender to replace graphics processing unit/central processing unit-based platforms.
This paper presents NullaNet Tiny, a framework for constructing resource and energy-efficient, ultra-low-latency FPGA-based neural network accelerators.
arXiv Detail & Related papers (2021-04-07T00:16:39Z) - Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantization neural networks (QNNs) are very attractive to the industry because of their extremely cheap calculation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.