RNNAccel: A Fusion Recurrent Neural Network Accelerator for Edge
Intelligence
- URL: http://arxiv.org/abs/2010.13311v1
- Date: Mon, 26 Oct 2020 03:36:36 GMT
- Title: RNNAccel: A Fusion Recurrent Neural Network Accelerator for Edge
Intelligence
- Authors: Chao-Yang Kao, Huang-Chih Kuo, Jian-Wen Chen, Chiung-Liang Lin,
Pin-Han Chen and Youn-Long Lin
- Abstract summary: We present an RNN deep learning accelerator, called RNNAccel.
It supports Long Short-Term Memory (LSTM) networks, Gated Recurrent Unit (GRU) networks, and Fully Connected Layer (FC) / Multi-Layer Perceptron (MLP) networks.
The 32-MAC RNN accelerator achieves 90% MAC utilization, 1.27 TOPS/W in a 40nm process, an 8x compression ratio, and 90% inference accuracy.
- Score: 2.055204980188575
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many edge devices employ Recurrent Neural Networks (RNN) to enhance their
product intelligence. However, increasing computational complexity poses
challenges for performance, energy efficiency, and product development time. In
this paper, we present an RNN deep learning accelerator, called RNNAccel, which
supports Long Short-Term Memory (LSTM) networks, Gated Recurrent Unit (GRU)
networks, and Fully Connected Layer (FC) / Multi-Layer Perceptron (MLP)
networks. This RNN accelerator addresses (1) the computing-unit utilization
bottleneck caused by RNN data dependency, (2) inflexible design for specific
applications, (3) energy consumption dominated by memory access, (4) accuracy
loss due to coefficient compression, and (5) unpredictable performance
resulting from processor-accelerator integration. Our proposed RNN accelerator
consists of a configurable 32-MAC array and a coefficient decompression engine.
The MAC array can be scaled up to meet throughput requirements and power budgets.
Its compression scheme, NeuCompression, pairs sophisticated off-line compression
with simple, hardware-friendly on-line decompression, reducing the memory footprint
by up to 16x and decreasing memory-access power. Furthermore, for easy SoC integration, we
developed a tool set for bit-accurate simulation and integration result
validation. Evaluated on a keyword-spotting application, the 32-MAC RNN
accelerator achieves 90% MAC utilization, 1.27 TOPS/W in a 40nm process, an 8x
compression ratio, and 90% inference accuracy.
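The abstract reports what NeuCompression achieves (up to 16x footprint reduction, 8x compression in the evaluation) but not how it works, so the following is only a minimal stand-in sketch of the split it describes: a heavier off-line encoder paired with an on-line decoder cheap enough for a small hardware block. Here that is per-block scaling plus 4-bit weight codes decoded with one multiply per coefficient; the block size, bit width, and function names are illustrative assumptions, not details of NeuCompression.

```python
import numpy as np

BLOCK = 32   # assumed block size (illustrative, not from the paper)
BITS = 4     # 4-bit codes: nominally 8x smaller than 32-bit floats

def compress_offline(weights: np.ndarray):
    """Off-line step: per-block scale + uniform 4-bit quantization (illustrative)."""
    w = weights.reshape(-1, BLOCK)
    scale = np.abs(w).max(axis=1, keepdims=True) / (2 ** (BITS - 1) - 1)
    scale[scale == 0] = 1.0                                    # guard all-zero blocks
    codes = np.clip(np.round(w / scale), -(2 ** (BITS - 1)), 2 ** (BITS - 1) - 1)
    return codes.astype(np.int8), scale.astype(np.float32)

def decompress_online(codes: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """On-line step: one multiply per coefficient, cheap enough for a small decoder."""
    return codes.astype(np.float32) * scale

if __name__ == "__main__":
    w = np.random.randn(256, 128).astype(np.float32)
    codes, scale = compress_offline(w)
    w_hat = decompress_online(codes, scale).reshape(w.shape)
    # Per-block scale metadata lowers the nominal 8x; a real scheme would amortize or code it.
    packed_bytes = codes.size * BITS // 8 + scale.nbytes
    print(f"ratio ~{w.nbytes / packed_bytes:.1f}x, max abs error {np.abs(w - w_hat).max():.4f}")
```

As a side note on the headline throughput, a MAC array's peak rate is 2 × N_MAC × f_clk operations per second (one multiply and one add per MAC per cycle), so the reported 90% utilization and 1.27 TOPS/W together constrain the clock and power operating point; neither figure is stated in the abstract, so none is assumed here.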
Related papers
- Weight Block Sparsity: Training, Compilation, and AI Engine Accelerators [0.0]
Deep Neural Networks (DNNs) are being developed, trained, and utilized, putting a strain on both high-end and resource-limited devices.
Our solution is to implement weight block sparsity, a structured sparsity that is hardware-friendly (see the sketch after this entry).
We present performance estimates using accurate and complete code generation for AIE2 configuration sets (AMD Versal FPGAs) with ResNet50, Inception V3, and VGG16.
arXiv Detail & Related papers (2024-07-12T17:37:49Z)
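The entry above only names weight block sparsity, so the following is a minimal, hypothetical sketch of the general idea: whole fixed-size blocks of a weight matrix are zeroed by magnitude, leaving the surviving non-zeros in contiguous tiles that a dense compute engine can pack or skip efficiently. The block shape and keep ratio are illustrative choices, not values from the paper.

```python
import numpy as np

def block_sparsify(weights: np.ndarray, block=(8, 8), keep_ratio=0.5) -> np.ndarray:
    """Zero out entire blocks of `weights`, keeping the blocks with the largest L1 norm.

    Structured (block) sparsity keeps non-zeros in contiguous tiles, which is what
    makes it hardware-friendly compared with unstructured element-wise pruning.
    """
    rows, cols = weights.shape
    br, bc = block
    assert rows % br == 0 and cols % bc == 0, "sketch assumes the matrix tiles evenly"

    # View the matrix as a grid of (br x bc) blocks and score each block by L1 norm.
    blocks = weights.reshape(rows // br, br, cols // bc, bc)
    scores = np.abs(blocks).sum(axis=(1, 3))

    # Keep the top `keep_ratio` fraction of blocks, zero out the rest.
    k = max(1, int(keep_ratio * scores.size))
    threshold = np.sort(scores, axis=None)[-k]
    mask = (scores >= threshold).astype(weights.dtype)

    return (blocks * mask[:, None, :, None]).reshape(rows, cols)

if __name__ == "__main__":
    w = np.random.randn(64, 64).astype(np.float32)
    w_sparse = block_sparsify(w)
    print("non-zero fraction:", np.count_nonzero(w_sparse) / w_sparse.size)
```

A compiler targeting an engine such as AIE2 would then pack only the surviving blocks and emit dense kernels for them; that training and compilation flow is the paper's subject and is outside this sketch.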
- LitE-SNN: Designing Lightweight and Efficient Spiking Neural Network through Spatial-Temporal Compressive Network Search and Joint Optimization [48.41286573672824]
Spiking Neural Networks (SNNs) mimic the information-processing mechanisms of the human brain and are highly energy-efficient.
We propose a new approach named LitE-SNN that incorporates both spatial and temporal compression into the automated network design process.
arXiv Detail & Related papers (2024-01-26T05:23:11Z)
- Spiker+: a framework for the generation of efficient Spiking Neural Networks FPGA accelerators for inference at the edge [49.42371633618761]
Spiker+ is a framework for generating efficient, low-power, and low-area customized Spiking Neural Networks (SNN) accelerators on FPGA for inference at the edge.
Spiker+ is tested on two benchmark datasets: MNIST and the Spiking Heidelberg Digits (SHD).
arXiv Detail & Related papers (2024-01-02T10:42:42Z)
- RAMAN: A Re-configurable and Sparse tinyML Accelerator for Inference on Edge [1.8293684411977293]
Deep Neural Network (DNN) based inference at the edge is challenging as these compute and data-intensive algorithms need to be implemented at low cost and low power.
We present RAMAN, a Re-configurable and spArse tinyML Accelerator for infereNce on edge, architected to exploit sparsity to reduce area (storage), power, and latency.
arXiv Detail & Related papers (2023-06-10T17:25:58Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around 40% of the available hardware resources.
It reduces classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its software, full-precision counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and the compressing bits by soft policy iterations.
Based on the latency- and accuracy-aware reward design, such a computation can adapt well to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks (see the sketch after this entry).
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
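The {-1, +1} decomposition named above can be made concrete with a small worked example. The sketch below is only one possible reading of the abstract (the paper's exact encoding and branch coefficients are not given there): an M-bit code taking odd integer values can be written exactly as a weighted sum of M binary {-1, +1} tensors, so a quantized layer turns into several binary branches recombined with fixed coefficients.

```python
import numpy as np

def decompose_pm1(q: np.ndarray, bits: int):
    """Decompose odd-integer codes q in [-(2^bits - 1), 2^bits - 1] into `bits`
    binary {-1, +1} tensors b_i such that q = sum_i 2^i * b_i."""
    u = (q + (2 ** bits - 1)) // 2        # shift to unsigned range 0 .. 2^bits - 1
    branches = []
    for i in range(bits):
        c_i = (u >> i) & 1                # i-th bit of u, in {0, 1}
        branches.append(2 * c_i - 1)      # map {0, 1} -> {-1, +1}
    return branches

if __name__ == "__main__":
    bits = 2
    rng = np.random.default_rng(0)
    # Illustrative 2-bit quantized codes: odd integers in [-3, 3].
    q = rng.choice([-3, -1, 1, 3], size=(4, 4))
    branches = decompose_pm1(q, bits)
    recon = sum((2 ** i) * b for i, b in enumerate(branches))
    assert np.array_equal(recon, q)       # exact reconstruction
    # x @ W then becomes sum_i 2^i * (x @ B_i), where each B_i is a {-1, +1}
    # matrix suitable for XNOR/popcount-style binary acceleration.
    print("branches:", len(branches), "reconstruction ok")
```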
- Lightweight Compression of Intermediate Neural Network Features for Collaborative Intelligence [32.03465747357384]
In collaborative intelligence applications, part of a deep neural network (DNN) is deployed on a lightweight device such as a mobile phone or edge device.
This paper presents a novel lightweight compression technique designed specifically to quantize and compress the features output by the intermediate layer of a split DNN.
arXiv Detail & Related papers (2021-05-15T00:10:12Z)
- Lightweight compression of neural network feature tensors for collaborative intelligence [32.03465747357384]
In collaborative intelligence applications, part of a deep neural network (DNN) is deployed on a relatively low-complexity device such as a mobile phone or edge device.
This paper presents a novel lightweight compression technique designed specifically to code the activations of a split DNN layer.
arXiv Detail & Related papers (2021-05-12T23:41:35Z)
- AdderNet and its Minimalist Hardware Design for Energy-Efficient Artificial Intelligence [111.09105910265154]
We present a novel minimalist hardware architecture using the adder convolutional neural network (AdderNet); see the sketch after this entry.
The whole AdderNet design can practically achieve a 16% enhancement in speed.
We conclude that AdderNet is able to surpass all the other competitors.
arXiv Detail & Related papers (2021-01-25T11:31:52Z)
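For context on why an AdderNet datapath can be minimalist: AdderNet replaces the multiply-accumulate in convolution with a sum of absolute differences, so each filter response is the negative L1 distance between the filter and an input patch. The sketch below is a minimal single-channel, stride-1, no-padding illustration of that operation, not the paper's hardware design.

```python
import numpy as np

def adder_conv2d(x: np.ndarray, f: np.ndarray) -> np.ndarray:
    """Single-channel AdderNet-style 'convolution': each output is the negative
    L1 distance between the filter and an input patch (no multiplications)."""
    H, W = x.shape
    kh, kw = f.shape
    out = np.empty((H - kh + 1, W - kw + 1), dtype=x.dtype)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + kh, j:j + kw]
            out[i, j] = -np.abs(patch - f).sum()   # additions and subtractions only
    return out

if __name__ == "__main__":
    x = np.random.randn(8, 8).astype(np.float32)
    f = np.random.randn(3, 3).astype(np.float32)
    print(adder_conv2d(x, f).shape)   # (6, 6)
```

Because this kernel needs only adders, subtractors, and sign logic, the corresponding hardware can drop multipliers entirely, which is where the area and energy savings behind the entry's claims come from.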
- Balancing Efficiency and Flexibility for DNN Acceleration via Temporal GPU-Systolic Array Integration [22.90145417561172]
We propose Simultaneous Multi-mode Architecture (SMA), a novel architecture design and execution model.
SMA offers general-purpose programmability on DNN accelerators in order to accelerate end-to-end applications.
The SMA achieves up to 63% performance improvement while consuming 23% less energy than the baseline architecture with Volta-Core.
arXiv Detail & Related papers (2020-02-18T17:44:20Z)