SHARP: An Adaptable, Energy-Efficient Accelerator for Recurrent Neural
Network
- URL: http://arxiv.org/abs/1911.01258v3
- Date: Sun, 21 May 2023 04:12:46 GMT
- Title: SHARP: An Adaptable, Energy-Efficient Accelerator for Recurrent Neural
Network
- Authors: Reza Yazdani, Olatunji Ruwase, Minjia Zhang, Yuxiong He, Jose-Maria
Arnau, Antonio Gonzalez
- Abstract summary: We propose an intelligent tile-based dispatching mechanism that increases the adaptiveness of RNN computation and efficiently handles its data dependencies.
Sharp achieves 2x, 2.8x, and 82x speedups on average over state-of-the-art ASIC, FPGA, and GPU implementations, respectively, across different RNN models and resource budgets.
- Score: 17.928105470385614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The effectiveness of Recurrent Neural Networks (RNNs) for tasks such as
Automatic Speech Recognition has fostered interest in RNN inference
acceleration. Due to the recurrent nature and data dependencies of RNN
computations, prior work has designed customized architectures specifically
tailored to the computation pattern of RNNs, achieving high computational
efficiency for certain chosen model sizes. However, because the dimensionality
of RNNs varies considerably across tasks, it is crucial to generalize this
efficiency to diverse configurations. In this work, we identify adaptiveness as a key
feature that is missing from today's RNN accelerators. In particular, we first
show the problem of low resource-utilization and low adaptiveness for the
state-of-the-art RNN implementations on GPU, FPGA and ASIC architectures. To
solve these issues, we propose an intelligent tile-based dispatching mechanism
that increases the adaptiveness of RNN computation and efficiently handles its
data dependencies. We realize this mechanism in Sharp, a hardware accelerator
that pipelines RNN computation using an effective scheduling scheme to hide
most of the dependency-induced serialization. Furthermore, Sharp employs a
dynamically reconfigurable architecture to adapt to the model's characteristics.
Sharp achieves 2x, 2.8x, and 82x speedups on average, considering different RNN
models and resource budgets, compared to the state-of-the-art ASIC, FPGA, and
GPU implementations, respectively. Furthermore, Sharp provides significant
energy reduction with respect to previous solutions, thanks to its low power
dissipation (321 GFLOPS/Watt).
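The abstract does not give implementation details of the dispatching mechanism, but the tiling idea it describes can be illustrated with a small sketch. The NumPy example below is only an illustration under assumptions of mine (the LSTM formulation, function names, and tile size are not from the paper): each gate matrix-vector product of an LSTM step is split into independent column tiles, the kind of work units a dispatcher could assign to parallel compute arrays, while only the final accumulation and the element-wise gate math depend on the previous hidden state.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tiled_matvec(W, x, tile_size):
    """Compute W @ x as a sum of per-tile partial products.

    Every loop iteration is independent work: an accelerator could
    dispatch each tile to a separate multiplier array.
    """
    y = np.zeros(W.shape[0])
    for start in range(0, x.size, tile_size):
        end = min(start + tile_size, x.size)
        y += W[:, start:end] @ x[start:end]   # independent partial product
    return y

def lstm_step_tiled(x_t, h_prev, c_prev, Wx, Wh, b, tile_size=64):
    """One LSTM step whose matrix-vector products are evaluated tile by tile."""
    # Input-dependent part: no recurrence, so it can be scheduled for
    # upcoming timesteps while h_prev is still being produced.
    z_x = tiled_matvec(Wx, x_t, tile_size)
    # Recurrent part: must wait for h_prev, but is still tiled.
    z_h = tiled_matvec(Wh, h_prev, tile_size)
    z = z_x + z_h + b                         # gate pre-activations (4H values)
    i, f, g, o = np.split(z, 4)
    c_t = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    h_t = sigmoid(o) * np.tanh(c_t)
    return h_t, c_t

# Toy usage: hidden size 128, input size 80 (arbitrary sizes).
H, D = 128, 80
rng = np.random.default_rng(0)
Wx = 0.1 * rng.standard_normal((4 * H, D))
Wh = 0.1 * rng.standard_normal((4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
h, c = lstm_step_tiled(rng.standard_normal(D), h, c, Wx, Wh, b)
```

Because the input-dependent products have no recurrent dependency, a scheduler can issue their tiles for upcoming timesteps while the recurrent tiles of the current step are still in flight, which is one way the dependency-induced serialization mentioned in the abstract can be partially hidden.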
Related papers
- Scalable Mechanistic Neural Networks [52.28945097811129]
We propose an enhanced neural network framework designed for scientific machine learning applications involving long temporal sequences.
By reformulating the original Mechanistic Neural Network (MNN), we reduce the computational time and space complexities from cubic and quadratic with respect to the sequence length, respectively, to linear.
Extensive experiments demonstrate that S-MNN matches the original MNN in precision while substantially reducing computational resources.
arXiv Detail & Related papers (2024-10-08T14:27:28Z)
- Accurate Mapping of RNNs on Neuromorphic Hardware with Adaptive Spiking Neurons [2.9410174624086025]
We present a $\Sigma\Delta$-low-pass RNN (lpRNN) for mapping rate-based RNNs to spiking neural networks (SNNs).
An adaptive spiking neuron model encodes signals using $\Sigma\Delta$-modulation and enables precise mapping.
We demonstrate the implementation of the lpRNN on Intel's neuromorphic research chip Loihi.
arXiv Detail & Related papers (2024-07-18T14:06:07Z)
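For the $\Sigma\Delta$-based encoding mentioned in the lpRNN summary above, a much-simplified illustration may help. The sketch below is a generic first-order $\Sigma\Delta$-style event encoder of my own, not the paper's adaptive spiking neuron model: whenever the integrated error between the input and its running reconstruction exceeds a threshold, a signed event is emitted, so accumulating the events approximately reconstructs a slowly varying signal.

```python
import numpy as np

def sigma_delta_events(signal, threshold=0.05):
    """Encode a sampled signal into +1/-1/0 events by integrating the error
    between the input and its running reconstruction."""
    recon = 0.0                        # running reconstruction of the signal
    error = 0.0                        # integrated encoding error
    events = np.zeros(len(signal))
    for t, x in enumerate(signal):
        error += x - recon
        if error >= threshold:         # threshold crossing -> "up" event
            events[t] = 1.0
            recon += threshold
            error -= threshold
        elif error <= -threshold:      # -> "down" event
            events[t] = -1.0
            recon -= threshold
            error += threshold
    return events

def decode_events(events, threshold=0.05):
    """Reconstruct the signal by accumulating the events."""
    return threshold * np.cumsum(events)

# Example: encode a slow sine wave and reconstruct it from the events.
t = np.linspace(0, 1, 1000)
x = 0.5 * np.sin(2 * np.pi * 2 * t)
ev = sigma_delta_events(x)
x_hat = decode_events(ev)              # approximately tracks x
```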
- DNN Partitioning, Task Offloading, and Resource Allocation in Dynamic Vehicular Networks: A Lyapunov-Guided Diffusion-Based Reinforcement Learning Approach [49.56404236394601]
We formulate the problem of joint DNN partitioning, task offloading, and resource allocation in Vehicular Edge Computing.
Our objective is to minimize the DNN-based task completion time while guaranteeing the system stability over time.
We propose a Multi-Agent Diffusion-based Deep Reinforcement Learning (MAD2RL) algorithm, incorporating the innovative use of diffusion models.
arXiv Detail & Related papers (2024-06-11T06:31:03Z)
- CIF-T: A Novel CIF-based Transducer Architecture for Automatic Speech Recognition [8.302549684364195]
We propose a novel model named CIF-Transducer (CIF-T) which incorporates the Continuous Integrate-and-Fire (CIF) mechanism with the RNN-T model to achieve efficient alignment.
CIF-T achieves state-of-the-art results with lower computational overhead compared to RNN-T models.
arXiv Detail & Related papers (2023-07-26T11:59:14Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Recurrent Bilinear Optimization for Binary Neural Networks [58.972212365275595]
BNNs neglect the intrinsic bilinear relationship of real-valued weights and scale factors.
Our work is the first attempt to optimize BNNs from the bilinear perspective.
We obtain robust RBONNs, which show impressive performance over state-of-the-art BNNs on various models and datasets.
arXiv Detail & Related papers (2022-09-04T06:45:33Z)
- Training High-Performance Low-Latency Spiking Neural Networks by Differentiation on Spike Representation [70.75043144299168]
Spiking Neural Network (SNN) is a promising energy-efficient AI model when implemented on neuromorphic hardware.
It is a challenge to efficiently train SNNs due to their non-differentiability.
We propose the Differentiation on Spike Representation (DSR) method, which could achieve high performance.
arXiv Detail & Related papers (2022-05-01T12:44:49Z)
- Weightless Neural Networks for Efficient Edge Inference [1.7882696915798877]
Weightless Neural Networks (WNNs) are a class of machine learning models that use table lookups to perform inference.
We propose a novel WNN architecture, BTHOWeN, with key algorithmic and architectural improvements over prior work.
BTHOWeN targets the large and growing edge computing sector by providing superior latency and energy efficiency.
arXiv Detail & Related papers (2022-03-03T01:46:05Z)
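To make the table-lookup idea in the WNN/BTHOWeN summary above concrete, here is a tiny WiSARD-style classifier sketch. It is my own simplified illustration; BTHOWeN adds further algorithmic and architectural improvements that are not shown here. Each class owns a set of RAM nodes, each node is addressed by a fixed random tuple of input bits, training marks the addresses seen for a class, and inference counts how many nodes recognize the input's addresses.

```python
import numpy as np

class TinyWiSARD:
    """Minimal WiSARD-style weightless classifier (illustrative only)."""

    def __init__(self, n_inputs, n_classes, tuple_size=8, seed=0):
        rng = np.random.default_rng(seed)
        self.mapping = rng.permutation(n_inputs)   # fixed random bit order
        self.tuple_size = tuple_size
        self.n_nodes = n_inputs // tuple_size
        # One set of "filled addresses" per (class, RAM node).
        self.rams = [[set() for _ in range(self.n_nodes)]
                     for _ in range(n_classes)]

    def _addresses(self, x_bits):
        bits = x_bits[self.mapping][: self.n_nodes * self.tuple_size]
        groups = bits.reshape(self.n_nodes, self.tuple_size)
        # Interpret each tuple of bits as an integer RAM address.
        return groups.dot(1 << np.arange(self.tuple_size))

    def train(self, x_bits, label):
        for node, addr in enumerate(self._addresses(x_bits)):
            self.rams[label][node].add(int(addr))

    def predict(self, x_bits):
        addrs = self._addresses(x_bits)
        scores = [sum(int(a) in ram[node] for node, a in enumerate(addrs))
                  for ram in self.rams]
        return int(np.argmax(scores))

# Example: two classes over 64-bit inputs.
clf = TinyWiSARD(n_inputs=64, n_classes=2)
x0, x1 = np.zeros(64, dtype=int), np.ones(64, dtype=int)
clf.train(x0, 0)
clf.train(x1, 1)
assert clf.predict(x1) == 1
```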
- Can Deep Neural Networks be Converted to Ultra Low-Latency Spiking Neural Networks? [3.2108350580418166]
Spiking neural networks (SNNs) operate via binary spikes distributed over time.
SOTA training strategies for SNNs involve conversion from a non-spiking deep neural network (DNN).
We propose a new training algorithm that accurately captures these distributions, minimizing the error between the DNN and converted SNN.
arXiv Detail & Related papers (2021-12-22T18:47:45Z)
- Dynamically Throttleable Neural Networks (TNN) [24.052859278938858]
Conditional computation for Deep Neural Networks (DNNs) reduces overall computational load and improves model accuracy by running a subset of the network.
We present a runtime throttleable neural network (TNN) that can adaptively self-regulate its own performance target and computing resources.
arXiv Detail & Related papers (2020-11-01T20:17:42Z)
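The throttling idea in the TNN summary above can be illustrated with a deliberately simplified sketch of my own, not the paper's design: a layer whose effective width is set at run time by a throttle value, so lowering the throttle skips part of the computation in exchange for accuracy.

```python
import numpy as np

class ThrottleableDense:
    """Dense layer whose effective width is chosen at run time (illustrative)."""

    def __init__(self, n_in, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((n_in, n_out)) / np.sqrt(n_in)
        self.b = np.zeros(n_out)

    def forward(self, x, throttle=1.0):
        """Use only the first ceil(throttle * n_out) output units."""
        k = max(1, int(np.ceil(throttle * self.W.shape[1])))
        y = np.zeros(self.W.shape[1])
        y[:k] = x @ self.W[:, :k] + self.b[:k]   # compute only k columns
        return y

# Example: the same layer evaluated at full width and at 25% width.
layer = ThrottleableDense(16, 8)
x = np.ones(16)
full = layer.forward(x, throttle=1.0)
quarter = layer.forward(x, throttle=0.25)       # ~4x fewer multiply-accumulates
```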
We introduce a "fully tensorized" RNN architecture which jointly encodes the separate weight matrices within each recurrent cell.
This approach reduces model size by several orders of magnitude, while still maintaining similar or better performance compared to standard RNNs.
arXiv Detail & Related papers (2020-10-08T18:24:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.