SHARP: An Adaptable, Energy-Efficient Accelerator for Recurrent Neural
Network
- URL: http://arxiv.org/abs/1911.01258v3
- Date: Sun, 21 May 2023 04:12:46 GMT
- Title: SHARP: An Adaptable, Energy-Efficient Accelerator for Recurrent Neural
Network
- Authors: Reza Yazdani, Olatunji Ruwase, Minjia Zhang, Yuxiong He, Jose-Maria
Arnau, Antonio Gonzalez
- Abstract summary: We propose an intelligent tile-based dispatching mechanism that increases the adaptiveness of RNN computation and efficiently handles its data dependencies.
Sharp achieves 2x, 2.8x, and 82x speedups on average over state-of-the-art ASIC, FPGA, and GPU implementations, respectively, across different RNN models and resource budgets.
- Score: 17.928105470385614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The effectiveness of Recurrent Neural Networks (RNNs) for tasks such as
Automatic Speech Recognition has fostered interest in RNN inference
acceleration. Due to the recurrent nature and data dependencies of RNN
computations, prior work has designed customized architectures specifically
tailored to the computation pattern of RNNs, achieving high computational
efficiency for certain chosen model sizes. However, because the dimensionality
of RNNs varies considerably across tasks, it is crucial to generalize this
efficiency to diverse configurations. In this work, we identify adaptiveness as a key
feature that is missing from today's RNN accelerators. In particular, we first
show the problem of low resource-utilization and low adaptiveness for the
state-of-the-art RNN implementations on GPU, FPGA and ASIC architectures. To
solve these issues, we propose an intelligent tile-based dispatching mechanism
that increases the adaptiveness of RNN computation and efficiently handles its
data dependencies. We realize this mechanism in Sharp, a hardware accelerator
that pipelines RNN computation using an effective scheduling scheme to hide
most of the dependency-induced serialization. Furthermore, Sharp employs a
dynamically reconfigurable architecture to adapt to the model's characteristics.
Sharp achieves 2x, 2.8x, and 82x speedups on average, considering different RNN
models and resource budgets, compared to the state-of-the-art ASIC, FPGA, and
GPU implementations, respectively. Furthermore, Sharp provides significant
energy reduction with respect to previous solutions, thanks to its low power
dissipation (321 GFLOPS/Watt).
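The abstract does not give implementation details of the dispatching mechanism, but the tiling idea it describes can be illustrated with a small sketch. The NumPy example below is only an illustration under assumptions of mine (the LSTM formulation, function names, and tile size are not from the paper): each gate matrix-vector product of an LSTM step is split into independent column tiles, the kind of work units a dispatcher could assign to parallel compute arrays, while only the final accumulation and the element-wise gate math depend on the previous hidden state.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tiled_matvec(W, x, tile_size):
    """Compute W @ x as a sum of per-tile partial products.

    Every loop iteration is independent work: an accelerator could
    dispatch each tile to a separate multiplier array.
    """
    y = np.zeros(W.shape[0])
    for start in range(0, x.size, tile_size):
        end = min(start + tile_size, x.size)
        y += W[:, start:end] @ x[start:end]   # independent partial product
    return y

def lstm_step_tiled(x_t, h_prev, c_prev, Wx, Wh, b, tile_size=64):
    """One LSTM step whose matrix-vector products are evaluated tile by tile."""
    # Input-dependent part: no recurrence, so it can be scheduled for
    # upcoming timesteps while h_prev is still being produced.
    z_x = tiled_matvec(Wx, x_t, tile_size)
    # Recurrent part: must wait for h_prev, but is still tiled.
    z_h = tiled_matvec(Wh, h_prev, tile_size)
    z = z_x + z_h + b                         # gate pre-activations (4H values)
    i, f, g, o = np.split(z, 4)
    c_t = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    h_t = sigmoid(o) * np.tanh(c_t)
    return h_t, c_t

# Toy usage: hidden size 128, input size 80 (arbitrary sizes).
H, D = 128, 80
rng = np.random.default_rng(0)
Wx = 0.1 * rng.standard_normal((4 * H, D))
Wh = 0.1 * rng.standard_normal((4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
h, c = lstm_step_tiled(rng.standard_normal(D), h, c, Wx, Wh, b)
```

Because the input-dependent products have no recurrent dependency, a scheduler can issue their tiles for upcoming timesteps while the recurrent tiles of the current step are still in flight, which is one way the dependency-induced serialization mentioned in the abstract can be partially hidden.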
Related papers
- Scalable Mechanistic Neural Networks [52.28945097811129]
We propose an enhanced neural network framework designed for scientific machine learning applications involving long temporal sequences.
By reformulating the original Mechanistic Neural Network (MNN), we reduce the computational time and space complexities from cubic and quadratic with respect to the sequence length, respectively, to linear.
Extensive experiments demonstrate that S-MNN matches the original MNN in precision while substantially reducing computational resources.
arXiv Detail & Related papers (2024-10-08T14:27:28Z)
- Accurate Mapping of RNNs on Neuromorphic Hardware with Adaptive Spiking Neurons [2.9410174624086025]
We present a $\Sigma\Delta$-low-pass RNN (lpRNN) for mapping rate-based RNNs to spiking neural networks (SNNs).
An adaptive spiking neuron model encodes signals using $\Sigma\Delta$-modulation and enables precise mapping.
We demonstrate the implementation of the lpRNN on Intel's neuromorphic research chip Loihi.
arXiv Detail & Related papers (2024-07-18T14:06:07Z)
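For the $\Sigma\Delta$-based encoding mentioned in the lpRNN summary above, a much-simplified illustration may help. The sketch below is a generic first-order $\Sigma\Delta$-style event encoder of my own, not the paper's adaptive spiking neuron model: whenever the integrated error between the input and its running reconstruction exceeds a threshold, a signed event is emitted, so accumulating the events approximately reconstructs a slowly varying signal.

```python
import numpy as np

def sigma_delta_events(signal, threshold=0.05):
    """Encode a sampled signal into +1/-1/0 events by integrating the error
    between the input and its running reconstruction."""
    recon = 0.0                        # running reconstruction of the signal
    error = 0.0                        # integrated encoding error
    events = np.zeros(len(signal))
    for t, x in enumerate(signal):
        error += x - recon
        if error >= threshold:         # threshold crossing -> "up" event
            events[t] = 1.0
            recon += threshold
            error -= threshold
        elif error <= -threshold:      # -> "down" event
            events[t] = -1.0
            recon -= threshold
            error += threshold
    return events

def decode_events(events, threshold=0.05):
    """Reconstruct the signal by accumulating the events."""
    return threshold * np.cumsum(events)

# Example: encode a slow sine wave and reconstruct it from the events.
t = np.linspace(0, 1, 1000)
x = 0.5 * np.sin(2 * np.pi * 2 * t)
ev = sigma_delta_events(x)
x_hat = decode_events(ev)              # approximately tracks x
```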
- DNN Partitioning, Task Offloading, and Resource Allocation in Dynamic Vehicular Networks: A Lyapunov-Guided Diffusion-Based Reinforcement Learning Approach [49.56404236394601]
We formulate the problem of joint DNN partitioning, task offloading, and resource allocation in Vehicular Edge Computing.
Our objective is to minimize the DNN-based task completion time while guaranteeing the system stability over time.
We propose a Multi-Agent Diffusion-based Deep Reinforcement Learning (MAD2RL) algorithm, incorporating the innovative use of diffusion models.
arXiv Detail & Related papers (2024-06-11T06:31:03Z)
- CIF-T: A Novel CIF-based Transducer Architecture for Automatic Speech Recognition [8.302549684364195]
We propose a novel model named CIF-Transducer (CIF-T) which incorporates the Continuous Integrate-and-Fire (CIF) mechanism with the RNN-T model to achieve efficient alignment.
CIF-T achieves state-of-the-art results with lower computational overhead compared to RNN-T models.
arXiv Detail & Related papers (2023-07-26T11:59:14Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Recurrent Bilinear Optimization for Binary Neural Networks [58.972212365275595]
BNNs neglect the intrinsic bilinear relationship of real-valued weights and scale factors.
Our work is the first attempt to optimize BNNs from the bilinear perspective.
We obtain robust RBONNs, which show impressive performance over state-of-the-art BNNs on various models and datasets.
arXiv Detail & Related papers (2022-09-04T06:45:33Z)
- Training High-Performance Low-Latency Spiking Neural Networks by Differentiation on Spike Representation [70.75043144299168]
Spiking Neural Network (SNN) is a promising energy-efficient AI model when implemented on neuromorphic hardware.
It is a challenge to efficiently train SNNs due to their non-differentiability.
We propose the Differentiation on Spike Representation (DSR) method, which could achieve high performance.
arXiv Detail & Related papers (2022-05-01T12:44:49Z)
- Weightless Neural Networks for Efficient Edge Inference [1.7882696915798877]
Weightless Neural Networks (WNNs) are a class of machine learning models that use table lookups to perform inference.
We propose a novel WNN architecture, BTHOWeN, with key algorithmic and architectural improvements over prior work.
BTHOWeN targets the large and growing edge computing sector by providing superior latency and energy efficiency.
arXiv Detail & Related papers (2022-03-03T01:46:05Z)
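To make the table-lookup idea in the WNN/BTHOWeN summary above concrete, here is a tiny WiSARD-style classifier sketch. It is my own simplified illustration; BTHOWeN adds further algorithmic and architectural improvements that are not shown here. Each class owns a set of RAM nodes, each node is addressed by a fixed random tuple of input bits, training marks the addresses seen for a class, and inference counts how many nodes recognize the input's addresses.

```python
import numpy as np

class TinyWiSARD:
    """Minimal WiSARD-style weightless classifier (illustrative only)."""

    def __init__(self, n_inputs, n_classes, tuple_size=8, seed=0):
        rng = np.random.default_rng(seed)
        self.mapping = rng.permutation(n_inputs)   # fixed random bit order
        self.tuple_size = tuple_size
        self.n_nodes = n_inputs // tuple_size
        # One set of "filled addresses" per (class, RAM node).
        self.rams = [[set() for _ in range(self.n_nodes)]
                     for _ in range(n_classes)]

    def _addresses(self, x_bits):
        bits = x_bits[self.mapping][: self.n_nodes * self.tuple_size]
        groups = bits.reshape(self.n_nodes, self.tuple_size)
        # Interpret each tuple of bits as an integer RAM address.
        return groups.dot(1 << np.arange(self.tuple_size))

    def train(self, x_bits, label):
        for node, addr in enumerate(self._addresses(x_bits)):
            self.rams[label][node].add(int(addr))

    def predict(self, x_bits):
        addrs = self._addresses(x_bits)
        scores = [sum(int(a) in ram[node] for node, a in enumerate(addrs))
                  for ram in self.rams]
        return int(np.argmax(scores))

# Example: two classes over 64-bit inputs.
clf = TinyWiSARD(n_inputs=64, n_classes=2)
x0, x1 = np.zeros(64, dtype=int), np.ones(64, dtype=int)
clf.train(x0, 0)
clf.train(x1, 1)
assert clf.predict(x1) == 1
```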
- Can Deep Neural Networks be Converted to Ultra Low-Latency Spiking Neural Networks? [3.2108350580418166]
Spiking neural networks (SNNs) operate via binary spikes distributed over time.
SOTA training strategies for SNNs involve conversion from a non-spiking deep neural network (DNN).
We propose a new training algorithm that accurately captures these distributions, minimizing the error between the DNN and converted SNN.
arXiv Detail & Related papers (2021-12-22T18:47:45Z)
- Dynamically Throttleable Neural Networks (TNN) [24.052859278938858]
Conditional computation for Deep Neural Networks (DNNs) reduces overall computational load and improves model accuracy by running a subset of the network.
We present a runtime throttleable neural network (TNN) that can adaptively self-regulate its own performance target and computing resources.
arXiv Detail & Related papers (2020-11-01T20:17:42Z)
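The throttling idea in the TNN summary above can be illustrated with a deliberately simplified sketch of my own, not the paper's design: a layer whose effective width is set at run time by a throttle value, so lowering the throttle skips part of the computation in exchange for accuracy.

```python
import numpy as np

class ThrottleableDense:
    """Dense layer whose effective width is chosen at run time (illustrative)."""

    def __init__(self, n_in, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((n_in, n_out)) / np.sqrt(n_in)
        self.b = np.zeros(n_out)

    def forward(self, x, throttle=1.0):
        """Use only the first ceil(throttle * n_out) output units."""
        k = max(1, int(np.ceil(throttle * self.W.shape[1])))
        y = np.zeros(self.W.shape[1])
        y[:k] = x @ self.W[:, :k] + self.b[:k]   # compute only k columns
        return y

# Example: the same layer evaluated at full width and at 25% width.
layer = ThrottleableDense(16, 8)
x = np.ones(16)
full = layer.forward(x, throttle=1.0)
quarter = layer.forward(x, throttle=0.25)       # ~4x fewer multiply-accumulates
```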
We introduce a "fully tensorized" RNN architecture which jointly encodes the separate weight matrices within each recurrent cell.
This approach reduces model size by several orders of magnitude, while still maintaining similar or better performance compared to standard RNNs.
arXiv Detail & Related papers (2020-10-08T18:24:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.