GhostRNN: Reducing State Redundancy in RNN with Cheap Operations
- URL: http://arxiv.org/abs/2411.14489v1
- Date: Wed, 20 Nov 2024 11:37:14 GMT
- Title: GhostRNN: Reducing State Redundancy in RNN with Cheap Operations
- Authors: Hang Zhou, Xiaoxu Zheng, Yunhe Wang, Michael Bi Mi, Deyi Xiong, Kai Han
- Abstract summary: We propose an efficient RNN architecture, GhostRNN, which reduces hidden state redundancy with cheap operations.
Experiments on KWS and SE tasks demonstrate that the proposed GhostRNN significantly reduces memory usage (~40%) and computation cost while maintaining comparable performance.
- Score: 66.14054138609355
- License:
- Abstract: Recurrent neural networks (RNNs) that are capable of modeling long-distance dependencies are widely used in various speech tasks, e.g., keyword spotting (KWS) and speech enhancement (SE). Due to the limitation of power and memory in low-resource devices, efficient RNN models are urgently required for real-world applications. In this paper, we propose an efficient RNN architecture, GhostRNN, which reduces hidden state redundancy with cheap operations. In particular, we observe that partial dimensions of hidden states are similar to the others in trained RNN models, suggesting that redundancy exists in specific RNNs. To reduce the redundancy and hence computational cost, we propose to first generate a few intrinsic states, and then apply cheap operations to produce ghost states based on the intrinsic states. Experiments on KWS and SE tasks demonstrate that the proposed GhostRNN significantly reduces the memory usage (~40%) and computation cost while keeping performance similar.
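To make the intrinsic/ghost decomposition concrete, below is a minimal PyTorch sketch of a GhostRNN-style cell. It is an illustration under assumptions, not the authors' implementation: the GRU backbone, the `ratio` hyperparameter, and the single linear layer used as the cheap operation are all assumed here; the paper's actual cheap operations and cell structure may differ.

```python
import torch
import torch.nn as nn
from typing import Optional


class GhostGRUCell(nn.Module):
    """Illustrative GhostRNN-style cell (not the authors' reference code).

    A small GRU cell produces the "intrinsic" state dimensions; a cheap
    linear map then expands them into "ghost" dimensions, so the full
    hidden state is never produced by the expensive recurrent transform.
    """

    def __init__(self, input_size: int, hidden_size: int, ratio: int = 2):
        super().__init__()
        assert hidden_size % ratio == 0, "hidden_size must be divisible by ratio"
        self.hidden_size = hidden_size
        self.intrinsic_size = hidden_size // ratio
        # Expensive part: a standard GRU cell over the intrinsic dimensions only.
        self.intrinsic_cell = nn.GRUCell(input_size, self.intrinsic_size)
        # Cheap operation (assumed here to be one linear layer + tanh): maps
        # intrinsic states to the remaining "ghost" state dimensions.
        self.ghost_map = nn.Linear(self.intrinsic_size,
                                   hidden_size - self.intrinsic_size)

    def forward(self, x: torch.Tensor,
                h: Optional[torch.Tensor] = None) -> torch.Tensor:
        if h is None:
            h = x.new_zeros(x.size(0), self.hidden_size)
        # Recur only over the intrinsic slice of the previous hidden state.
        h_intrinsic = self.intrinsic_cell(x, h[:, :self.intrinsic_size])
        # Generate ghost states from the intrinsic states with the cheap op.
        h_ghost = torch.tanh(self.ghost_map(h_intrinsic))
        return torch.cat([h_intrinsic, h_ghost], dim=-1)


if __name__ == "__main__":
    cell = GhostGRUCell(input_size=40, hidden_size=128, ratio=2)
    x = torch.randn(8, 40)   # batch of 8 acoustic feature frames
    h = None
    for _ in range(5):       # unroll a few time steps
        h = cell(x, h)
    print(h.shape)           # torch.Size([8, 128])
```

With `ratio = 2`, only half of the hidden dimensions pass through the full recurrent transform at each step, which is where the roughly 40% memory and compute savings reported in the abstract would come from.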
Related papers
- On the Computational Complexity and Formal Hierarchy of Second Order
Recurrent Neural Networks [59.85314067235965]
We extend the theoretical foundation for the second-order recurrent network (2nd-order RNN).
We prove there exists a class of 2nd-order RNNs that is Turing-complete with bounded time.
We also demonstrate that 2nd-order RNNs, without memory, outperform modern-day models such as vanilla RNNs and gated recurrent units in recognizing regular grammars.
arXiv Detail & Related papers (2023-09-26T06:06:47Z) - Training High-Performance Low-Latency Spiking Neural Networks by
Differentiation on Spike Representation [70.75043144299168]
Spiking Neural Network (SNN) is a promising energy-efficient AI model when implemented on neuromorphic hardware.
It is a challenge to efficiently train SNNs due to their non-differentiability.
We propose the Differentiation on Spike Representation (DSR) method, which addresses this non-differentiability and achieves high performance.
arXiv Detail & Related papers (2022-05-01T12:44:49Z) - Weightless Neural Networks for Efficient Edge Inference [1.7882696915798877]
Weightless Neural Networks (WNNs) are a class of machine learning models that use table lookups to perform inference.
We propose a novel WNN architecture, BTHOWeN, with key algorithmic and architectural improvements over prior work.
BTHOWeN targets the large and growing edge computing sector by providing superior latency and energy efficiency.
arXiv Detail & Related papers (2022-03-03T01:46:05Z) - Can Deep Neural Networks be Converted to Ultra Low-Latency Spiking
Neural Networks? [3.2108350580418166]
Spiking neural networks (SNNs) operate via binary spikes distributed over time.
SOTA training strategies for SNNs involve conversion from a non-spiking deep neural network (DNN).
We propose a new training algorithm that accurately captures these distributions, minimizing the error between the DNN and converted SNN.
arXiv Detail & Related papers (2021-12-22T18:47:45Z) - iRNN: Integer-only Recurrent Neural Network [0.8766022970635899]
We present a quantization-aware training method for obtaining a highly accurate integer-only recurrent neural network (iRNN).
Our iRNN maintains similar performance as its full-precision counterpart; its deployment on smartphones improves runtime performance by $2\times$ and reduces model size by $4\times$.
arXiv Detail & Related papers (2021-09-20T20:17:40Z) - Deep Time Delay Neural Network for Speech Enhancement with Full Data
Learning [60.20150317299749]
This paper proposes a deep time delay neural network (TDNN) for speech enhancement with full data learning.
To make full use of the training data, we propose a full data learning method for speech enhancement.
arXiv Detail & Related papers (2020-11-11T06:32:37Z) - Dynamically Throttleable Neural Networks (TNN) [24.052859278938858]
Conditional computation for Deep Neural Networks (DNNs) reduces overall computational load and improves model accuracy by running a subset of the network.
We present a runtime throttleable neural network (TNN) that can adaptively self-regulate its own performance target and computing resources.
arXiv Detail & Related papers (2020-11-01T20:17:42Z) - Progressive Tandem Learning for Pattern Recognition with Deep Spiking
Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z) - Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantized neural networks (QNNs) are very attractive to industry because of their extremely cheap computation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features of the original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z) - SHARP: An Adaptable, Energy-Efficient Accelerator for Recurrent Neural
Network [17.928105470385614]
We propose an intelligent tile-based mechanism for increasing the adaptiveness of RNN computation, in order to efficiently handle data dependencies.
SHARP achieves 2x, 2.8x, and 82x speedups on average, considering different RNN models and resource budgets.
arXiv Detail & Related papers (2019-11-04T14:51:27Z)