Saving RNN Computations with a Neuron-Level Fuzzy Memoization Scheme
- URL: http://arxiv.org/abs/2202.06563v1
- Date: Mon, 14 Feb 2022 09:02:03 GMT
- Title: Saving RNN Computations with a Neuron-Level Fuzzy Memoization Scheme
- Authors: Franyell Silfa, Jose-Maria Arnau, Antonio González
- Abstract summary: Recurrent Neural Networks (RNNs) are a key technology for applications such as automatic speech recognition or machine translation.
We build a neuron-level fuzzy memoization scheme, which dynamically caches each neuron's output and reuses it whenever it is predicted that the current output will be similar to a previously computed result.
We show that our technique avoids more than 26.7% of computations, resulting in 21% energy savings and 1.4x speedup on average.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recurrent Neural Networks (RNNs) are a key technology for applications such
as automatic speech recognition or machine translation. Unlike conventional
feed-forward DNNs, RNNs remember past information to improve the accuracy of
future predictions and, therefore, they are very effective for sequence
processing problems.
For each application run, recurrent layers are executed many times for
processing a potentially large sequence of inputs (words, images, audio frames,
etc.). In this paper, we observe that the output of a neuron exhibits small
changes in consecutive invocations. We exploit this property to build a
neuron-level fuzzy memoization scheme, which dynamically caches each neuron's
output and reuses it whenever it is predicted that the current output will be
similar to a previously computed result, avoiding in this way the output
computations.
The main challenge in this scheme is determining whether the new neuron's
output for the current input in the sequence will be similar to a recently
computed result. To this end, we extend the recurrent layer with a much simpler
Bitwise Neural Network (BNN), and show that the BNN and RNN outputs are highly
correlated: if two BNN outputs are very similar, the corresponding outputs in
the original RNN layer are likely to exhibit negligible changes. The BNN
provides a low-cost and effective mechanism for deciding when fuzzy memoization
can be applied with a small impact on accuracy.
We evaluate our memoization scheme on top of a state-of-the-art accelerator
for RNNs, for a variety of different neural networks from multiple application
domains. We show that our technique avoids more than 26.7% of computations,
resulting in 21% energy savings and 1.4x speedup on average.
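To make the mechanism concrete, here is a minimal NumPy sketch of neuron-level fuzzy memoization with a bitwise predictor. It is not the authors' accelerator implementation: the class name FuzzyMemoLayer, the reuse threshold theta, the tanh activation, and the toy input sequence are assumptions made for this illustration, and a real recurrent layer would also feed the previous hidden state into each step.

```python
import numpy as np


class FuzzyMemoLayer:
    """Fully connected recurrent step with per-neuron fuzzy memoization (illustrative sketch)."""

    def __init__(self, weights, theta=4.0):
        self.W = weights                       # full-precision weights, shape (N, D)
        self.Wb = np.sign(weights)             # bitwise (+1/-1) copy used as the cheap predictor
        self.theta = theta                     # reuse threshold on the change of the BNN output
        n = weights.shape[0]
        self.cached_out = np.zeros(n)          # last full-precision output of each neuron
        self.cached_bnn = np.full(n, np.inf)   # BNN output associated with the cached value

    def forward(self, x):
        # Cheap bitwise predictor: sign-only dot product per neuron.
        bnn_out = self.Wb @ np.sign(x)
        # Reuse the memoized output wherever the predictor barely changed.
        reuse = np.abs(bnn_out - self.cached_bnn) <= self.theta

        out = self.cached_out.copy()           # start from the memoized values
        compute = ~reuse                       # only these neurons run the full dot product
        if np.any(compute):
            out[compute] = np.tanh(self.W[compute] @ x)
            self.cached_out[compute] = out[compute]
            self.cached_bnn[compute] = bnn_out[compute]
        return out, reuse.mean()               # output plus the fraction of skipped neurons


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layer = FuzzyMemoLayer(0.1 * rng.standard_normal((128, 64)))
    x = rng.standard_normal(64)
    for t in range(5):
        x = x + 0.01 * rng.standard_normal(64)  # slowly varying input, as with consecutive audio frames
        _, skipped = layer.forward(x)
        print(f"step {t}: skipped {skipped:.0%} of neuron computations")
```

In the paper the reuse decision is made in hardware on top of a state-of-the-art RNN accelerator with a small impact on accuracy; the sketch above only mirrors the dataflow at the algorithmic level.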
Related papers
- Accurate Mapping of RNNs on Neuromorphic Hardware with Adaptive Spiking Neurons [2.9410174624086025]
We present a $\Sigma\Delta$-low-pass RNN (lpRNN) for mapping rate-based RNNs to spiking neural networks (SNNs).
An adaptive spiking neuron model encodes signals using $\Sigma\Delta$-modulation and enables precise mapping.
We demonstrate the implementation of the lpRNN on Intel's neuromorphic research chip Loihi.
arXiv Detail & Related papers (2024-07-18T14:06:07Z)
- Use of Parallel Explanatory Models to Enhance Transparency of Neural Network Configurations for Cell Degradation Detection [18.214293024118145]
We build a parallel model to illuminate and understand the internal operation of neural networks.
We show how each layer of the RNN transforms the input distributions to increase detection accuracy.
At the same time we also discover a side effect acting to limit the improvement in accuracy.
arXiv Detail & Related papers (2024-04-17T12:22:54Z)
- Sparse Spiking Neural Network: Exploiting Heterogeneity in Timescales for Pruning Recurrent SNN [19.551319330414085]
Recurrent Spiking Neural Networks (RSNNs) have emerged as a computationally efficient and brain-inspired learning model.
Traditionally, sparse SNNs are obtained by first training a dense and complex SNN for a target task and then pruning it.
This paper presents a task-agnostic methodology for designing sparse RSNNs by pruning a large randomly initialized model.
arXiv Detail & Related papers (2024-03-06T02:36:15Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Variable Bitrate Neural Fields [75.24672452527795]
We present a dictionary method for compressing feature grids, reducing their memory consumption by up to 100x.
We formulate the dictionary optimization as a vector-quantized auto-decoder problem which lets us learn end-to-end discrete neural representations in a space where no direct supervision is available.
arXiv Detail & Related papers (2022-06-15T17:58:34Z)
- AEGNN: Asynchronous Event-based Graph Neural Networks [54.528926463775946]
Event-based Graph Neural Networks generalize standard GNNs to process events as "evolving" spatio-temporal graphs.
AEGNNs are easily trained on synchronous inputs and can be converted to efficient, "asynchronous" networks at test time.
arXiv Detail & Related papers (2022-03-31T16:21:12Z)
- Training Feedback Spiking Neural Networks by Implicit Differentiation on the Equilibrium State [66.2457134675891]
Spiking neural networks (SNNs) are brain-inspired models that enable energy-efficient implementation on neuromorphic hardware.
Most existing methods imitate the backpropagation framework and feedforward architectures for artificial neural networks.
We propose a novel training method that does not rely on the exact reverse of the forward computation.
arXiv Detail & Related papers (2021-09-29T07:46:54Z)
- iRNN: Integer-only Recurrent Neural Network [0.8766022970635899]
We present a quantization-aware training method for obtaining a highly accurate integer-only recurrent neural network (iRNN).
Our iRNN maintains performance similar to its full-precision counterpart; deploying it on smartphones improves runtime performance by $2\times$ and reduces the model size by $4\times$.
arXiv Detail & Related papers (2021-09-20T20:17:40Z)
- Spike time displacement based error backpropagation in convolutional spiking neural networks [0.6193838300896449]
In this paper, we extend the STiDi-BP algorithm to employ it in deeper and convolutional architectures.
The evaluation results on the image classification task, based on two popular benchmarks, MNIST and Fashion-MNIST, confirm that the algorithm is applicable to deep SNNs.
We consider a convolutional SNN with two sets of weights: real-valued weights that are updated in the backward pass, and their signs (binary weights), which are employed in the feedforward process.
arXiv Detail & Related papers (2021-08-31T05:18:59Z)
- ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training [68.63354877166756]
ActNN is a memory-efficient training framework that stores randomly quantized activations for backpropagation.
ActNN reduces the memory footprint of the activation by 12x, and it enables training with a 6.6x to 14x larger batch size.
arXiv Detail & Related papers (2021-04-29T05:50:54Z)
- Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z)