iRNN: Integer-only Recurrent Neural Network
- URL: http://arxiv.org/abs/2109.09828v1
- Date: Mon, 20 Sep 2021 20:17:40 GMT
- Title: iRNN: Integer-only Recurrent Neural Network
- Authors: Eyyüb Sari, Vanessa Courville, Vahid Partovi Nia
- Abstract summary: We present a quantization-aware training method for obtaining a highly accurate integer-only recurrent neural network (iRNN).
Our iRNN maintains performance similar to its full-precision counterpart; deploying it on smartphones improves runtime performance by $2\times$ and reduces model size by $4\times$.
- Score: 0.8766022970635899
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recurrent neural networks (RNN) are used in many real-world text and speech
applications. They include complex modules such as recurrence,
exponential-based activation, gate interaction, unfoldable normalization,
bi-directional dependence, and attention. The interaction between these
elements prevents running them on integer-only operations without a significant
performance drop. Deploying RNNs that include layer normalization and attention
on integer-only arithmetic is still an open problem. We present a
quantization-aware training method for obtaining a highly accurate integer-only
recurrent neural network (iRNN). Our approach supports layer normalization,
attention, and an adaptive piecewise linear approximation of activations, to
serve a wide range of RNNs in various applications. The proposed method is
shown to work on RNN-based language models and automatic speech recognition.
Our iRNN maintains performance similar to its full-precision counterpart;
deploying it on smartphones improves runtime performance by $2\times$ and
reduces the model size by $4\times$.
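As a hedged illustration of the piecewise linear (PWL) activation approximation mentioned in the abstract, the sketch below fits a sigmoid with a few fixed, evenly spaced knots. It is not the paper's adaptive (learned) PWL, and the helper names `make_pwl` and `pwl_eval` are invented for this example.

```python
# Hedged sketch: approximate sigmoid with a piecewise linear (PWL) function defined by
# a few knots, the kind of activation replacement that integer-only inference relies on.
# Knots are placed on a uniform grid here; the paper's adaptive PWL instead learns them.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def make_pwl(fn, lo=-8.0, hi=8.0, pieces=8):
    """Return knot positions and knot values defining a PWL approximation of `fn`."""
    xs = np.linspace(lo, hi, pieces + 1)  # segment boundaries
    return xs, fn(xs)                     # exact values at the knots

def pwl_eval(x, xs, ys):
    """Evaluate the PWL function; inputs outside [lo, hi] saturate at the end values."""
    return np.interp(x, xs, ys)

if __name__ == "__main__":
    xs, ys = make_pwl(sigmoid, pieces=8)
    grid = np.linspace(-10.0, 10.0, 10001)
    err = np.abs(pwl_eval(grid, xs, ys) - sigmoid(grid))
    print(f"max |sigmoid - PWL| with 8 pieces: {err.max():.4f}")
```

Adding pieces shrinks the worst-case error at the cost of a slightly larger table; in a full integer-only pipeline the knot positions and segment slopes would themselves be quantized to fixed-point values.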
Related papers
- Accurate Mapping of RNNs on Neuromorphic Hardware with Adaptive Spiking Neurons [2.9410174624086025]
We present a $\Sigma\Delta$-low-pass RNN (lpRNN) for mapping rate-based RNNs to spiking neural networks (SNNs).
An adaptive spiking neuron model encodes signals using $\Sigma\Delta$-modulation and enables precise mapping.
We demonstrate the implementation of the lpRNN on Intel's neuromorphic research chip Loihi.
arXiv Detail & Related papers (2024-07-18T14:06:07Z)
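The lpRNN entry above relies on $\Sigma\Delta$-modulation to turn real-valued activations into sparse events. The toy encoder below is only a first-order sketch of that idea, not the adaptive spiking neuron model itself; `sigma_delta_encode` and its threshold are illustrative choices.

```python
# Hedged sketch of first-order sigma-delta encoding: a real-valued signal becomes a
# sparse stream of +1/-1 events whose running sum tracks the signal, which is the basic
# idea exploited when mapping rate-based RNNs onto spiking hardware.
import numpy as np

def sigma_delta_encode(signal, threshold=0.1):
    """Emit a +1/-1 event whenever the accumulated coding error exceeds `threshold`."""
    events = np.zeros_like(signal)
    reconstruction = 0.0   # what a decoder integrating the events has seen so far
    for t, x in enumerate(signal):
        error = x - reconstruction
        if error > threshold:
            events[t] = 1.0
            reconstruction += threshold
        elif error < -threshold:
            events[t] = -1.0
            reconstruction -= threshold
    return events

if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 500)
    signal = np.sin(2 * np.pi * 3 * t)
    events = sigma_delta_encode(signal, threshold=0.1)
    decoded = np.cumsum(events) * 0.1            # decoder: integrate the events
    print("events emitted:", int(np.count_nonzero(events)))
    print("max reconstruction error:", float(np.max(np.abs(decoded - signal))))
```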
- Training Integer-Only Deep Recurrent Neural Networks [3.1829446824051195]
We present a quantization-aware training method for obtaining a highly accurate integer-only recurrent neural network (iRNN).
Our approach supports layer normalization, attention, and an adaptive piecewise linear (PWL) approximation of activation functions.
The proposed method enables RNN-based language models to run on edge devices with a $2\times$ improvement in runtime.
arXiv Detail & Related papers (2022-12-22T15:22:36Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Recurrent Bilinear Optimization for Binary Neural Networks [58.972212365275595]
Existing BNNs neglect the intrinsic bilinear relationship between real-valued weights and scale factors.
Our work is the first attempt to optimize BNNs from the bilinear perspective.
We obtain robust RBONNs, which show impressive performance over state-of-the-art BNNs on various models and datasets.
arXiv Detail & Related papers (2022-09-04T06:45:33Z)
- Saving RNN Computations with a Neuron-Level Fuzzy Memoization Scheme [0.0]
Recurrent Neural Networks (RNNs) are a key technology for applications such as automatic speech recognition or machine translation.
We build a neuron-level fuzzy memoization scheme, which dynamically caches each neuron's output and reuses it whenever it is predicted that the current output will be similar to a previously computed result.
We show that our technique avoids more than 26.7% of computations, resulting in 21% energy savings and 1.4x speedup on average.
arXiv Detail & Related papers (2022-02-14T09:02:03Z)
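A minimal sketch of the neuron-level fuzzy memoization idea summarized in the entry just above: each neuron keeps its last output and skips the exact dot product when a cheap predictor says the new output would be close. The rounded low-precision predictor and the names `fuzzy_memo_step` and `delta` are stand-ins, not the paper's exact scheme.

```python
# Hedged sketch of neuron-level fuzzy memoization for one RNN-style layer: every neuron
# caches its last output and skips the exact dot product when a cheap predictor says the
# new output would be within `delta` of the cached one.
import numpy as np

def fuzzy_memo_step(x, W, cache, delta=0.1):
    """Evaluate tanh(W @ x) neuron by neuron, reusing cached outputs when possible."""
    approx = np.round(W, 2) @ np.round(x, 1)   # cheap low-precision prediction
    out = np.empty(W.shape[0])
    recomputed = 0
    for i in range(W.shape[0]):
        predicted = np.tanh(approx[i])
        if cache[i] is not None and abs(predicted - cache[i]) < delta:
            out[i] = cache[i]            # reuse: the expensive dot product is skipped
        else:
            out[i] = np.tanh(W[i] @ x)   # recompute exactly and refresh the cache
            cache[i] = out[i]
            recomputed += 1
    return out, recomputed

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = 0.1 * rng.standard_normal((64, 64))
    x = rng.standard_normal(64)
    cache = [None] * 64
    for step in range(3):
        noisy = x + 0.01 * rng.standard_normal(64)   # slowly varying input
        _, n = fuzzy_memo_step(noisy, W, cache)
        print(f"step {step}: recomputed {n}/64 neurons")
```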
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using $\{-1, +1\}$ to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
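The following toy decomposes a quantized weight matrix into $\{-1, +1\}$ branches, in the spirit of the entry above: restricting weights to odd integer levels lets Q = sum_i 2^i * C_i with each C_i binary, so one quantized layer becomes a few XNOR-friendly branches. The level choice and the greedy decomposition are illustrative assumptions, not the paper's exact encoding.

```python
# Hedged sketch: a k-bit weight restricted to odd integer levels {+-1, +-3, ..., +-(2^k - 1)}
# can be written as sum_i 2^i * C_i with C_i entries in {-1, +1}, turning one quantized
# layer into k binary branches.
import numpy as np

def quantize_to_odd_levels(W, k=2):
    """Map float weights to the nearest odd integer level in [-(2^k - 1), 2^k - 1]."""
    max_level = 2**k - 1
    scale = np.max(np.abs(W)) / max_level
    q = np.clip(np.round((W / scale - 1) / 2) * 2 + 1, -max_level, max_level)
    return q.astype(np.int64), scale

def decompose_binary_branches(Q, k=2):
    """Greedy decomposition Q = sum_i 2^i * C_i with C_i entries in {-1, +1}."""
    residual = Q.copy()
    branches = []
    for i in reversed(range(k)):
        C = np.where(residual > 0, 1, -1)
        residual = residual - (2**i) * C
        branches.append((2**i, C))
    assert np.all(residual == 0)   # the branches reproduce Q exactly
    return branches

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((16, 32))
    x = rng.standard_normal(32)
    Q, scale = quantize_to_odd_levels(W, k=2)
    branches = decompose_binary_branches(Q, k=2)
    y_branches = scale * sum(s * (C @ x) for s, C in branches)
    print("branch sum matches quantized matmul:", np.allclose(y_branches, scale * (Q @ x)))
```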
- ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training [68.63354877166756]
ActNN is a memory-efficient training framework that stores randomly quantized activations for backpropagation.
ActNN reduces the activation memory footprint by 12x and enables training with a 6.6x to 14x larger batch size.
arXiv Detail & Related papers (2021-04-29T05:50:54Z)
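A rough sketch of the ActNN idea from the entry above: activations saved for the backward pass are replaced by an unbiased 2-bit stochastic quantization. Real ActNN uses per-group scales and bit packing; this toy version quantizes one tensor with a single scale, and `quantize_2bit` is an invented helper name.

```python
# Hedged sketch: store activations with 2-bit stochastic (unbiased) quantization so the
# backward pass works from a heavily compressed copy instead of the full-precision tensor.
import numpy as np

def quantize_2bit(x, rng):
    """Unbiased stochastic quantization of `x` onto 4 levels (2 bits per element)."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 3.0 if hi > lo else 1.0
    normalized = (x - lo) / scale                  # now in [0, 3]
    floor = np.floor(normalized)
    prob_up = normalized - floor                   # round up with this probability
    q = floor + (rng.random(x.shape) < prob_up)    # values in {0, 1, 2, 3}
    return q.astype(np.uint8), lo, scale

def dequantize(q, lo, scale):
    return q.astype(np.float64) * scale + lo

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    activations = rng.standard_normal((256, 256))
    q, lo, scale = quantize_2bit(activations, rng)
    recon = dequantize(q, lo, scale)
    print("mean abs error:", float(np.abs(recon - activations).mean()))
    # Unbiasedness: averaging many stochastic quantizations converges toward the input.
    avg = np.mean([dequantize(*quantize_2bit(activations, rng)) for _ in range(200)], axis=0)
    print("bias after 200 draws:", float(np.abs(avg - activations).mean()))
```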
- DiffRNN: Differential Verification of Recurrent Neural Networks [3.4423518864863154]
Recurrent neural networks (RNNs) have become popular in a variety of applications such as image processing, data classification, speech recognition, and as controllers in autonomous systems.
We propose DIFFRNN, the first differential verification method for RNNs to certify the equivalence of two structurally similar neural networks.
We demonstrate the practical efficacy of our technique on a variety of benchmarks and show that DIFFRNN outperforms state-of-the-art verification tools such as POPQORN.
arXiv Detail & Related papers (2020-07-20T14:14:35Z)
- Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z)
- Efficient Integer-Arithmetic-Only Convolutional Neural Networks [87.01739569518513]
We replace conventional ReLU with Bounded ReLU and find that the accuracy decline is due to activation quantization.
Our integer networks achieve performance equivalent to the corresponding FPN networks, but have only 1/4 of the memory cost and run 2x faster on modern GPUs.
arXiv Detail & Related papers (2020-06-21T08:23:03Z)
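A minimal sketch of the Bounded ReLU idea from the entry above: clipping activations to a fixed range so they map onto an 8-bit integer grid with a scale known ahead of time. The bound of 6 is just the familiar ReLU6 choice, assumed for illustration rather than taken from the paper.

```python
# Hedged sketch: Bounded ReLU clips activations to [0, bound], removing the unbounded
# tail that makes plain ReLU awkward to quantize with a fixed integer scale.
import numpy as np

def bounded_relu(x, bound=6.0):
    return np.clip(x, 0.0, bound)

def quantize_uint8(x, bound=6.0):
    """Map [0, bound] onto {0, ..., 255} with a scale fixed ahead of time."""
    scale = bound / 255.0
    return np.round(x / scale).astype(np.uint8), scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pre_activations = 3.0 * rng.standard_normal(10000)
    act = bounded_relu(pre_activations)
    q, scale = quantize_uint8(act)
    err = np.abs(q.astype(np.float64) * scale - act)
    print(f"max quantization error: {err.max():.5f} (half a step is {scale / 2:.5f})")
```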
- Exploring Pre-training with Alignments for RNN Transducer based End-to-End Speech Recognition [39.497407288772386]
The recurrent neural network transducer (RNN-T) architecture has become an emerging trend in end-to-end automatic speech recognition research.
In this work, we leverage external alignments to seed the RNN-T model.
Two different pre-training solutions are explored, referred to as encoder pre-training and whole-network pre-training, respectively.
arXiv Detail & Related papers (2020-05-01T19:00:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.