iRNN: Integer-only Recurrent Neural Network
- URL: http://arxiv.org/abs/2109.09828v1
- Date: Mon, 20 Sep 2021 20:17:40 GMT
- Title: iRNN: Integer-only Recurrent Neural Network
- Authors: Eyyüb Sari, Vanessa Courville, Vahid Partovi Nia
- Abstract summary: We present a quantization-aware training method for obtaining a highly accurate integer-only recurrent neural network (iRNN).
Our iRNN maintains performance similar to its full-precision counterpart; deploying it on smartphones improves runtime performance by $2\times$ and reduces model size by $4\times$.
- Score: 0.8766022970635899
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recurrent neural networks (RNN) are used in many real-world text and speech
applications. They include complex modules such as recurrence,
exponential-based activation, gate interaction, unfoldable normalization,
bi-directional dependence, and attention. The interaction between these
elements prevents running them on integer-only operations without a significant
performance drop. Deploying RNNs that include layer normalization and attention
on integer-only arithmetic is still an open problem. We present a
quantization-aware training method for obtaining a highly accurate integer-only
recurrent neural network (iRNN). Our approach supports layer normalization,
attention, and an adaptive piecewise linear approximation of activations, to
serve a wide range of RNNs in various applications. The proposed method is
shown to work on RNN-based language models and automatic speech recognition.
Our iRNN maintains performance similar to its full-precision counterpart;
deploying it on smartphones improves runtime performance by $2\times$ and
reduces the model size by $4\times$.
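As a hedged illustration of the piecewise linear (PWL) activation approximation mentioned in the abstract, the sketch below fits a sigmoid with a few fixed, evenly spaced knots. It is not the paper's adaptive (learned) PWL, and the helper names `make_pwl` and `pwl_eval` are invented for this example.

```python
# Hedged sketch: approximate sigmoid with a piecewise linear (PWL) function defined by
# a few knots, the kind of activation replacement that integer-only inference relies on.
# Knots are placed on a uniform grid here; the paper's adaptive PWL instead learns them.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def make_pwl(fn, lo=-8.0, hi=8.0, pieces=8):
    """Return knot positions and knot values defining a PWL approximation of `fn`."""
    xs = np.linspace(lo, hi, pieces + 1)  # segment boundaries
    return xs, fn(xs)                     # exact values at the knots

def pwl_eval(x, xs, ys):
    """Evaluate the PWL function; inputs outside [lo, hi] saturate at the end values."""
    return np.interp(x, xs, ys)

if __name__ == "__main__":
    xs, ys = make_pwl(sigmoid, pieces=8)
    grid = np.linspace(-10.0, 10.0, 10001)
    err = np.abs(pwl_eval(grid, xs, ys) - sigmoid(grid))
    print(f"max |sigmoid - PWL| with 8 pieces: {err.max():.4f}")
```

Adding pieces shrinks the worst-case error at the cost of a slightly larger table; in a full integer-only pipeline the knot positions and segment slopes would themselves be quantized to fixed-point values.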
Related papers
- Accurate Mapping of RNNs on Neuromorphic Hardware with Adaptive Spiking Neurons [2.9410174624086025]
We present a $\Sigma\Delta$-low-pass RNN (lpRNN) for mapping rate-based RNNs to spiking neural networks (SNNs).
An adaptive spiking neuron model encodes signals using $\Sigma\Delta$-modulation and enables precise mapping.
We demonstrate the implementation of the lpRNN on Intel's neuromorphic research chip Loihi.
arXiv Detail & Related papers (2024-07-18T14:06:07Z)
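The lpRNN entry above relies on $\Sigma\Delta$-modulation to turn real-valued activations into sparse events. The toy encoder below is only a first-order sketch of that idea, not the adaptive spiking neuron model itself; `sigma_delta_encode` and its threshold are illustrative choices.

```python
# Hedged sketch of first-order sigma-delta encoding: a real-valued signal becomes a
# sparse stream of +1/-1 events whose running sum tracks the signal, which is the basic
# idea exploited when mapping rate-based RNNs onto spiking hardware.
import numpy as np

def sigma_delta_encode(signal, threshold=0.1):
    """Emit a +1/-1 event whenever the accumulated coding error exceeds `threshold`."""
    events = np.zeros_like(signal)
    reconstruction = 0.0   # what a decoder integrating the events has seen so far
    for t, x in enumerate(signal):
        error = x - reconstruction
        if error > threshold:
            events[t] = 1.0
            reconstruction += threshold
        elif error < -threshold:
            events[t] = -1.0
            reconstruction -= threshold
    return events

if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 500)
    signal = np.sin(2 * np.pi * 3 * t)
    events = sigma_delta_encode(signal, threshold=0.1)
    decoded = np.cumsum(events) * 0.1            # decoder: integrate the events
    print("events emitted:", int(np.count_nonzero(events)))
    print("max reconstruction error:", float(np.max(np.abs(decoded - signal))))
```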
- Training Integer-Only Deep Recurrent Neural Networks [3.1829446824051195]
We present a quantization-aware training method for obtaining a highly accurate integer-only recurrent neural network (iRNN).
Our approach supports layer normalization, attention, and an adaptive piecewise linear (PWL) approximation of activation functions.
The proposed method enables RNN-based language models to run on edge devices with a $2\times$ improvement in runtime.
arXiv Detail & Related papers (2022-12-22T15:22:36Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Recurrent Bilinear Optimization for Binary Neural Networks [58.972212365275595]
Existing BNNs neglect the intrinsic bilinear relationship between real-valued weights and scale factors.
Our work is the first attempt to optimize BNNs from the bilinear perspective.
We obtain robust RBONNs, which show impressive performance over state-of-the-art BNNs on various models and datasets.
arXiv Detail & Related papers (2022-09-04T06:45:33Z)
- Saving RNN Computations with a Neuron-Level Fuzzy Memoization Scheme [0.0]
Recurrent Neural Networks (RNNs) are a key technology for applications such as automatic speech recognition or machine translation.
We build a neuron-level fuzzy memoization scheme, which dynamically caches each neuron's output and reuses it whenever it is predicted that the current output will be similar to a previously computed result.
We show that our technique avoids more than 26.7% of computations, resulting in 21% energy savings and 1.4x speedup on average.
arXiv Detail & Related papers (2022-02-14T09:02:03Z)
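A minimal sketch of the neuron-level fuzzy memoization idea summarized in the entry just above: each neuron keeps its last output and skips the exact dot product when a cheap predictor says the new output would be close. The rounded low-precision predictor and the names `fuzzy_memo_step` and `delta` are stand-ins, not the paper's exact scheme.

```python
# Hedged sketch of neuron-level fuzzy memoization for one RNN-style layer: every neuron
# caches its last output and skips the exact dot product when a cheap predictor says the
# new output would be within `delta` of the cached one.
import numpy as np

def fuzzy_memo_step(x, W, cache, delta=0.1):
    """Evaluate tanh(W @ x) neuron by neuron, reusing cached outputs when possible."""
    approx = np.round(W, 2) @ np.round(x, 1)   # cheap low-precision prediction
    out = np.empty(W.shape[0])
    recomputed = 0
    for i in range(W.shape[0]):
        predicted = np.tanh(approx[i])
        if cache[i] is not None and abs(predicted - cache[i]) < delta:
            out[i] = cache[i]            # reuse: the expensive dot product is skipped
        else:
            out[i] = np.tanh(W[i] @ x)   # recompute exactly and refresh the cache
            cache[i] = out[i]
            recomputed += 1
    return out, recomputed

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = 0.1 * rng.standard_normal((64, 64))
    x = rng.standard_normal(64)
    cache = [None] * 64
    for step in range(3):
        noisy = x + 0.01 * rng.standard_normal(64)   # slowly varying input
        _, n = fuzzy_memo_step(noisy, W, cache)
        print(f"step {step}: recomputed {n}/64 neurons")
```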
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using $\{-1, +1\}$ to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
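The following toy decomposes a quantized weight matrix into $\{-1, +1\}$ branches, in the spirit of the entry above: restricting weights to odd integer levels lets Q = sum_i 2^i * C_i with each C_i binary, so one quantized layer becomes a few XNOR-friendly branches. The level choice and the greedy decomposition are illustrative assumptions, not the paper's exact encoding.

```python
# Hedged sketch: a k-bit weight restricted to odd integer levels {+-1, +-3, ..., +-(2^k - 1)}
# can be written as sum_i 2^i * C_i with C_i entries in {-1, +1}, turning one quantized
# layer into k binary branches.
import numpy as np

def quantize_to_odd_levels(W, k=2):
    """Map float weights to the nearest odd integer level in [-(2^k - 1), 2^k - 1]."""
    max_level = 2**k - 1
    scale = np.max(np.abs(W)) / max_level
    q = np.clip(np.round((W / scale - 1) / 2) * 2 + 1, -max_level, max_level)
    return q.astype(np.int64), scale

def decompose_binary_branches(Q, k=2):
    """Greedy decomposition Q = sum_i 2^i * C_i with C_i entries in {-1, +1}."""
    residual = Q.copy()
    branches = []
    for i in reversed(range(k)):
        C = np.where(residual > 0, 1, -1)
        residual = residual - (2**i) * C
        branches.append((2**i, C))
    assert np.all(residual == 0)   # the branches reproduce Q exactly
    return branches

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((16, 32))
    x = rng.standard_normal(32)
    Q, scale = quantize_to_odd_levels(W, k=2)
    branches = decompose_binary_branches(Q, k=2)
    y_branches = scale * sum(s * (C @ x) for s, C in branches)
    print("branch sum matches quantized matmul:", np.allclose(y_branches, scale * (Q @ x)))
```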
- ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training [68.63354877166756]
ActNN is a memory-efficient training framework that stores randomly quantized activations for backpropagation.
ActNN reduces the activation memory footprint by 12x and enables training with a 6.6x to 14x larger batch size.
arXiv Detail & Related papers (2021-04-29T05:50:54Z)
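A rough sketch of the ActNN idea from the entry above: activations saved for the backward pass are replaced by an unbiased 2-bit stochastic quantization. Real ActNN uses per-group scales and bit packing; this toy version quantizes one tensor with a single scale, and `quantize_2bit` is an invented helper name.

```python
# Hedged sketch: store activations with 2-bit stochastic (unbiased) quantization so the
# backward pass works from a heavily compressed copy instead of the full-precision tensor.
import numpy as np

def quantize_2bit(x, rng):
    """Unbiased stochastic quantization of `x` onto 4 levels (2 bits per element)."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 3.0 if hi > lo else 1.0
    normalized = (x - lo) / scale                  # now in [0, 3]
    floor = np.floor(normalized)
    prob_up = normalized - floor                   # round up with this probability
    q = floor + (rng.random(x.shape) < prob_up)    # values in {0, 1, 2, 3}
    return q.astype(np.uint8), lo, scale

def dequantize(q, lo, scale):
    return q.astype(np.float64) * scale + lo

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    activations = rng.standard_normal((256, 256))
    q, lo, scale = quantize_2bit(activations, rng)
    recon = dequantize(q, lo, scale)
    print("mean abs error:", float(np.abs(recon - activations).mean()))
    # Unbiasedness: averaging many stochastic quantizations converges toward the input.
    avg = np.mean([dequantize(*quantize_2bit(activations, rng)) for _ in range(200)], axis=0)
    print("bias after 200 draws:", float(np.abs(avg - activations).mean()))
```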
- DiffRNN: Differential Verification of Recurrent Neural Networks [3.4423518864863154]
Recurrent neural networks (RNNs) have become popular in a variety of applications such as image processing, data classification, speech recognition, and as controllers in autonomous systems.
We propose DIFFRNN, the first differential verification method for RNNs to certify the equivalence of two structurally similar neural networks.
We demonstrate the practical efficacy of our technique on a variety of benchmarks and show that DIFFRNN outperforms state-of-the-art verification tools such as POPQORN.
arXiv Detail & Related papers (2020-07-20T14:14:35Z)
- Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z)
- Efficient Integer-Arithmetic-Only Convolutional Neural Networks [87.01739569518513]
We replace conventional ReLU with Bounded ReLU and find that the accuracy decline is due to activation quantization.
Our integer networks achieve performance equivalent to the corresponding FPN networks, but have only 1/4 of the memory cost and run 2x faster on modern GPUs.
arXiv Detail & Related papers (2020-06-21T08:23:03Z)
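A minimal sketch of the Bounded ReLU idea from the entry above: clipping activations to a fixed range so they map onto an 8-bit integer grid with a scale known ahead of time. The bound of 6 is just the familiar ReLU6 choice, assumed for illustration rather than taken from the paper.

```python
# Hedged sketch: Bounded ReLU clips activations to [0, bound], removing the unbounded
# tail that makes plain ReLU awkward to quantize with a fixed integer scale.
import numpy as np

def bounded_relu(x, bound=6.0):
    return np.clip(x, 0.0, bound)

def quantize_uint8(x, bound=6.0):
    """Map [0, bound] onto {0, ..., 255} with a scale fixed ahead of time."""
    scale = bound / 255.0
    return np.round(x / scale).astype(np.uint8), scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pre_activations = 3.0 * rng.standard_normal(10000)
    act = bounded_relu(pre_activations)
    q, scale = quantize_uint8(act)
    err = np.abs(q.astype(np.float64) * scale - act)
    print(f"max quantization error: {err.max():.5f} (half a step is {scale / 2:.5f})")
```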
- Exploring Pre-training with Alignments for RNN Transducer based End-to-End Speech Recognition [39.497407288772386]
The recurrent neural network transducer (RNN-T) architecture has become an emerging trend in end-to-end automatic speech recognition research.
In this work, we leverage external alignments to seed the RNN-T model.
Two different pre-training solutions are explored, referred to as encoder pre-training and whole-network pre-training, respectively.
arXiv Detail & Related papers (2020-05-01T19:00:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.