Training Integer-Only Deep Recurrent Neural Networks
- URL: http://arxiv.org/abs/2212.11791v1
- Date: Thu, 22 Dec 2022 15:22:36 GMT
- Title: Training Integer-Only Deep Recurrent Neural Networks
- Authors: Vahid Partovi Nia, Eyyüb Sari, Vanessa Courville, Masoud Asgharian
- Abstract summary: We present a quantization-aware training method for obtaining a highly accurate integer-only recurrent neural network (iRNN).
Our approach supports layer normalization, attention, and an adaptive piecewise linear (PWL) approximation of activation functions.
The proposed method enables RNN-based language models to run on edge devices with a $2\times$ improvement in runtime.
- Score: 3.1829446824051195
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recurrent neural networks (RNN) are the backbone of many text and speech
applications. These architectures are typically made up of several
computationally complex components such as non-linear activation functions,
normalization, bi-directional dependence and attention. In order to maintain
good accuracy, these components are frequently run using full-precision
floating-point computation, making them slow, inefficient and difficult to
deploy on edge devices. In addition, the complex nature of these operations
makes them challenging to quantize using standard quantization methods without
a significant performance drop. We present a quantization-aware training method
for obtaining a highly accurate integer-only recurrent neural network (iRNN).
Our approach supports layer normalization, attention, and an adaptive piecewise
linear (PWL) approximation of activation functions, to serve a wide range of
state-of-the-art RNNs. The proposed method enables RNN-based language models to
run on edge devices with a $2\times$ improvement in runtime and a $4\times$
reduction in model size, while maintaining accuracy similar to that of their
full-precision counterparts.
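To make the idea of a piecewise linear (PWL) activation concrete, here is a minimal sketch in Python. It is not the authors' method: the abstract describes an adaptive PWL approximation used in integer-only arithmetic, whereas this sketch assumes fixed, uniformly spaced knots and ordinary floating-point math. The helper names `make_pwl` and `max_abs_error` are hypothetical and chosen only for illustration.

```python
import math
from bisect import bisect_right


def make_pwl(fn, knots):
    """Build a piecewise linear approximation of fn through sorted knots.

    Assumption: knots are fixed by the caller (uniform here), not learned
    or adapted during training as in the paper.
    """
    ys = [fn(x) for x in knots]

    def pwl(x):
        # Outside the knot range, clamp to the end values; tanh and
        # sigmoid are nearly flat there, so little accuracy is lost.
        if x <= knots[0]:
            return ys[0]
        if x >= knots[-1]:
            return ys[-1]
        # Find the segment [knots[i], knots[i + 1]) that contains x.
        i = bisect_right(knots, x) - 1
        x0, x1 = knots[i], knots[i + 1]
        t = (x - x0) / (x1 - x0)
        # Linear interpolation between the two segment endpoints.
        return ys[i] + t * (ys[i + 1] - ys[i])

    return pwl


def max_abs_error(fn, approx, lo, hi, samples=2001):
    """Largest absolute gap between fn and approx on a uniform grid."""
    step = (hi - lo) / (samples - 1)
    return max(
        abs(fn(lo + k * step) - approx(lo + k * step)) for k in range(samples)
    )


# 16 uniform pieces on [-4, 4]; the knot count is an arbitrary choice.
knots = [-4.0 + 0.5 * k for k in range(17)]
pwl_tanh = make_pwl(math.tanh, knots)
err = max_abs_error(math.tanh, pwl_tanh, -4.0, 4.0)
print(f"max |tanh - pwl| on [-4, 4]: {err:.4f}")
```

Each segment needs only a comparison, a multiplication, and an addition, which is why PWL forms are attractive when the exponential in tanh or sigmoid is unavailable or expensive. Placing more knots where the function curves most, rather than uniformly, would be one way to reduce the error for the same number of pieces.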
Related papers
- Quantized Approximately Orthogonal Recurrent Neural Networks [6.524758376347808]
We explore the quantization of the weight matrices in ORNNs, leading to Quantized approximately Orthogonal RNNs (QORNNs).
We propose and investigate two strategies to learn QORNN by combining quantization-aware training (QAT) and computation projections.
The most efficient models achieve results similar to state-of-the-art full-precision ORNN, LSTM and FastRNN on a variety of standard benchmarks, even with 4-bit quantization.
arXiv Detail & Related papers (2024-02-05T09:59:57Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Recurrent Bilinear Optimization for Binary Neural Networks [58.972212365275595]
BNNs neglect the intrinsic bilinear relationship of real-valued weights and scale factors.
Our work is the first attempt to optimize BNNs from the bilinear perspective.
We obtain robust RBONNs, which show impressive performance over state-of-the-art BNNs on various models and datasets.
arXiv Detail & Related papers (2022-09-04T06:45:33Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical computation restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on the latency- and accuracy-aware reward design, such a computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Low-bit Quantization of Recurrent Neural Network Language Models Using Alternating Direction Methods of Multipliers [67.688697838109]
This paper presents a novel method to train quantized RNNLMs from scratch using alternating direction methods of multipliers (ADMM).
Experiments on two tasks suggest the proposed ADMM quantization achieved a model size compression factor of up to 31 times over the full precision baseline RNNLMs.
arXiv Detail & Related papers (2021-11-29T09:30:06Z)
- iRNN: Integer-only Recurrent Neural Network [0.8766022970635899]
We present a quantization-aware training method for obtaining a highly accurate integer-only recurrent neural network (iRNN).
Our iRNN maintains performance similar to that of its full-precision counterpart; its deployment on smartphones improves runtime performance by $2\times$ and reduces the model size by $4\times$.
arXiv Detail & Related papers (2021-09-20T20:17:40Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- On the quantization of recurrent neural networks [9.549757800469196]
Quantization of neural networks can be defined as the approximation of the high-precision computation of the canonical neural network formulation.
We present an integer-only quantization strategy for Long Short-Term Memory (LSTM) neural network topologies.
arXiv Detail & Related papers (2021-01-14T04:25:08Z)
- Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantization neural networks (QNNs) are very attractive to industry because of their extremely cheap calculation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)
- Lossless Compression of Deep Neural Networks [17.753357839478575]
Deep neural networks have been successful in many predictive modeling tasks, such as image and language recognition.
It is challenging to deploy these networks under limited computational resources, such as in mobile devices.
We introduce an algorithm that removes units and layers of a neural network while not changing the output that is produced.
arXiv Detail & Related papers (2020-01-01T15:04:43Z)