Related papers: Training Integer-Only Deep Recurrent Neural Networks

Training Integer-Only Deep Recurrent Neural Networks

URL: http://arxiv.org/abs/2212.11791v1
Date: Thu, 22 Dec 2022 15:22:36 GMT
Title: Training Integer-Only Deep Recurrent Neural Networks
Authors: Vahid Partovi Nia, Eyy\"ub Sari, Vanessa Courville, Masoud Asgharian
Abstract summary: We present a quantization-aware training method for obtaining a highly accurate integer-only recurrent neural network (iRNN) Our approach supports layer normalization, attention, and an adaptive piecewise linear (PWL) approximation of activation functions. The proposed method enables RNN-based language models to run on edge devices with $2times$ improvement in runtime.
Score: 3.1829446824051195
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recurrent neural networks (RNN) are the backbone of many text and speech applications. These architectures are typically made up of several computationally complex components such as; non-linear activation functions, normalization, bi-directional dependence and attention. In order to maintain good accuracy, these components are frequently run using full-precision floating-point computation, making them slow, inefficient and difficult to deploy on edge devices. In addition, the complex nature of these operations makes them challenging to quantize using standard quantization methods without a significant performance drop. We present a quantization-aware training method for obtaining a highly accurate integer-only recurrent neural network (iRNN). Our approach supports layer normalization, attention, and an adaptive piecewise linear (PWL) approximation of activation functions, to serve a wide range of state-of-the-art RNNs. The proposed method enables RNN-based language models to run on edge devices with $2\times$ improvement in runtime, and $4\times$ reduction in model size while maintaining similar accuracy as its full-precision counterpart.

Related papers

Quantized Approximately Orthogonal Recurrent Neural Networks [6.524758376347808]
We explore the quantization of the weight matrices in ORNNs, leading to Quantized approximately Orthogonal RNNs (QORNNs) We propose and investigate two strategies to learn QORNN by combining quantization-aware training (QAT) and computation projections. The most efficient models achieve results similar to state-of-the-art full-precision ORNN, LSTM and FastRNN on a variety of standard benchmarks, even with 4-bits quantization.
arXiv Detail & Related papers (2024-02-05T09:59:57Z)
Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency. We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
Recurrent Bilinear Optimization for Binary Neural Networks [58.972212365275595]
BNNs neglect the intrinsic bilinear relationship of real-valued weights and scale factors. Our work is the first attempt to optimize BNNs from the bilinear perspective. We obtain robust RBONNs, which show impressive performance over state-of-the-art BNNs on various models and datasets.
arXiv Detail & Related papers (2022-09-04T06:45:33Z)
An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices. We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the emphexit point, emphexit point, and emphcompressing bits by soft policy iterations. Based on the latency and accuracy aware reward design, such an computation can well adapt to the complex environment like dynamic wireless channel and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
Low-bit Quantization of Recurrent Neural Network Language Models Using Alternating Direction Methods of Multipliers [67.688697838109]
This paper presents a novel method to train quantized RNNLMs from scratch using alternating direction methods of multipliers (ADMM) Experiments on two tasks suggest the proposed ADMM quantization achieved a model size compression factor of up to 31 times over the full precision baseline RNNLMs.
arXiv Detail & Related papers (2021-11-29T09:30:06Z)
iRNN: Integer-only Recurrent Neural Network [0.8766022970635899]
We present a quantization-aware training method for obtaining a highly accurate integer-only recurrent neural network (iRNN) Our iRNN maintains similar performance as its full-precision counterpart, their deployment on smartphones improves the runtime performance by $2times$, and reduces the model size by $4times$.
arXiv Detail & Related papers (2021-09-20T20:17:40Z)
Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using -1, +1 to decompose quantized neural networks (QNNs) into multi-branch binary networks. We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
On the quantization of recurrent neural networks [9.549757800469196]
quantization of neural networks can be defined as the approximation of the high precision computation of the canonical neural network formulation. We present an integer-only quantization strategy for Long Short-Term Memory (LSTM) neural network topologies.
arXiv Detail & Related papers (2021-01-14T04:25:08Z)
Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency. We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z)
Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantization neural networks (QNNs) are very attractive to the industry because their extremely cheap calculation and storage overhead, but their performance is still worse than that of networks with full-precision parameters. Most of existing methods aim to enhance performance of QNNs especially binary neural networks by exploiting more effective training techniques. We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)
Lossless Compression of Deep Neural Networks [17.753357839478575]
Deep neural networks have been successful in many predictive modeling tasks, such as image and language recognition. It is challenging to deploy these networks under limited computational resources, such as in mobile devices. We introduce an algorithm that removes units and layers of a neural network while not changing the output that is produced.
arXiv Detail & Related papers (2020-01-01T15:04:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.