On the quantization of recurrent neural networks
- URL: http://arxiv.org/abs/2101.05453v1
- Date: Thu, 14 Jan 2021 04:25:08 GMT
- Title: On the quantization of recurrent neural networks
- Authors: Jian Li, Raziel Alvarez
- Abstract summary: Integer quantization of neural networks can be defined as the approximation of the high-precision computation of the canonical neural network formulation, using reduced integer precision.
We present an integer-only quantization strategy for Long Short-Term Memory (LSTM) neural network topologies.
- Score: 9.549757800469196
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Integer quantization of neural networks can be defined as the approximation
of the high-precision computation of the canonical neural network formulation,
using reduced integer precision. It plays a significant role in the efficient
deployment and execution of machine learning (ML) systems, reducing memory
consumption and leveraging typically faster computations. In this work, we
present an integer-only quantization strategy for Long Short-Term Memory (LSTM)
neural network topologies, which themselves are the foundation of many
production ML systems. Our quantization strategy is accurate (e.g. works well
with quantization post-training), efficient and fast to execute (utilizing 8
bit integer weights and mostly 8 bit activations), and is able to target a
variety of hardware (by leveraging instruction sets available in common CPU
architectures, as well as available neural accelerators).
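To make the core idea concrete, below is a minimal sketch of symmetric 8-bit integer quantization applied to one LSTM-style matrix multiply, with int32 accumulation and dequantization at the output. The per-tensor max-abs scales and shapes are illustrative assumptions, not the paper's exact production scheme:

```python
import numpy as np

def quantize_int8(x, scale):
    """Symmetric (zero-point-free) quantization of floats to int8."""
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def int8_matmul(x_q, w_q, x_scale, w_scale):
    """Integer-only matmul with an int32 accumulator, dequantized at the end."""
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32)
    return acc.astype(np.float32) * (x_scale * w_scale)

rng = np.random.default_rng(0)
w = rng.standard_normal((16, 64)).astype(np.float32)  # hypothetical gate weights
x = rng.standard_normal((1, 16)).astype(np.float32)   # hypothetical activations

w_scale = np.abs(w).max() / 127.0  # per-tensor max-abs calibration (assumed)
x_scale = np.abs(x).max() / 127.0
y = int8_matmul(quantize_int8(x, x_scale), quantize_int8(w, w_scale),
                x_scale, w_scale)
print(np.abs(y - x @ w).max())  # quantization error vs. the float reference
```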
Related papers
- Low Precision Quantization-aware Training in Spiking Neural Networks with Differentiable Quantization Function [0.5046831208137847]
This work aims to bridge the gap between recent progress in quantized neural networks and spiking neural networks.
It presents an extensive study on the performance of the quantization function, represented as a linear combination of sigmoid functions.
The presented quantization function achieves state-of-the-art performance on four popular benchmarks.
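As a rough illustration of that construction (illustrative notation, not necessarily the paper's), a hard staircase quantizer can be smoothed into a differentiable sum of shifted sigmoids whose temperature controls how closely it approaches the staircase:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def soft_quantize(x, levels=4, T=0.05):
    """Differentiable approximation of uniform quantization to `levels` values on [0, 1]."""
    step = 1.0 / (levels - 1)
    thresholds = (np.arange(levels - 1) + 0.5) * step  # midpoints between levels
    return step * sum(sigmoid((x - t) / T) for t in thresholds)

x = np.linspace(0.0, 1.0, 11)
print(soft_quantize(x, T=0.01))  # close to the hard levels {0, 1/3, 2/3, 1}
```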
arXiv Detail & Related papers (2023-05-30T09:42:05Z) - Training Integer-Only Deep Recurrent Neural Networks [3.1829446824051195]
We present a quantization-aware training method for obtaining a highly accurate integer-only recurrent neural network (iRNN).
Our approach supports layer normalization, attention, and an adaptive piecewise linear (PWL) approximation of activation functions.
The proposed method enables RNN-based language models to run on edge devices with a $2\times$ improvement in runtime.
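A minimal sketch of the PWL idea for one activation, assuming uniformly spaced knots (the paper learns the breakpoints adaptively):

```python
import numpy as np

knots_x = np.linspace(-4.0, 4.0, 9)  # breakpoints; uniform here, adaptive in the paper
knots_y = np.tanh(knots_x)           # exact activation values at the breakpoints

def pwl_tanh(x):
    """Piecewise linear tanh: interpolate between knots, saturate outside them."""
    return np.interp(x, knots_x, knots_y)

x = np.linspace(-6.0, 6.0, 1001)
print(np.abs(pwl_tanh(x) - np.tanh(x)).max())  # worst-case approximation error
```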
arXiv Detail & Related papers (2022-12-22T15:22:36Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - Low-bit Quantization of Recurrent Neural Network Language Models Using Alternating Direction Methods of Multipliers [67.688697838109]
This paper presents a novel method to train quantized RNNLMs from scratch using alternating direction methods of multipliers (ADMM).
Experiments on two tasks suggest the proposed ADMM quantization achieved a model size compression factor of up to 31 times over the full-precision baseline RNNLMs.
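A toy sketch of the ADMM split behind this kind of training: alternate a gradient step on the augmented objective, a projection onto the quantized set, and a dual update. The quadratic loss and the uniform grid are stand-ins, not the paper's RNNLM setup:

```python
import numpy as np

def project_to_grid(w, step=0.25):
    """Euclidean projection onto a uniform quantization grid (the constraint set)."""
    return np.round(w / step) * step

rng = np.random.default_rng(0)
target = rng.standard_normal(8)          # stand-in for the unconstrained optimum
w, z, u = np.zeros(8), np.zeros(8), np.zeros(8)
rho, lr = 1.0, 0.1

for _ in range(200):
    grad = (w - target) + rho * (w - z + u)  # d/dw [loss + (rho/2)||w - z + u||^2]
    w -= lr * grad                           # primal step on full-precision weights
    z = project_to_grid(w + u)               # quantized auxiliary variable
    u += w - z                               # scaled dual update

print(z)  # quantized weights settled near `target` on the 0.25 grid
```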
arXiv Detail & Related papers (2021-11-29T09:30:06Z) - A quantum algorithm for training wide and deep classical neural networks [72.2614468437919]
We show that conditions amenable to classical trainability via gradient descent coincide with those necessary for efficiently solving quantum linear systems.
We numerically demonstrate that the MNIST image dataset satisfies such conditions.
We provide empirical evidence for $O(\log n)$ training of a convolutional neural network with pooling.
arXiv Detail & Related papers (2021-07-19T23:41:03Z) - Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
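The decomposition itself is easy to verify numerically: a low-bit weight matrix on an odd uniform grid splits exactly into scaled {-1, +1} branches, so one quantized matmul becomes a weighted sum of binary matmuls. A minimal sketch of that encoding idea, not the paper's full scheme:

```python
import numpy as np

rng = np.random.default_rng(0)
# 2-bit weights on the odd grid {-3, -1, +1, +3}: w = 2*b1 + b0 with b_i in {-1, +1}
b0 = rng.choice([-1, 1], size=(4, 4))
b1 = rng.choice([-1, 1], size=(4, 4))
w = 2 * b1 + b0

x = rng.standard_normal((2, 4))
y_binary = 2 * (x @ b1) + (x @ b0)   # two binary matmuls plus a weighted sum
print(np.allclose(y_binary, x @ w))  # True: the branches reproduce the quantized matmul
```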
arXiv Detail & Related papers (2021-06-18T03:11:15Z) - Integer-Only Neural Network Quantization Scheme Based on Shift-Batch-Normalization [13.82935273026808]
In this paper, an integer-only quantization scheme is introduced.
This scheme uses shift-based batch normalization and uniform quantization to implement 4-bit integer-only inference.
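A minimal sketch of the shift-based idea: round the batch-norm multiplier to the nearest power of two so that inference can replace the per-channel multiply with a bit shift. Names and shapes here are illustrative assumptions:

```python
import numpy as np

def shift_bn(x, mean, var, gamma, beta, eps=1e-5):
    """Batch norm whose multiplier is constrained to a (signed) power of two."""
    scale = gamma / np.sqrt(var + eps)
    exponent = np.round(np.log2(np.abs(scale)))   # nearest power-of-two exponent
    pot_scale = np.sign(scale) * 2.0 ** exponent  # realizable as a bit shift
    return (x - mean) * pot_scale + beta

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 3))
print(shift_bn(x, x.mean(0), x.var(0), np.ones(3), np.zeros(3)))
```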
arXiv Detail & Related papers (2021-05-28T09:28:12Z) - Ps and Qs: Quantization-aware pruning for efficient low latency neural network inference [56.24109486973292]
We study the interplay between pruning and quantization during the training of neural networks for ultra low latency applications.
We find that quantization-aware pruning yields more computationally efficient models than either pruning or quantization alone for our task.
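A minimal sketch of how the two transforms compose on a single weight tensor, assuming magnitude pruning and per-tensor fake quantization; the study itself trains full networks rather than this toy:

```python
import numpy as np

def fake_quant(w, bits=8):
    """Quantize-dequantize (fake quantization) with a symmetric per-tensor scale."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.clip(np.round(w / scale), -(2 ** (bits - 1)), 2 ** (bits - 1) - 1) * scale

def prune_mask(w, sparsity=0.5):
    """Keep the largest-magnitude weights, zeroing the given fraction."""
    return (np.abs(w) >= np.quantile(np.abs(w), sparsity)).astype(w.dtype)

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4))
w_eff = fake_quant(w * prune_mask(w))  # pruned, then quantized, effective weights
print(w_eff)
```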
arXiv Detail & Related papers (2021-02-22T19:00:05Z) - Binary Graph Neural Networks [69.51765073772226]
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data.
In this paper, we present and evaluate different strategies for the binarization of graph neural networks.
We show that through careful design of the models, and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks.
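One common binarization strategy, shown as a hedged sketch of a single GCN-style forward pass; the paper compares several designs, and this scale-times-sign choice is just one of them:

```python
import numpy as np

def binarize(w):
    """Replace weights by their sign, scaled to preserve average magnitude."""
    return np.abs(w).mean() * np.sign(w)

rng = np.random.default_rng(0)
a = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=np.float32)  # toy adjacency
a_hat = a + np.eye(3, dtype=np.float32)             # add self-loops
d_inv = np.diag(1.0 / a_hat.sum(axis=1))            # simple degree normalization
x = rng.standard_normal((3, 4)).astype(np.float32)  # node features
w = rng.standard_normal((4, 2)).astype(np.float32)  # layer weights

h = d_inv @ a_hat @ x @ binarize(w)  # one propagation step with binary weights
print(h)
```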
arXiv Detail & Related papers (2020-12-31T18:48:58Z) - Fast Implementation of 4-bit Convolutional Neural Networks for Mobile Devices [0.8362190332905524]
We show an efficient implementation of 4-bit matrix multiplication for quantized neural networks.
We also demonstrate a 4-bit quantized neural network for OCR recognition on the MIDV-500 dataset.
The results show that 4-bit quantization is well suited to mobile devices, yielding sufficient accuracy and low inference time.
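Part of what makes 4-bit attractive is the storage layout: two values fit in one byte. A minimal packing sketch in plain numpy, standing in for the paper's optimized kernels:

```python
import numpy as np

def pack_u4(v):
    """Pack an even-length array of values in [0, 15] into half as many bytes."""
    v = v.astype(np.uint8)
    return (v[0::2] << 4) | v[1::2]

def unpack_u4(p):
    """Recover the original 4-bit values, high nibble first."""
    return np.stack([(p >> 4) & 0xF, p & 0xF], axis=-1).reshape(-1)

rng = np.random.default_rng(0)
v = rng.integers(0, 16, size=16)
packed = pack_u4(v)  # 8 bytes instead of 16
print(np.array_equal(unpack_u4(packed), v))  # True
```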
arXiv Detail & Related papers (2020-09-14T14:48:40Z) - Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation [4.638764944415326]
Quantization techniques can reduce the size of Deep Neural Networks and improve inference latency and throughput.
We focus on quantization techniques that are amenable to acceleration by processors with high-throughput integer math pipelines.
We present a workflow for 8-bit quantization that is able to maintain accuracy within 1% of the floating-point baseline on all networks studied.
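A minimal sketch of that calibrate-then-quantize workflow, assuming max-abs calibration over sample batches (one of the calibration choices such evaluations compare):

```python
import numpy as np

def calibrate_scale(samples, bits=8):
    """Derive one symmetric scale from activation ranges observed on sample data."""
    return max(np.abs(s).max() for s in samples) / (2 ** (bits - 1) - 1)

def quant_dequant(x, scale, bits=8):
    q = np.clip(np.round(x / scale), -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return q * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((16, 8)).astype(np.float32)
calib = [rng.standard_normal((4, 16)).astype(np.float32) for _ in range(8)]

x_scale = calibrate_scale(calib)
w_scale = np.abs(w).max() / 127.0
x = calib[0]
y_ref, y_q = x @ w, quant_dequant(x, x_scale) @ quant_dequant(w, w_scale)
print(np.abs(y_q - y_ref).max())  # degradation to check against an accuracy budget
```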
arXiv Detail & Related papers (2020-04-20T19:59:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.