QONNX: Representing Arbitrary-Precision Quantized Neural Networks
- URL: http://arxiv.org/abs/2206.07527v2
- Date: Fri, 17 Jun 2022 16:34:28 GMT
- Title: QONNX: Representing Arbitrary-Precision Quantized Neural Networks
- Authors: Alessandro Pappalardo and Yaman Umuroglu and Michaela Blott and Jovan
Mitrevski and Ben Hawks and Nhan Tran and Vladimir Loncar and Sioni Summers
and Hendrik Borras and Jules Muhizi and Matthew Trahms and Shih-Chieh Hsu and
Scott Hauck and Javier Duarte
- Abstract summary: We present extensions to the Open Neural Network Exchange (ONNX) intermediate representation format to represent arbitrary-precision quantized neural networks.
We first introduce support for low-precision quantization in existing ONNX-based quantization formats by leveraging integer clipping.
We then introduce a novel higher-level ONNX format called quantized ONNX (QONNX) that introduces three new operators -- Quant, BipolarQuant, and Trunc.
- Score: 49.10245225120615
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present extensions to the Open Neural Network Exchange (ONNX) intermediate
representation format to represent arbitrary-precision quantized neural
networks. We first introduce support for low-precision quantization in existing
ONNX-based quantization formats by leveraging integer clipping, resulting in
two new backward-compatible variants: the quantized operator format with
clipping and quantize-clip-dequantize (QCDQ) format. We then introduce a novel
higher-level ONNX format called quantized ONNX (QONNX) that introduces three
new operators -- Quant, BipolarQuant, and Trunc -- in order to represent
uniform quantization. By keeping the QONNX IR high-level and flexible, we
enable targeting a wider variety of platforms. We also present utilities for
working with QONNX, as well as examples of its usage in the FINN and hls4ml
toolchains. Finally, we introduce the QONNX model zoo to share low-precision
quantized neural networks.
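As a concrete illustration of the integer-clipping idea behind the QCDQ variant, the sketch below emulates a low-precision quantizer by clipping the intermediate integers to the range implied by the target bitwidth. This is a minimal NumPy sketch under our own naming, not the QONNX reference implementation; the paper's utilities define the actual Quant, BipolarQuant, and Trunc semantics.

import numpy as np

def qcdq(x, scale, zero_point, bitwidth, signed=True):
    # Quantize-clip-dequantize (QCDQ), sketched: a standard
    # quantize/dequantize pair emulates a lower precision by clipping
    # the integer values to the narrower range implied by `bitwidth`.
    # Function and argument names are ours, not the QONNX operator API.
    q = np.round(x / scale) + zero_point          # quantize to the integer grid
    if signed:
        lo, hi = -(2 ** (bitwidth - 1)), 2 ** (bitwidth - 1) - 1
    else:
        lo, hi = 0, 2 ** bitwidth - 1
    q = np.clip(q, lo, hi)                        # clip, e.g. [-4, 3] for signed 3-bit
    return (q - zero_point) * scale               # dequantize back to float

x = np.linspace(-1.0, 1.0, 9, dtype=np.float32)
print(qcdq(x, scale=0.25, zero_point=0, bitwidth=3))  # top value saturates at 0.75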
Related papers
- Frame Quantization of Neural Networks [2.8720213314158234]
We present a post-training quantization algorithm with error estimates that rely on ideas from frame theory.
We derive an error bound between the original neural network and the quantized neural network in terms of step size and the number of frame elements.
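For orientation, bounds of this flavor start from the textbook uniform-quantizer estimate; the display below is that generic estimate, not the paper's frame-theoretic theorem, which refines it in terms of the number of frame elements.

% Generic uniform-quantizer bound, for orientation only.
% $Q_\delta$ rounds each coordinate to the nearest multiple of the
% step size $\delta$; the paper's bound refines this using frame theory.
\[
  \lvert x - Q_\delta(x) \rvert \le \frac{\delta}{2}
  \quad\Longrightarrow\quad
  \lVert w - Q_\delta(w) \rVert_2 \le \frac{\delta}{2}\sqrt{N}
  \qquad \text{for } w \in \mathbb{R}^N .
\]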
arXiv Detail & Related papers (2024-04-11T21:24:38Z)
- CEG4N: Counter-Example Guided Neural Network Quantization Refinement [2.722899166098862]
We propose Counter-Example Guided Neural Network Quantization Refinement (CEG4N).
This technique combines search-based quantization and equivalence verification.
We produce models with up to 72% better accuracy than state-of-the-art techniques.
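Read as pseudocode, a counter-example guided refinement loop has a simple shape. The sketch below is our generic rendering: the two callables are hypothetical stand-ins for CEG4N's search-based quantization and equivalence-verification components, not its actual API.

from typing import Any, Callable, Optional

def ceg_refine(
    quantize: Callable[[list], Any],         # search: constraints -> candidate quantized model
    verify: Callable[[Any], Optional[Any]],  # verification: candidate -> counter-example or None
    max_rounds: int = 10,
) -> Any:
    # Counter-example guided refinement loop, sketched.
    constraints: list = []                   # counter-examples the next candidate must satisfy
    for _ in range(max_rounds):
        candidate = quantize(constraints)    # cheapest quantization meeting the constraints
        cex = verify(candidate)              # None means verified equivalent
        if cex is None:
            return candidate
        constraints.append(cex)              # refine the next search with the counter-example
    raise RuntimeError("no verified quantization within the round budget")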
arXiv Detail & Related papers (2022-07-09T09:25:45Z)
- Low-bit Quantization of Recurrent Neural Network Language Models Using Alternating Direction Methods of Multipliers [67.688697838109]
This paper presents a novel method to train quantized RNNLMs from scratch using alternating direction methods of multipliers (ADMM).
Experiments on two tasks suggest the proposed ADMM quantization achieved a model size compression factor of up to 31 times over the full precision baseline RNNLMs.
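The generic ADMM pattern behind such schemes alternates a gradient step on the float weights, a projection onto the quantization grid, and a dual update. The NumPy sketch below shows that generic pattern under our own names; the task loss is a placeholder and none of this is the paper's exact RNNLM recipe.

import numpy as np

def admm_quantize(w, grid, rho=1e-3, lr=1e-2, steps=100, grad_fn=None):
    # Generic ADMM-style quantization, sketched: w are float weights,
    # grid the allowed quantized values, grad_fn a placeholder task-loss
    # gradient, u the scaled dual variable.
    project = lambda v: grid[np.argmin(np.abs(v[:, None] - grid[None, :]), axis=1)]
    q = project(w)
    u = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w) if grad_fn else np.zeros_like(w)
        w = w - lr * (g + rho * (w - q + u))  # pull w toward its quantized copy
        q = project(w + u)                    # project onto the quantization grid
        u = u + w - q                         # accumulate the constraint violation
    return q

w = np.random.randn(8)
grid = np.array([-0.5, -0.25, 0.0, 0.25, 0.5])  # a tiny example codebook
print(admm_quantize(w, grid))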
arXiv Detail & Related papers (2021-11-29T09:30:06Z)
- QTN-VQC: An End-to-End Learning framework for Quantum Neural Networks [71.14713348443465]
We introduce a trainable quantum tensor network (QTN) for quantum embedding on a variational quantum circuit (VQC).
QTN enables an end-to-end parametric model pipeline, namely QTN-VQC, from the generation of quantum embedding to the output measurement.
Our experiments on the MNIST dataset demonstrate the advantages of QTN for quantum embedding over other quantum embedding approaches.
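For the pipeline shape only, here is a generic embedding-to-measurement variational circuit in PennyLane; the stock AngleEmbedding below is a placeholder where QTN-VQC instead learns a quantum tensor network, so this is not the paper's architecture.

import pennylane as qml
from pennylane import numpy as np

n_qubits, n_layers = 4, 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def vqc(features, weights):
    # Placeholder embedding: QTN-VQC learns a quantum tensor network here.
    qml.AngleEmbedding(features, wires=range(n_qubits))
    # Variational circuit acting on the embedded state.
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))
    # Output measurement feeding the downstream classifier.
    return qml.expval(qml.PauliZ(0))

shape = qml.StronglyEntanglingLayers.shape(n_layers=n_layers, n_wires=n_qubits)
print(vqc(np.random.random(n_qubits), np.random.random(shape)))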
arXiv Detail & Related papers (2021-10-06T14:44:51Z)
- Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
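One way to picture bit-level dropout is to treat each bit plane of a quantized tensor as independently droppable. The NumPy sketch below is our illustrative reading of the summary, not the paper's DropBits implementation.

import numpy as np

def drop_bits(q, bitwidth, drop_prob, rng):
    # Illustrative bit-plane dropout on unsigned integer activations:
    # each bit is independently zeroed with probability drop_prob
    # (our reading of the summary, not the paper's exact rule).
    out = np.zeros_like(q)
    for b in range(bitwidth):
        plane = (q >> b) & 1                     # extract bit plane b
        keep = rng.random(q.shape) >= drop_prob  # per-element dropout mask
        out |= (plane * keep.astype(q.dtype)) << b
    return out

rng = np.random.default_rng(0)
q = rng.integers(0, 16, size=8)                  # 4-bit unsigned values
print(q, drop_bits(q, bitwidth=4, drop_prob=0.2, rng=rng))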
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
- Quantum Deep Learning: Sampling Neural Nets with a Quantum Annealer [0.0]
We propose approaches to overcome two hurdles for high-resolution image classification on a quantum processing unit (QPU).
We successfully transfer a convolutional neural network to the QPU and show the potential for classification speedup of at least one order of magnitude.
arXiv Detail & Related papers (2021-07-19T09:35:02Z)
- A Quantum Convolutional Neural Network on NISQ Devices [0.9831489366502298]
We propose a quantum convolutional neural network (QCNN) inspired by classical convolutional neural networks.
Our model is robust to certain types of noise in image recognition tasks.
It opens up the prospect of exploiting quantum power to process information in the era of big data.
arXiv Detail & Related papers (2021-04-14T15:07:03Z)
- Toward Trainability of Quantum Neural Networks [87.04438831673063]
Quantum Neural Networks (QNNs) have been proposed as generalizations of classical neural networks to achieve the quantum speed-up.
Serious bottlenecks exist for training QNNs because gradients vanish at a rate exponential in the number of input qubits.
We show that QNNs with tree-tensor and step-controlled structures are effective for binary classification. Simulations show faster convergence rates and better accuracy compared to QNNs with random structures.
arXiv Detail & Related papers (2020-11-12T08:32:04Z)
- Compiling ONNX Neural Network Models Using MLIR [51.903932262028235]
We present a preliminary report on our onnx-mlir compiler, which generates code for the inference of deep neural network models.
Onnx-mlir relies on the Multi-Level Intermediate Representation (MLIR) infrastructure, recently integrated into the LLVM project.
arXiv Detail & Related papers (2020-08-19T05:28:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.