HPTQ: Hardware-Friendly Post Training Quantization
- URL: http://arxiv.org/abs/2109.09113v1
- Date: Sun, 19 Sep 2021 12:45:01 GMT
- Title: HPTQ: Hardware-Friendly Post Training Quantization
- Authors: Hai Victor Habi, Reuven Peretz, Elad Cohen, Lior Dikstein, Oranit
Dror, Idit Diamant, Roy H. Jennings and Arnon Netzer
- Abstract summary: We introduce a hardware-friendly post training quantization (HPTQ) framework.
We perform a large-scale study on four tasks: classification, object detection, semantic segmentation and pose estimation.
Our experiments show that competitive results can be obtained under hardware-friendly constraints.
- Score: 6.515659231669797
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural network quantization enables the deployment of models on edge devices.
An essential requirement for their hardware efficiency is that the quantizers
are hardware-friendly: uniform, symmetric, and with power-of-two thresholds. To
the best of our knowledge, current post-training quantization methods do not
support all of these constraints simultaneously. In this work, we introduce a
hardware-friendly post training quantization (HPTQ) framework, which addresses
this problem by synergistically combining several known quantization methods.
We perform a large-scale study on four tasks: classification, object detection,
semantic segmentation and pose estimation over a wide variety of network
architectures. Our extensive experiments show that competitive results can be
obtained under hardware-friendly constraints.
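To make the hardware-friendly constraints concrete, below is a minimal sketch of a uniform, symmetric quantizer with a power-of-two threshold. It is an illustration only, not the HPTQ implementation: the threshold heuristic, the function names, and the 8-bit signed range are assumptions.

```python
import numpy as np

def pow2_threshold(x):
    """Smallest power-of-two threshold covering the tensor's absolute range.

    Illustrative heuristic only; HPTQ combines several known PTQ techniques
    for threshold selection, which are not reproduced here.
    """
    max_abs = np.max(np.abs(x))
    return 2.0 ** np.ceil(np.log2(max_abs + 1e-12))

def quantize_symmetric_uniform(x, n_bits=8):
    """Uniform, symmetric quantization with a power-of-two threshold.

    Uniform: all quantization steps have the same size.
    Symmetric: the grid is centred at zero, so no zero-point offset is needed.
    Power-of-two threshold: with t a power of two, the scale t / 2**(n_bits - 1)
    is also a power of two, so rescaling maps to bit shifts on integer hardware.
    """
    t = pow2_threshold(x)
    qmax = 2 ** (n_bits - 1) - 1               # e.g. 127 for 8 bits
    scale = t / 2 ** (n_bits - 1)              # step size of the uniform grid
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q.astype(np.int32), scale           # dequantize with q * scale

if __name__ == "__main__":
    w = np.random.randn(64, 32).astype(np.float32)
    q, scale = quantize_symmetric_uniform(w)
    print(f"scale={scale}, max abs error={np.abs(w - q * scale).max():.6f}")
```

In practice, PTQ frameworks typically select the threshold by minimizing a quantization-error metric over a calibration set rather than simply covering the maximum absolute value; the point of the sketch is only that, with a power-of-two threshold and a power-of-two number of levels, the scale is itself a power of two and rescaling reduces to bit shifts.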
Related papers
- A Quantum-Classical Collaborative Training Architecture Based on Quantum State Fidelity [50.387179833629254]
We introduce a collaborative classical-quantum architecture called co-TenQu.
Co-TenQu enhances a classical deep neural network by up to 41.72% in a fair setting.
It outperforms other quantum-based methods by up to 1.9 times and achieves similar accuracy while utilizing 70.59% fewer qubits.
arXiv Detail & Related papers (2024-02-23T14:09:41Z)
- RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization [8.827794405944637]
Post-training quantization (PTQ) is a promising solution for compressing large transformer models.
Existing PTQ methods typically exhibit non-trivial performance loss.
We propose RepQuant, a novel PTQ framework with quantization-inference decoupling paradigm.
arXiv Detail & Related papers (2024-02-08T12:35:41Z)
- Resource Saving via Ensemble Techniques for Quantum Neural Networks [1.4606049539095878]
We propose the use of ensemble techniques, which involve constructing a single machine learning model based on multiple instances of quantum neural networks.
In particular, we implement bagging and AdaBoost techniques, with different data loading configurations, and evaluate their performance on both synthetic and real-world classification and regression tasks.
Our findings indicate that these methods enable the construction of large, powerful models even on relatively small quantum devices.
arXiv Detail & Related papers (2023-03-20T17:19:45Z)
- TeD-Q: a tensor network enhanced distributed hybrid quantum machine learning framework [59.07246314484875]
TeD-Q is an open-source software framework for quantum machine learning.
It seamlessly integrates classical machine learning libraries with quantum simulators.
It provides a graphical mode in which the quantum circuit and the training progress can be visualized in real-time.
arXiv Detail & Related papers (2023-01-13T09:35:05Z)
- MQBench: Towards Reproducible and Deployable Model Quantization Benchmark [53.12623958951738]
MQBench is a first attempt to evaluate, analyze, and benchmark the reproducibility and deployability of model quantization algorithms.
We choose multiple platforms for real-world deployments, including CPU, GPU, ASIC, and DSP, and evaluate an extensive set of state-of-the-art quantization algorithms.
We conduct a comprehensive analysis and report both intuitive and counter-intuitive insights.
arXiv Detail & Related papers (2021-11-05T23:38:44Z)
- Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
- A White Paper on Neural Network Quantization [20.542729144379223]
We introduce state-of-the-art algorithms for mitigating the impact of quantization noise on the network's performance.
We consider two main classes of algorithms: Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT).
arXiv Detail & Related papers (2021-06-15T17:12:42Z)
- Training Multi-bit Quantized and Binarized Networks with A Learnable Symmetric Quantizer [1.9659095632676098]
Quantizing weights and activations of deep neural networks is essential for deploying them in resource-constrained devices or cloud platforms.
While binarization is a special case of quantization, this extreme case often leads to several training difficulties.
We develop a unified quantization framework, denoted as UniQ, to overcome binarization difficulties.
arXiv Detail & Related papers (2021-04-01T02:33:31Z)
- Once Quantization-Aware Training: High Performance Extremely Low-bit Architecture Search [112.05977301976613]
We propose to combine Network Architecture Search methods with quantization to enjoy the merits of both.
We first propose the joint training of architecture and quantization with a shared step size to acquire a large number of quantized models.
Then a bit-inheritance scheme is introduced to transfer the quantized models to the lower bit, which further reduces the time cost and improves the quantization accuracy.
arXiv Detail & Related papers (2020-10-09T03:52:16Z)
- HMQ: Hardware Friendly Mixed Precision Quantization Block for CNNs [7.219077740523684]
We introduce the Hardware Friendly Mixed Precision Quantization Block (HMQ).
HMQ is a mixed precision quantization block that repurposes the Gumbel-Softmax estimator into a smooth estimator of a pair of quantization parameters (a hedged sketch of this idea appears after this list).
We apply HMQs to quantize classification models trained on CIFAR10 and ImageNet.
arXiv Detail & Related papers (2020-07-20T09:02:09Z)
- Entanglement Classification via Neural Network Quantum States [58.720142291102135]
In this paper we combine machine-learning tools and the theory of quantum entanglement to perform entanglement classification for multipartite qubit systems in pure states.
We use a parameterisation of quantum systems using artificial neural networks in a restricted Boltzmann machine (RBM) architecture, known as Neural Network Quantum States (NNS).
arXiv Detail & Related papers (2019-12-31T07:40:23Z)
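As a companion to the HMQ entry above, here is a minimal sketch (assuming PyTorch) of how a Gumbel-Softmax estimator can act as a smooth selector over candidate quantization parameters such as power-of-two thresholds and bit-widths. The class name, the candidate sets, and the straight-through rounding are illustrative assumptions based only on the summary above, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GumbelQuantizer(nn.Module):
    """Illustrative quantizer that learns a soft choice over candidate
    (threshold, bit-width) pairs via the Gumbel-Softmax trick
    (hypothetical sketch; not the HMQ reference implementation)."""

    def __init__(self, thresholds=(0.5, 1.0, 2.0, 4.0), bit_widths=(2, 4, 8), tau=1.0):
        super().__init__()
        # Candidate power-of-two thresholds and bit-widths (assumed for illustration).
        self.register_buffer("pairs", torch.cartesian_prod(
            torch.tensor(thresholds), torch.tensor(bit_widths, dtype=torch.float32)))
        # One trainable logit per candidate pair, optimized jointly with the network.
        self.logits = nn.Parameter(torch.zeros(self.pairs.shape[0]))
        self.tau = tau

    def forward(self, x):
        # Differentiable (soft) one-hot selection of a parameter pair.
        probs = F.gumbel_softmax(self.logits, tau=self.tau, hard=False)
        t = (probs * self.pairs[:, 0]).sum()      # expected threshold
        b = (probs * self.pairs[:, 1]).sum()      # expected bit-width
        # Uniform, symmetric quantization with the softly selected parameters.
        scale = t / 2 ** (b - 1)
        y = x / scale
        # Straight-through estimator: round in the forward pass, identity gradient.
        y = y + (torch.round(y) - y).detach()
        qmax = 2 ** (b - 1) - 1
        q = torch.maximum(torch.minimum(y, qmax), -qmax - 1)
        return q * scale

if __name__ == "__main__":
    quant = GumbelQuantizer()
    w = torch.randn(16, 16)
    quant(w).sum().backward()                     # gradients reach the logits
    print("logit grads:", quant.logits.grad)
```

During training the selection is soft, so gradients reach both the quantized tensor and the selection logits; after training, taking the argmax over the logits yields a single hard (threshold, bit-width) pair per quantizer.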
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.