HPTQ: Hardware-Friendly Post Training Quantization
- URL: http://arxiv.org/abs/2109.09113v1
- Date: Sun, 19 Sep 2021 12:45:01 GMT
- Title: HPTQ: Hardware-Friendly Post Training Quantization
- Authors: Hai Victor Habi, Reuven Peretz, Elad Cohen, Lior Dikstein, Oranit
Dror, Idit Diamant, Roy H. Jennings and Arnon Netzer
- Abstract summary: We introduce a hardware-friendly post training quantization (HPTQ) framework.
We perform a large-scale study on four tasks: classification, object detection, semantic segmentation and pose estimation.
Our experiments show that competitive results can be obtained under hardware-friendly constraints.
- Score: 6.515659231669797
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural network quantization enables the deployment of models on edge devices.
An essential requirement for their hardware efficiency is that the quantizers
are hardware-friendly: uniform, symmetric, and with power-of-two thresholds. To
the best of our knowledge, current post-training quantization methods do not
support all of these constraints simultaneously. In this work, we introduce a
hardware-friendly post training quantization (HPTQ) framework, which addresses
this problem by synergistically combining several known quantization methods.
We perform a large-scale study on four tasks: classification, object detection,
semantic segmentation and pose estimation over a wide variety of network
architectures. Our extensive experiments show that competitive results can be
obtained under hardware-friendly constraints.
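To make the hardware-friendly constraints concrete, below is a minimal sketch of a uniform, symmetric quantizer with a power-of-two threshold. It is an illustration only, not the HPTQ implementation: the threshold heuristic, the function names, and the 8-bit signed range are assumptions.

```python
import numpy as np

def pow2_threshold(x):
    """Smallest power-of-two threshold covering the tensor's absolute range.

    Illustrative heuristic only; HPTQ combines several known PTQ techniques
    for threshold selection, which are not reproduced here.
    """
    max_abs = np.max(np.abs(x))
    return 2.0 ** np.ceil(np.log2(max_abs + 1e-12))

def quantize_symmetric_uniform(x, n_bits=8):
    """Uniform, symmetric quantization with a power-of-two threshold.

    Uniform: all quantization steps have the same size.
    Symmetric: the grid is centred at zero, so no zero-point offset is needed.
    Power-of-two threshold: with t a power of two, the scale t / 2**(n_bits - 1)
    is also a power of two, so rescaling maps to bit shifts on integer hardware.
    """
    t = pow2_threshold(x)
    qmax = 2 ** (n_bits - 1) - 1               # e.g. 127 for 8 bits
    scale = t / 2 ** (n_bits - 1)              # step size of the uniform grid
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q.astype(np.int32), scale           # dequantize with q * scale

if __name__ == "__main__":
    w = np.random.randn(64, 32).astype(np.float32)
    q, scale = quantize_symmetric_uniform(w)
    print(f"scale={scale}, max abs error={np.abs(w - q * scale).max():.6f}")
```

In practice, PTQ frameworks typically select the threshold by minimizing a quantization-error metric over a calibration set rather than simply covering the maximum absolute value; the point of the sketch is only that, with a power-of-two threshold and a power-of-two number of levels, the scale is itself a power of two and rescaling reduces to bit shifts.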
Related papers
- A Quantum-Classical Collaborative Training Architecture Based on Quantum State Fidelity [50.387179833629254]
We introduce a collaborative classical-quantum architecture called co-TenQu.
Co-TenQu enhances a classical deep neural network by up to 41.72% in a fair setting.
It outperforms other quantum-based methods by up to 1.9 times and achieves similar accuracy while utilizing 70.59% fewer qubits.
arXiv Detail & Related papers (2024-02-23T14:09:41Z)
- RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization [8.827794405944637]
Post-training quantization (PTQ) is a promising solution for compressing large transformer models.
Existing PTQ methods typically exhibit non-trivial performance loss.
We propose RepQuant, a novel PTQ framework with quantization-inference decoupling paradigm.
arXiv Detail & Related papers (2024-02-08T12:35:41Z)
- Resource Saving via Ensemble Techniques for Quantum Neural Networks [1.4606049539095878]
We propose the use of ensemble techniques, which involve constructing a single machine learning model based on multiple instances of quantum neural networks.
In particular, we implement bagging and AdaBoost techniques, with different data loading configurations, and evaluate their performance on both synthetic and real-world classification and regression tasks.
Our findings indicate that these methods enable the construction of large, powerful models even on relatively small quantum devices.
arXiv Detail & Related papers (2023-03-20T17:19:45Z)
- TeD-Q: a tensor network enhanced distributed hybrid quantum machine learning framework [59.07246314484875]
TeD-Q is an open-source software framework for quantum machine learning.
It seamlessly integrates classical machine learning libraries with quantum simulators.
It provides a graphical mode in which the quantum circuit and the training progress can be visualized in real-time.
arXiv Detail & Related papers (2023-01-13T09:35:05Z)
- MQBench: Towards Reproducible and Deployable Model Quantization Benchmark [53.12623958951738]
MQBench is a first attempt to evaluate, analyze, and benchmark the reproducibility and deployability of model quantization algorithms.
We choose multiple platforms for real-world deployments, including CPU, GPU, ASIC, and DSP, and evaluate an extensive set of state-of-the-art quantization algorithms.
We conduct a comprehensive analysis and report both intuitive and counter-intuitive insights.
arXiv Detail & Related papers (2021-11-05T23:38:44Z)
- Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
- A White Paper on Neural Network Quantization [20.542729144379223]
We introduce state-of-the-art algorithms for mitigating the impact of quantization noise on the network's performance.
We consider two main classes of algorithms: Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT).
arXiv Detail & Related papers (2021-06-15T17:12:42Z)
- Training Multi-bit Quantized and Binarized Networks with A Learnable Symmetric Quantizer [1.9659095632676098]
Quantizing weights and activations of deep neural networks is essential for deploying them in resource-constrained devices or cloud platforms.
While binarization is a special case of quantization, this extreme case often leads to several training difficulties.
We develop a unified quantization framework, denoted as UniQ, to overcome binarization difficulties.
arXiv Detail & Related papers (2021-04-01T02:33:31Z)
- Once Quantization-Aware Training: High Performance Extremely Low-bit Architecture Search [112.05977301976613]
We propose to combine Network Architecture Search methods with quantization to enjoy the merits of both.
We first propose the joint training of architecture and quantization with a shared step size to acquire a large number of quantized models.
Then a bit-inheritance scheme is introduced to transfer the quantized models to the lower bit, which further reduces the time cost and improves the quantization accuracy.
arXiv Detail & Related papers (2020-10-09T03:52:16Z)
- HMQ: Hardware Friendly Mixed Precision Quantization Block for CNNs [7.219077740523684]
We introduce the Hardware Friendly Mixed Precision Quantization Block (HMQ).
HMQ is a mixed precision quantization block that repurposes the Gumbel-Softmax estimator into a smooth estimator of a pair of quantization parameters (a hedged sketch of this idea appears after this list).
We apply HMQs to quantize classification models trained on CIFAR10 and ImageNet.
arXiv Detail & Related papers (2020-07-20T09:02:09Z)
- Entanglement Classification via Neural Network Quantum States [58.720142291102135]
In this paper we combine machine-learning tools and the theory of quantum entanglement to perform entanglement classification for multipartite qubit systems in pure states.
We use a parameterisation of quantum systems using artificial neural networks in a restricted Boltzmann machine (RBM) architecture, known as Neural Network Quantum States (NNS).
arXiv Detail & Related papers (2019-12-31T07:40:23Z)
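As a companion to the HMQ entry above, here is a minimal sketch (assuming PyTorch) of how a Gumbel-Softmax estimator can act as a smooth selector over candidate quantization parameters such as power-of-two thresholds and bit-widths. The class name, the candidate sets, and the straight-through rounding are illustrative assumptions based only on the summary above, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GumbelQuantizer(nn.Module):
    """Illustrative quantizer that learns a soft choice over candidate
    (threshold, bit-width) pairs via the Gumbel-Softmax trick
    (hypothetical sketch; not the HMQ reference implementation)."""

    def __init__(self, thresholds=(0.5, 1.0, 2.0, 4.0), bit_widths=(2, 4, 8), tau=1.0):
        super().__init__()
        # Candidate power-of-two thresholds and bit-widths (assumed for illustration).
        self.register_buffer("pairs", torch.cartesian_prod(
            torch.tensor(thresholds), torch.tensor(bit_widths, dtype=torch.float32)))
        # One trainable logit per candidate pair, optimized jointly with the network.
        self.logits = nn.Parameter(torch.zeros(self.pairs.shape[0]))
        self.tau = tau

    def forward(self, x):
        # Differentiable (soft) one-hot selection of a parameter pair.
        probs = F.gumbel_softmax(self.logits, tau=self.tau, hard=False)
        t = (probs * self.pairs[:, 0]).sum()      # expected threshold
        b = (probs * self.pairs[:, 1]).sum()      # expected bit-width
        # Uniform, symmetric quantization with the softly selected parameters.
        scale = t / 2 ** (b - 1)
        y = x / scale
        # Straight-through estimator: round in the forward pass, identity gradient.
        y = y + (torch.round(y) - y).detach()
        qmax = 2 ** (b - 1) - 1
        q = torch.maximum(torch.minimum(y, qmax), -qmax - 1)
        return q * scale

if __name__ == "__main__":
    quant = GumbelQuantizer()
    w = torch.randn(16, 16)
    quant(w).sum().backward()                     # gradients reach the logits
    print("logit grads:", quant.logits.grad)
```

During training the selection is soft, so gradients reach both the quantized tensor and the selection logits; after training, taking the argmax over the logits yields a single hard (threshold, bit-width) pair per quantizer.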
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.