MQBench: Towards Reproducible and Deployable Model Quantization Benchmark
- URL: http://arxiv.org/abs/2111.03759v1
- Date: Fri, 5 Nov 2021 23:38:44 GMT
- Title: MQBench: Towards Reproducible and Deployable Model Quantization Benchmark
- Authors: Yuhang Li, Mingzhu Shen, Jian Ma, Yan Ren, Mingxin Zhao, Qi Zhang, Ruihao Gong, Fengwei Yu, Junjie Yan
- Abstract summary: MQBench is a first attempt to evaluate, analyze, and benchmark the reproducibility and deployability of model quantization algorithms.
We choose multiple platforms for real-world deployment, including CPU, GPU, ASIC, and DSP, and evaluate an extensive set of state-of-the-art quantization algorithms.
We conduct a comprehensive analysis and uncover a number of intuitive and counter-intuitive insights.
- Score: 53.12623958951738
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Model quantization has emerged as an indispensable technique to accelerate deep learning inference. While researchers continue to push the frontier of quantization algorithms, existing quantization work is often unreproducible and undeployable, because researchers do not choose consistent training pipelines and ignore the requirements of hardware deployment. In this work, we propose the Model Quantization Benchmark (MQBench), a first attempt to evaluate, analyze, and benchmark the reproducibility and deployability of model quantization algorithms. We choose multiple platforms for real-world deployment, including CPU, GPU, ASIC, and DSP, and evaluate an extensive set of state-of-the-art quantization algorithms under a unified training pipeline. MQBench acts as a bridge between algorithms and hardware. We conduct a comprehensive analysis and uncover a number of intuitive and counter-intuitive insights. By aligning training settings, we find that existing algorithms perform about the same on the conventional academic track; for hardware-deployable quantization, however, a large accuracy gap remains unsettled. Surprisingly, no existing algorithm wins every challenge in MQBench, and we hope this work can inspire future research directions.
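The core mechanism behind the quantization-aware training (QAT) algorithms benchmarked in MQBench is simulated ("fake") quantization: during training, tensors are rounded to a low-bit grid and immediately dequantized, so the network learns under the rounding error it will encounter on hardware. The sketch below is a minimal, generic PyTorch illustration of this idea, not MQBench's actual API; the symmetric int8 setting and the naive min-max scale are simplifying assumptions.

```python
import torch

def fake_quantize(x: torch.Tensor, scale: float, zero_point: int,
                  qmin: int = -128, qmax: int = 127) -> torch.Tensor:
    """Simulate int8 quantization in float: quantize, clamp, dequantize.

    QAT frameworks insert nodes like this on weights/activations so the
    network is trained against the rounding error seen at deployment.
    """
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale

class FakeQuantSTE(torch.autograd.Function):
    """Straight-through estimator: the forward pass uses the rounded value;
    the backward pass treats rounding as the identity function."""

    @staticmethod
    def forward(ctx, x, scale, zero_point):
        return fake_quantize(x, scale, zero_point)

    @staticmethod
    def backward(ctx, grad_output):
        # No gradient for scale/zero_point in this simplified sketch.
        return grad_output, None, None

# Example: weights quantized symmetrically (zero_point = 0), as many
# inference libraries require for conv/linear weights.
w = torch.randn(8, requires_grad=True)
scale = w.detach().abs().max().item() / 127.0  # naive min-max scale
w_q = FakeQuantSTE.apply(w, scale, 0)
loss = (w_q ** 2).sum()
loss.backward()  # gradients flow back to w through the STE
```

Hardware backends differ in exactly these quantizer choices (symmetric vs. asymmetric, per-tensor vs. per-channel), which is why the paper stresses aligning the simulated quantizer with the deployment target.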
Related papers
- ISQuant: apply squant to the real deployment [0.0]
We analyze why the combination of quantization and dequantization is used to train the model.
We propose ISQuant as a solution for deploying 8-bit models.
arXiv Detail & Related papers (2024-07-05T15:10:05Z)
- Quantum Subroutine for Variance Estimation: Algorithmic Design and Applications [80.04533958880862]
Quantum computing sets the foundation for new ways of designing algorithms.
New challenges arise concerning the fields in which a quantum speedup can be achieved.
Designing quantum subroutines that are more efficient than their classical counterparts lays solid foundations for new, powerful quantum algorithms.
arXiv Detail & Related papers (2024-02-26T09:32:07Z)
- Stressing Out Modern Quantum Hardware: Performance Evaluation and Execution Insights [2.2091590689610823]
Stress testing is a technique used to evaluate a system by giving it a computational load beyond its specified thresholds.
We conduct a qualitative and quantitative evaluation of the Quantinuum H1 ion-trap device using a stress-test-based protocol.
arXiv Detail & Related papers (2024-01-24T20:22:34Z)
- Quantum Architecture Search with Unsupervised Representation Learning [24.698519892763283]
Unsupervised representation learning presents new opportunities for advancing Quantum Architecture Search (QAS).
QAS is designed to optimize quantum circuits for Variational Quantum Algorithms (VQAs).
arXiv Detail & Related papers (2024-01-21T19:53:17Z)
- Unifying (Quantum) Statistical and Parametrized (Quantum) Algorithms [65.268245109828]
We take inspiration from Kearns' SQ oracle and Valiant's weak evaluation oracle.
We introduce an extensive yet intuitive framework that yields unconditional lower bounds for learning from evaluation queries.
arXiv Detail & Related papers (2023-10-26T18:23:21Z)
- Modular Quantization-Aware Training for 6D Object Pose Estimation [52.9436648014338]
Edge applications demand efficient 6D object pose estimation on resource-constrained embedded platforms.
We introduce Modular Quantization-Aware Training (MQAT), an adaptive and mixed-precision quantization-aware training strategy.
MQAT guides a systematic gradated modular quantization sequence and determines module-specific bit precisions, leading to quantized models that outperform those produced by state-of-the-art uniform and mixed-precision quantization techniques.
arXiv Detail & Related papers (2023-03-12T21:01:54Z)
- A didactic approach to quantum machine learning with a single qubit [68.8204255655161]
We focus on the case of learning with a single qubit, using data re-uploading techniques.
We implement the different proposed formulations on toy and real-world datasets using the Qiskit quantum computing SDK.
arXiv Detail & Related papers (2022-11-23T18:25:32Z)
- HPTQ: Hardware-Friendly Post Training Quantization [6.515659231669797]
We introduce a hardware-friendly post training quantization (HPTQ) framework.
We perform a large-scale study on four tasks: classification, object detection, semantic segmentation and pose estimation.
Our experiments show that competitive results can be obtained under hardware-friendly constraints.
arXiv Detail & Related papers (2021-09-19T12:45:01Z)
- A White Paper on Neural Network Quantization [20.542729144379223]
We introduce state-of-the-art algorithms for mitigating the impact of quantization noise on the network's performance.
We consider two main classes of algorithms: Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT); a minimal sketch contrasting the two appears after this list.
arXiv Detail & Related papers (2021-06-15T17:12:42Z)
- Once Quantization-Aware Training: High Performance Extremely Low-bit Architecture Search [112.05977301976613]
We propose to combine Network Architecture Search methods with quantization to enjoy the merits of both sides.
We first propose the joint training of architecture and quantization with a shared step size to acquire a large number of quantized models.
Then a bit-inheritance scheme is introduced to transfer the quantized models to the lower bit, which further reduces the time cost and improves the quantization accuracy.
arXiv Detail & Related papers (2020-10-09T03:52:16Z)
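To make the PTQ/QAT distinction from the white-paper entry above concrete: PTQ fixes quantization parameters from a small calibration set with no weight updates, while QAT keeps fake quantization inside the training loop and fine-tunes the weights. The sketch below is a hedged illustration in plain PyTorch; the MinMaxObserver helper and the toy model are hypothetical, not taken from any of the listed papers.

```python
import torch
import torch.nn as nn

class MinMaxObserver:
    """Track the running min/max of a tensor to derive an affine
    quantization scale/zero-point (hypothetical helper for illustration)."""
    def __init__(self, qmin: int = 0, qmax: int = 255):
        self.qmin, self.qmax = qmin, qmax
        self.lo, self.hi = float("inf"), float("-inf")

    def observe(self, x: torch.Tensor) -> None:
        self.lo = min(self.lo, x.min().item())
        self.hi = max(self.hi, x.max().item())

    def qparams(self):
        scale = max(self.hi - self.lo, 1e-8) / (self.qmax - self.qmin)
        zero_point = round(self.qmin - self.lo / scale)
        return scale, zero_point

model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 4))
calib_batches = [torch.randn(32, 16) for _ in range(8)]

# --- PTQ: no gradient updates; observe activations to fix qparams. ---
obs = MinMaxObserver()
with torch.no_grad():
    for x in calib_batches:
        obs.observe(model[0](x))
scale, zp = obs.qparams()  # frozen from calibration; weights untouched

# --- QAT: fake-quantize inside the training loop and update weights. ---
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
for x in calib_batches:
    a = model[0](x)
    a_q = (torch.clamp(torch.round(a / scale) + zp, 0, 255) - zp) * scale
    # round() has zero gradient, so apply the straight-through trick:
    a_q = a + (a_q - a).detach()
    loss = model[2](model[1](a_q)).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The design trade-off mirrored in the list above: PTQ is cheap and needs only a handful of calibration batches, while QAT recovers more accuracy at low bit-widths at the cost of a full fine-tuning run.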