Genie: Show Me the Data for Quantization
- URL: http://arxiv.org/abs/2212.04780v3
- Date: Tue, 8 Aug 2023 14:30:05 GMT
- Title: Genie: Show Me the Data for Quantization
- Authors: Yongkweon Jeon, Chungman Lee, Ho-young Kim
- Abstract summary: We introduce a post-training quantization scheme for zero-shot quantization that produces high-quality quantized networks within a few hours.
We also propose a post-training quantization algorithm to enhance the performance of quantized models.
- Score: 2.7286395031146062
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Zero-shot quantization is a promising approach for developing lightweight
deep neural networks when data is inaccessible owing to various reasons,
including cost and issues related to privacy. By exploiting the learned
parameters ($\mu$ and $\sigma$) of batch normalization layers in an
FP32-pre-trained model, zero-shot quantization schemes focus on generating
synthetic data. Subsequently, they distill knowledge from the pre-trained model
(teacher) to the quantized model (student) such that the quantized model can be
optimized with the synthetic dataset. However, thus far, zero-shot quantization
has primarily been discussed in the context of quantization-aware training
methods, which require task-specific losses and optimization as lengthy as
retraining. We thus introduce a post-training quantization scheme for
zero-shot quantization that produces high-quality quantized networks within a
few hours. Furthermore, we propose a framework called Genie that generates data
suited for quantization. With the data synthesized by Genie, we can produce
robust quantized models without real datasets, a result comparable to few-shot
quantization. We also propose a post-training quantization algorithm to enhance
the performance of quantized models. By combining them, we can bridge the gap
between zero-shot and few-shot quantization while significantly improving the
quantization performance compared to that of existing approaches. In other
words, we can obtain a unique state-of-the-art zero-shot quantization approach.
The code is available at https://github.com/SamsungLabs/Genie.
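The data-generation step sketched in the abstract, optimizing synthetic inputs so that the statistics they induce match the stored batch-normalization parameters ($\mu$, $\sigma$), can be illustrated as follows. This is a minimal sketch of the generic zero-shot recipe, not Genie's actual algorithm (see the repository above for that); the ResNet-18 teacher, the plain squared-error matching loss, the optimizer settings, and the iteration count are all illustrative assumptions.

```python
# Minimal sketch: optimize random inputs so the statistics they induce at each
# BatchNorm layer match the layer's stored running mean/variance. This is the
# generic zero-shot data-generation idea, NOT Genie's exact procedure; the
# backbone, loss, and hyperparameters below are assumptions.
import torch
import torch.nn as nn
from torchvision.models import resnet18

teacher = resnet18(weights="IMAGENET1K_V1").eval()   # frozen FP32 teacher
for p in teacher.parameters():
    p.requires_grad_(False)

bn_layers = [m for m in teacher.modules() if isinstance(m, nn.BatchNorm2d)]
captured = []

def capture_stats(module, inputs, output):
    x = inputs[0]
    captured.append((x.mean(dim=(0, 2, 3)), x.var(dim=(0, 2, 3), unbiased=False)))

handles = [m.register_forward_hook(capture_stats) for m in bn_layers]

# Learnable synthetic batch, later usable as a calibration set for PTQ.
x = torch.randn(32, 3, 224, 224, requires_grad=True)
opt = torch.optim.Adam([x], lr=0.1)

for step in range(500):                 # iteration count is an arbitrary choice
    captured.clear()
    opt.zero_grad()
    teacher(x)
    loss = torch.zeros(())
    for bn, (mu, var) in zip(bn_layers, captured):
        loss = loss + (mu - bn.running_mean).pow(2).mean() \
                    + (var - bn.running_var).pow(2).mean()
    loss.backward()
    opt.step()

for h in handles:
    h.remove()
# `x` can now stand in for real data when calibrating/distilling the quantized model.
```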
Related papers
- ISQuant: apply squant to the real deployment [0.0]
We analyze why the combination of quantization and dequantization is used to train the model.
We propose ISQuant as a solution for deploying 8-bit models.
arXiv Detail & Related papers (2024-07-05T15:10:05Z)
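The "combination of quantization and dequantization" mentioned in the ISQuant summary is commonly implemented as simulated (fake) quantization: values are rounded to an integer grid and immediately mapped back to floating point, so the rest of training stays in FP32. The generic sketch below shows that round trip; the 8-bit symmetric scheme is an assumption, not ISQuant's specific deployment recipe.

```python
# Generic simulated ("fake") quantization: quantize to an integer grid, then
# dequantize immediately so downstream computation stays in floating point.
# Illustrative only; not ISQuant's specific scheme.
import torch

def fake_quantize(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    qmax = 2 ** (num_bits - 1) - 1                 # symmetric signed range, e.g. [-127, 127]
    scale = w.abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(w / scale), -qmax, qmax)   # integer grid
    return q * scale                                        # back to FP32

w = torch.randn(64, 128)
w_q = fake_quantize(w)
print((w - w_q).abs().max())   # error introduced by the quantize/dequantize round trip
```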
- MixQuant: Mixed Precision Quantization with a Bit-width Optimization Search [7.564770908909927]
Quantization is a technique for creating efficient Deep Neural Networks (DNNs)
We propose MixQuant, a search algorithm that finds the optimal custom quantization bit-width for each layer weight based on roundoff error.
We show that combining MixQuant with BRECQ, a state-of-the-art quantization method, yields better quantized model accuracy than BRECQ alone.
arXiv Detail & Related papers (2023-09-29T15:49:54Z)
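A rough illustration of a roundoff-error-driven bit-width search like the one MixQuant describes: for each layer's weight tensor, pick the smallest candidate bit-width whose quantization error stays under a tolerance. The symmetric quantizer, the candidate set, and the MSE tolerance are assumptions, not the paper's exact criterion.

```python
# Sketch of per-layer bit-width selection driven by roundoff error.
# The quantizer, candidate bit-widths, and tolerance are illustrative choices.
import torch

def roundoff_error(w: torch.Tensor, num_bits: int) -> float:
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax, qmax) * scale
    return (w - w_q).pow(2).mean().item()          # mean squared roundoff error

def choose_bitwidths(layers, candidates=(2, 4, 8), tolerance=1e-4):
    chosen = {}
    for name, w in layers.items():
        # smallest candidate whose error is acceptable; fall back to the largest
        chosen[name] = next((b for b in candidates if roundoff_error(w, b) <= tolerance),
                            max(candidates))
    return chosen

layers = {"conv1": torch.randn(64, 3, 7, 7), "fc": torch.randn(1000, 512)}
print(choose_bitwidths(layers))   # e.g. {'conv1': 8, 'fc': 8}
```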
- PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models [52.09865918265002]
We propose a novel "quantize before fine-tuning" framework, PreQuant.
PreQuant is compatible with various quantization strategies, with outlier-aware fine-tuning incorporated to correct the induced quantization error.
We demonstrate the effectiveness of PreQuant on the GLUE benchmark using BERT, RoBERTa, and T5.
arXiv Detail & Related papers (2023-05-30T08:41:33Z)
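One possible, highly simplified reading of the "quantize before fine-tuning" workflow with outlier-aware fine-tuning: quantize the pre-trained weights once up front, then train only a full-precision correction restricted to outlier positions so it can absorb the induced quantization error. The 4-bit quantizer, the top-magnitude outlier rule, and the 1% outlier fraction are placeholders for illustration, not PreQuant's actual recipe.

```python
# Illustrative "quantize before fine-tuning" sketch: weights are quantized once,
# and only a full-precision correction on outlier positions is trained afterwards.
# The outlier rule, fraction, and bit-width are assumptions, not PreQuant's method.
import torch
import torch.nn as nn

def quantize_sym(w: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    return torch.clamp(torch.round(w / scale), -qmax, qmax) * scale

class OutlierAwareLinear(nn.Module):
    def __init__(self, linear: nn.Linear, num_bits: int = 4, outlier_frac: float = 0.01):
        super().__init__()
        w = linear.weight.data
        self.register_buffer("w_q", quantize_sym(w, num_bits))      # frozen quantized weights
        k = max(1, int(outlier_frac * w.numel()))
        thresh = w.abs().flatten().topk(k).values.min()
        self.register_buffer("mask", (w.abs() >= thresh).float())   # outlier positions
        self.delta = nn.Parameter(torch.zeros_like(w))               # trainable FP correction
        self.bias = nn.Parameter(linear.bias.data.clone())

    def forward(self, x):
        w = self.w_q + self.delta * self.mask    # only outlier weights are corrected
        return nn.functional.linear(x, w, self.bias)

layer = OutlierAwareLinear(nn.Linear(768, 768))
out = layer(torch.randn(2, 768))                  # fine-tuning would update `layer.delta`
```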
- A didactic approach to quantum machine learning with a single qubit [68.8204255655161]
We focus on the case of learning with a single qubit, using data re-uploading techniques.
We implement the different proposed formulations in toy and real-world datasets using the qiskit quantum computing SDK.
arXiv Detail & Related papers (2022-11-23T18:25:32Z)
- MQBench: Towards Reproducible and Deployable Model Quantization Benchmark [53.12623958951738]
MQBench is a first attempt to evaluate, analyze, and benchmark the reproducibility and deployability of model quantization algorithms.
We choose multiple platforms for real-world deployments, including CPU, GPU, ASIC, DSP, and evaluate extensive state-of-the-art quantization algorithms.
We conduct a comprehensive analysis and find considerable intuitive or counter-intuitive insights.
arXiv Detail & Related papers (2021-11-05T23:38:44Z)
- Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
- In-Hindsight Quantization Range Estimation for Quantized Training [5.65658124285176]
We propose a simple alternative to dynamic quantization, in-hindsight range estimation, that uses the quantization ranges estimated on previous iterations to quantize the present.
Our approach enables fast static quantization of gradients and activations while requiring only minimal hardware support from the neural network accelerator.
It is intended as a drop-in replacement for estimating quantization ranges and can be used in conjunction with other advances in quantized training.
arXiv Detail & Related papers (2021-05-10T10:25:28Z)
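The in-hindsight idea above is simple enough to sketch directly: each tensor is quantized with the range recorded on the previous iteration (so the range is static with respect to the current step), and the current statistics are then stored for the next iteration. The plain absolute-max tracker below is an illustrative simplification of the estimator.

```python
# Sketch of in-hindsight range estimation: quantize with the range observed on
# the previous iteration, then record the current range for the next one.
# The plain abs-max tracker is an illustrative simplification.
import torch

class InHindsightQuantizer:
    def __init__(self, num_bits: int = 8):
        self.qmax = 2 ** (num_bits - 1) - 1
        self.prev_range = None                    # range estimated on the previous step

    def __call__(self, x: torch.Tensor) -> torch.Tensor:
        # 1) quantize with the previously estimated (hence static) range
        r = self.prev_range if self.prev_range is not None else x.detach().abs().max()
        scale = torch.clamp(r, min=1e-8) / self.qmax
        x_q = torch.clamp(torch.round(x / scale), -self.qmax, self.qmax) * scale
        # 2) record this step's range for use on the next iteration
        self.prev_range = x.detach().abs().max()
        return x_q

q = InHindsightQuantizer()
for _ in range(3):                                # e.g. successive activation/gradient tensors
    x_q = q(torch.randn(4, 16))
```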
- One Model for All Quantization: A Quantized Network Supporting Hot-Swap Bit-Width Adjustment [36.75157407486302]
We propose a method to train a model for all quantization that supports diverse bit-widths.
We use wavelet decomposition and reconstruction to increase the diversity of weights.
Our method can achieve accuracy comparable to dedicated models trained at the same precision.
arXiv Detail & Related papers (2021-05-04T08:10:50Z)
- Zero-shot Adversarial Quantization [11.722728148523366]
We propose a zero-shot adversarial quantization (ZAQ) framework, facilitating effective discrepancy estimation and knowledge transfer.
This is achieved by a novel two-level discrepancy modeling to drive a generator to synthesize informative and diverse data examples.
We conduct extensive experiments on three fundamental vision tasks, demonstrating the superiority of ZAQ over the strong zero-shot baselines.
arXiv Detail & Related papers (2021-03-29T01:33:34Z)
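A toy rendering of the adversarial loop in the ZAQ summary: a generator is pushed to synthesize inputs that maximize the discrepancy between the full-precision model and its quantized counterpart, while the quantized model is updated to shrink that discrepancy on the same inputs. The tiny MLPs, the single-level L1 discrepancy, and all hyperparameters are placeholders; ZAQ's actual two-level discrepancy modeling and generator design are described in the paper.

```python
# Toy adversarial data-synthesis loop: the generator maximizes the
# teacher/student output gap, the student minimizes it. Everything below
# (models, loss, hyperparameters) is an illustrative placeholder.
import torch
import torch.nn as nn

teacher = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10)).eval()
for p in teacher.parameters():
    p.requires_grad_(False)
# Stand-in for the quantized network; a real pipeline would fake-quantize it.
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
generator = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 32))

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)

def discrepancy(a, b):
    return (a - b).abs().mean()                   # single-level L1 gap (placeholder)

for step in range(100):
    # generator step: synthesize inputs on which the student disagrees most
    x = generator(torch.randn(64, 16))
    loss_g = -discrepancy(teacher(x), student(x))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    # student step: transfer knowledge on freshly synthesized inputs
    x = generator(torch.randn(64, 16)).detach()
    loss_s = discrepancy(teacher(x), student(x))
    opt_s.zero_grad(); loss_s.backward(); opt_s.step()
```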
- Once Quantization-Aware Training: High Performance Extremely Low-bit Architecture Search [112.05977301976613]
We propose to combine Network Architecture Search methods with quantization to enjoy the merits of the two sides.
We first propose the joint training of architecture and quantization with a shared step size to acquire a large number of quantized models.
Then a bit-inheritance scheme is introduced to transfer the quantized models to the lower bit, which further reduces the time cost and improves the quantization accuracy.
arXiv Detail & Related papers (2020-10-09T03:52:16Z)
- ZeroQ: A Novel Zero Shot Quantization Framework [83.63606876854168]
Quantization is a promising approach for reducing the inference time and memory footprint of neural networks.
Existing zero-shot quantization methods use different heuristics to address this, but they result in poor performance.
Here, we propose ZeroQ, a novel zero-shot quantization framework to address this.
arXiv Detail & Related papers (2020-01-01T23:58:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.