Gradient-Based Post-Training Quantization: Challenging the Status Quo
- URL: http://arxiv.org/abs/2308.07662v1
- Date: Tue, 15 Aug 2023 09:25:11 GMT
- Title: Gradient-Based Post-Training Quantization: Challenging the Status Quo
- Authors: Edouard Yvinec, Arnaud Dapogny and Kevin Bailly
- Abstract summary: Quantization has become a crucial step for the efficient deployment of deep neural networks.
In this work, we show that the process is, to a certain extent, robust to a number of variables.
We derive a number of best practices for designing more efficient and scalable GPTQ methods.
- Score: 23.1120983784623
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Quantization has become a crucial step for the efficient deployment of deep
neural networks, where floating point operations are converted to simpler fixed
point operations. In its most naive form, it simply consists of a combination
of scaling and rounding transformations, leading to either a limited
compression rate or a significant accuracy drop. Recently, gradient-based
post-training quantization (GPTQ) methods appear to constitute a suitable
trade-off between such simple methods and more powerful, yet expensive
Quantization-Aware Training (QAT) approaches, particularly when attempting to
quantize LLMs, where scalability of the quantization process is of paramount
importance. GPTQ essentially consists in learning the rounding operation using
a small calibration set. In this work, we challenge common choices in GPTQ
methods. In particular, we show that the process is, to a certain extent,
robust to a number of variables (weight selection, feature augmentation, choice
of calibration set). More importantly, we derive a number of best practices for
designing more efficient and scalable GPTQ methods, regarding the problem
formulation (loss, degrees of freedom, use of non-uniform quantization schemes)
or optimization process (choice of variable and optimizer). Lastly, we propose
a novel importance-based mixed-precision technique. Those guidelines lead to
significant performance improvements on all the tested state-of-the-art GPTQ
methods and networks (e.g. +6.819 points on ViT for 4-bit quantization), paving
the way for the design of scalable, yet effective quantization methods.
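To make the naive scheme and the GPTQ idea described in the abstract concrete, below is a minimal, hedged sketch in PyTorch: (i) uniform quantization as a scaling-plus-rounding transform, and (ii) learning the per-weight rounding decision on a small calibration set through a soft-rounding relaxation. The sigmoid parametrization, the per-tensor scale, and all hyperparameters are illustrative assumptions, not the authors' exact formulation.

```python
import torch

def uniform_quantize(w: torch.Tensor, n_bits: int = 4):
    """Naive uniform quantization: a scaling followed by a rounding transform."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().max() / qmax                       # per-tensor scale (illustrative choice)
    w_int = torch.round(w / scale).clamp(-qmax - 1, qmax)
    return w_int * scale                               # de-quantized weights

def learn_rounding(w, calib_x, n_bits=4, steps=200, lr=1e-2):
    """GPTQ-style idea: keep the scale fixed and *learn* whether each weight
    rounds up or down, by matching the float layer output on a calibration set.
    The sigmoid relaxation below is an assumption, not the paper's exact scheme."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().max() / qmax
    w_floor = torch.floor(w / scale)
    v = torch.zeros_like(w, requires_grad=True)        # learnable rounding logits
    opt = torch.optim.Adam([v], lr=lr)
    y_ref = calib_x @ w.t()                            # float reference output
    for _ in range(steps):
        w_q = (w_floor + torch.sigmoid(v)).clamp(-qmax - 1, qmax) * scale
        loss = torch.nn.functional.mse_loss(calib_x @ w_q.t(), y_ref)
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Harden the relaxation: round up where the learned logit favors it.
    return (w_floor + (v > 0).float()).clamp(-qmax - 1, qmax) * scale
```

A hypothetical call would be `learn_rounding(torch.randn(64, 128), torch.randn(256, 128))` for a 128-to-64 linear layer calibrated on 256 samples.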
Related papers
- Gradient-based Automatic Mixed Precision Quantization for Neural Networks On-Chip [0.9187138676564589]
We present High Granularity Quantization (HGQ), an innovative quantization-aware training method.
HGQ fine-tunes the per-weight and per-activation precision by making them optimizable through gradient descent.
This approach enables ultra-low latency and low power neural networks on hardware capable of performing arithmetic operations.
arXiv Detail & Related papers (2024-05-01T17:18:46Z)
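As a rough illustration of the idea in the HGQ entry above (making precision itself optimizable by gradient descent), here is a hedged sketch in which a continuous per-weight bit-width is trained with a straight-through estimator and a size penalty. The STE trick, the clamp range, and all names are assumptions made for illustration; HGQ's actual surrogate-gradient machinery is more involved.

```python
import torch

class LearnedPrecisionQuant(torch.nn.Module):
    """Per-weight bit-widths trained jointly with the weights (illustrative sketch)."""
    def __init__(self, shape, init_bits=6.0):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(shape) * 0.05)   # (out, in)
        self.bits = torch.nn.Parameter(torch.full(shape, init_bits))  # continuous bit-widths

    def forward(self, x):
        bits = self.bits.clamp(2.0, 8.0)
        # Straight-through estimator: integer bit-width in the forward pass,
        # identity gradient toward the continuous value in the backward pass.
        b = bits + (bits.round() - bits).detach()
        qmax = torch.pow(2.0, b - 1.0) - 1.0
        scale = self.weight.abs().max() / qmax
        w_int = self.weight / scale
        w_q = (w_int + (w_int.round() - w_int).detach()) * scale      # STE on rounding too
        return x @ w_q.t()

    def bit_penalty(self):
        # Average bit-width; adding it to the loss pushes precision down.
        return self.bits.clamp(2.0, 8.0).mean()
```

A hypothetical training loss would be `task_loss + lam * layer.bit_penalty()`, trading accuracy against average precision.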
- EPTQ: Enhanced Post-Training Quantization via Hessian-guided Network-wise Optimization [3.3998740964877463]
Quantization is a key method for deploying deep neural networks on edge devices with limited memory and computation resources.
We propose a new method for enhanced Post-Training Quantization (EPTQ) that employs a network-wise quantization optimization process.
arXiv Detail & Related papers (2023-09-20T10:50:28Z)
- Norm Tweaking: High-performance Low-bit Quantization of Large Language Models [21.855106896725598]
We introduce a technique called norm tweaking, which can be used as a plugin in current PTQ methods to achieve high precision.
Our method demonstrates significant improvements in both weight-only quantization and joint quantization of weights and activations.
Our simple and effective approach makes it more practical for real-world applications.
arXiv Detail & Related papers (2023-09-06T06:51:15Z)
- PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models [52.09865918265002]
We propose a novel "quantize before fine-tuning" framework, PreQuant.
PreQuant is compatible with various quantization strategies, with outlier-aware fine-tuning incorporated to correct the induced quantization error.
We demonstrate the effectiveness of PreQuant on the GLUE benchmark using BERT, RoBERTa, and T5.
arXiv Detail & Related papers (2023-05-30T08:41:33Z)
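To illustrate the "quantize before fine-tuning" ordering and the outlier-aware correction mentioned in the PreQuant entry above, here is a hedged sketch: weights are quantized first, and only a small, outlier-selected subset of parameters is then left trainable to absorb the quantization error. The selection rule, the 0.1% ratio, and the helper names are illustrative assumptions, not PreQuant's actual procedure.

```python
import torch

def quantize_then_finetune_mask(model: torch.nn.Module, n_bits: int = 4,
                                outlier_ratio: float = 0.001):
    """Quantize every weight matrix first, then mark only 'outlier' weights
    (largest magnitude) as trainable for the subsequent fine-tuning phase."""
    masks = {}
    with torch.no_grad():
        for name, p in model.named_parameters():
            if p.dim() < 2:                              # skip biases / norms in this sketch
                continue
            k = max(1, int(outlier_ratio * p.numel()))
            thresh = p.abs().flatten().topk(k).values.min()
            masks[name] = (p.abs() >= thresh)            # outliers stay trainable
            qmax = 2 ** (n_bits - 1) - 1
            scale = p.abs().max() / qmax
            p.copy_(torch.round(p / scale).clamp(-qmax - 1, qmax) * scale)
    return masks

def apply_masked_grads(model, masks):
    """Call after loss.backward(): zero gradients outside the outlier mask so
    fine-tuning only touches the selected parameters."""
    for name, p in model.named_parameters():
        if name in masks and p.grad is not None:
            p.grad.mul_(masks[name].to(p.grad.dtype))
```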
- Benchmarking the Reliability of Post-training Quantization: a Particular Focus on Worst-case Performance [53.45700148820669]
Post-training quantization (PTQ) is a popular method for compressing deep neural networks (DNNs) without modifying their original architecture or training procedures.
Despite its effectiveness and convenience, the reliability of PTQ methods in the presence of extreme cases such as distribution shift and data noise remains largely unexplored.
This paper first investigates this problem on various commonly-used PTQ methods.
arXiv Detail & Related papers (2023-03-23T02:55:50Z)
- End-to-end resource analysis for quantum interior point methods and portfolio optimization [63.4863637315163]
We provide a complete quantum circuit-level description of the algorithm from problem input to problem output.
We report the number of logical qubits and the quantity/depth of non-Clifford T-gates needed to run the algorithm.
arXiv Detail & Related papers (2022-11-22T18:54:48Z)
- Gradient-descent quantum process tomography by learning Kraus operators [63.69764116066747]
We perform quantum process tomography (QPT) for both discrete- and continuous-variable quantum systems.
We use a constrained gradient-descent (GD) approach on the so-called Stiefel manifold during optimization to obtain the Kraus operators.
The GD-QPT matches the performance of both compressed-sensing (CS) and projected least-squares (PLS) QPT in benchmarks with two-qubit random processes.
arXiv Detail & Related papers (2022-08-01T12:48:48Z)
- Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
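One plausible reading of the "drop bits instead of neurons" idea in the Cluster-Promoting Quantization entry above is sketched below: during training, each quantized weight is, with some probability, re-quantized at one fewer bit, analogous to dropout on individual bits. This is an illustrative interpretation, not the paper's exact DropBits formulation.

```python
import torch

def dropbits(w: torch.Tensor, n_bits: int = 4, drop_p: float = 0.2):
    """Dropout-style regularizer at the bit level: with probability drop_p,
    a weight is represented at (n_bits - 1) bits instead of n_bits."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().max() / qmax
    w_int = torch.round(w / scale).clamp(-qmax - 1, qmax)
    # Re-quantize at one fewer bit: snap the integer code to the nearest even value.
    w_int_lo = torch.round(w_int / 2) * 2
    keep = (torch.rand_like(w) > drop_p).to(w.dtype)
    return (keep * w_int + (1 - keep) * w_int_lo) * scale
```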
- FLIP: A flexible initializer for arbitrarily-sized parametrized quantum circuits [105.54048699217668]
We propose a FLexible Initializer for arbitrarily-sized Parametrized quantum circuits.
FLIP can be applied to any family of PQCs, and instead of relying on a generic set of initial parameters, it is tailored to learn the structure of successful parameters.
We illustrate the advantage of using FLIP in three scenarios: a family of problems with proven barren plateaus, PQC training to solve max-cut problem instances, and PQC training for finding the ground state energies of 1D Fermi-Hubbard models.
arXiv Detail & Related papers (2021-03-15T17:38:33Z)
- Post-training Quantization with Multiple Points: Mixed Precision without Mixed Precision [20.081543082708688]
We propose multipoint quantization, a method that approximates a full-precision weight vector using a linear combination of multiple vectors of low-bit numbers.
We show that our method outperforms a range of state-of-the-art methods on ImageNet classification and it can be generalized to more challenging tasks like PASCAL VOC object detection.
arXiv Detail & Related papers (2020-02-20T22:37:45Z)
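The multipoint idea in the last entry, approximating a full-precision weight vector by a linear combination of several low-bit vectors, can be sketched with a simple greedy residual fit. The greedy strategy, the number of points, and the per-step scaling are assumptions chosen for clarity rather than the paper's exact algorithm.

```python
import torch

def multipoint_quantize(w: torch.Tensor, n_bits: int = 2, n_points: int = 3):
    """Greedily approximate w as sum_i a_i * q_i, where each q_i is a low-bit
    integer vector and a_i is a full-precision coefficient (illustrative sketch)."""
    qmax = 2 ** (n_bits - 1) - 1
    approx = torch.zeros_like(w)
    residual = w.clone()
    for _ in range(n_points):
        scale = residual.abs().max() / qmax
        if scale == 0:
            break
        q = torch.round(residual / scale).clamp(-qmax - 1, qmax)
        # Least-squares coefficient for this low-bit direction.
        a = (residual * q).sum() / (q * q).sum().clamp_min(1e-12)
        approx = approx + a * q
        residual = w - approx
    return approx

# Hypothetical usage: adding points should shrink the approximation error.
w = torch.randn(512)
print((w - multipoint_quantize(w, n_points=1)).norm(),
      (w - multipoint_quantize(w, n_points=3)).norm())
```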
This list is automatically generated from the titles and abstracts of the papers in this site.