Transition Rate Scheduling for Quantization-Aware Training
- URL: http://arxiv.org/abs/2404.19248v2
- Date: Fri, 06 Dec 2024 12:02:37 GMT
- Title: Transition Rate Scheduling for Quantization-Aware Training
- Authors: Junghyup Lee, Jeimin Jeon, Dohyung Kim, Bumsub Ham
- Abstract summary: Quantization-aware training (QAT) simulates a quantization process during training to lower bit-precision of weights/activations.
It learns quantized weights indirectly by updating latent weights, using gradient-based optimizers.
We introduce a transition rate (TR) scheduling technique that controls the number of transitions of quantized weights explicitly.
- Score: 26.792400685888175
- Abstract: Quantization-aware training (QAT) simulates a quantization process during training to lower the bit-precision of weights/activations. It learns quantized weights indirectly by updating latent weights, i.e., full-precision inputs to a quantizer, using gradient-based optimizers. We claim that coupling a user-defined learning rate (LR) with these optimizers is sub-optimal for QAT. Quantized weights transition between discrete levels of a quantizer only if the corresponding latent weights pass transition points, where the quantizer changes discrete states. This suggests that the changes of quantized weights are affected by both the LR for the latent weights and their distributions, so it is difficult to control the degree of change of the quantized weights by scheduling the LR manually. We conjecture that the degree of parameter change in QAT is related to the number of quantized weights transiting discrete levels. Based on this, we introduce a transition rate (TR) scheduling technique that controls the number of transitions of quantized weights explicitly. Instead of scheduling an LR for the latent weights, we schedule a target TR of the quantized weights and update the latent weights with a novel transition-adaptive LR (TALR), which makes it possible to account for the degree of change of the quantized weights during QAT. Experimental results demonstrate the effectiveness of our approach on standard benchmarks.
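To make the mechanism concrete, here is a minimal sketch of tracking a transition rate and steering the step size toward a scheduled target. It is only an illustration under assumptions: the uniform quantizer, the constants, and the proportional feedback rule in `adapt_lr` are placeholders standing in for the paper's TALR update; the quantity being measured (the fraction of quantized weights that change discrete level in one update) is the TR described in the abstract.

```python
import torch

def level_index(latent_w, step=0.1):
    # Discrete level that each latent weight maps to under a uniform quantizer.
    # The quantized weight changes only when this index changes, i.e. when the
    # latent weight crosses a transition point of the quantizer.
    return torch.round(latent_w / step)

def transition_rate(latent_before, latent_after, step=0.1):
    # Fraction of weights whose quantized level changed after one update (the TR).
    changed = level_index(latent_before, step) != level_index(latent_after, step)
    return changed.float().mean().item()

def adapt_lr(lr, measured_tr, target_tr, eps=1e-8, max_ratio=2.0):
    # Simple proportional controller: grow the LR when too few weights transit,
    # shrink it when too many do. The paper defines its own TALR rule; this is
    # only a stand-in, with a clipped update to keep the sketch stable.
    ratio = target_tr / (measured_tr + eps)
    return lr * min(max(ratio, 1.0 / max_ratio), max_ratio)
```

In a training loop one would snapshot the latent weights before the optimizer step, measure the TR afterwards, and feed the adapted LR into the next step, while the target TR itself follows a user-defined schedule (e.g. a decaying curve) in place of a manual LR schedule.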
Related papers
- OstQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting [20.944120156871108]
Post-training quantization (PTQ) has emerged as a widely adopted technique for compressing and accelerating Large Language Models (LLMs)
The major challenge in LLM quantization is that uneven and heavy-tailed data distributions can expand the quantization range, thereby reducing bit precision for most values.
We introduce Quantization Space Utilization Rate (QSUR), a novel metric that effectively assesses the quantizability of transformed data by measuring the space utilization of the data in the quantization space.
arXiv Detail & Related papers (2025-01-23T08:24:25Z)
- LRQ: Optimizing Post-Training Quantization for Large Language Models by Learning Low-Rank Weight-Scaling Matrices [41.17378536966264]
Low-Rank Quantization (LRQ) reconstructs the outputs of an intermediate Transformer block by leveraging low-rank weight-scaling matrices.
Thanks to parameter sharing via low-rank structure, LRQ only needs to learn significantly fewer parameters while enabling the individual scaling of weights.
We show the superiority of LRQ over prior LLM PTQ works under (i) 8-bit weight and per-tensor activation quantization, (ii) 4-bit weight and 8-bit per-token activation quantization, and (iii) low-bit weight-only quantization schemes.
arXiv Detail & Related papers (2024-07-16T09:32:07Z)
- Towards Accurate Post-Training Quantization of Vision Transformers via Error Reduction [48.740630807085566]
Post-training quantization (PTQ) for vision transformers (ViTs) has received increasing attention from both academic and industrial communities.
Current methods fail to account for the complex interactions between quantized weights and activations, resulting in significant quantization errors and suboptimal performance.
This paper presents ERQ, an innovative two-step PTQ method specifically crafted to reduce quantization errors arising from activation and weight quantization sequentially.
arXiv Detail & Related papers (2024-07-09T12:06:03Z)
- OAC: Output-adaptive Calibration for Accurate Post-training Quantization [30.115888331426515]
Post-training Quantization (PTQ) techniques have been developed to compress Large Language Models (LLMs)
Most PTQ approaches formulate the quantization error based on a calibrated layer-wise $\ell_2$ loss.
We propose Output-adaptive Calibration (OAC) to incorporate the model output in the calibration process.
arXiv Detail & Related papers (2024-05-23T20:01:17Z)
- WKVQuant: Quantizing Weight and Key/Value Cache for Large Language Models Gains More [55.0856305773081]
Large Language Models (LLMs) face significant deployment challenges due to their substantial memory requirements and the computational demands of the auto-regressive text generation process.
This paper addresses these challenges by focusing on the quantization of LLMs, a technique that reduces memory consumption by converting model parameters and activations into low-bit integers.
arXiv Detail & Related papers (2024-02-19T11:33:21Z)
- RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization [8.827794405944637]
Post-training quantization (PTQ) is a promising solution for compressing large transformer models.
Existing PTQ methods typically exhibit non-trivial performance loss.
We propose RepQuant, a novel PTQ framework with a quantization-inference decoupling paradigm.
arXiv Detail & Related papers (2024-02-08T12:35:41Z)
- Norm Tweaking: High-performance Low-bit Quantization of Large Language Models [21.855106896725598]
We introduce a technique called norm tweaking, which can be used as a plugin in current PTQ methods to achieve high precision.
Our method demonstrates significant improvements in both weight-only quantization and joint quantization of weights and activations.
Our simple and effective approach makes it more practical for real-world applications.
arXiv Detail & Related papers (2023-09-06T06:51:15Z)
- Improving Convergence for Quantum Variational Classifiers using Weight Re-Mapping [60.086820254217336]
In recent years, quantum machine learning has seen a substantial increase in the use of variational quantum circuits (VQCs)
We introduce weight re-mapping for VQCs, to unambiguously map the weights to an interval of length $2\pi$.
We demonstrate that weight re-mapping increased test accuracy for the Wine dataset by $10\%$ over using unmodified weights.
arXiv Detail & Related papers (2022-12-22T13:23:19Z)
- Exploiting the Partly Scratch-off Lottery Ticket for Quantization-Aware Training [69.8539756804198]
A large portion of quantized weights reaches the optimal quantization level after a few training epochs, which we refer to as the partly scratch-off lottery ticket.
We develop a method, dubbed lottery ticket scratcher (LTS), which freezes a weight once the distance between the full-precision one and its quantization level is smaller than a controllable threshold.
arXiv Detail & Related papers (2022-11-12T06:11:36Z)
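The freezing rule summarized in the LTS entry above is concrete enough to sketch. This is a hypothetical fragment, not the authors' code: `quantize` and the `frozen` mask are placeholder names, and only the criterion itself, freezing a weight once it is within a controllable threshold of its quantization level, comes from the summary.

```python
import torch

def update_freeze_mask(latent_w, frozen, quantize, threshold=1e-2):
    # Freeze weights whose latent value already sits close to its quantization
    # level; once frozen, a weight stays frozen (the mask only grows).
    dist = (latent_w - quantize(latent_w)).abs()
    return frozen | (dist < threshold)

def mask_gradient(grad, frozen):
    # Zero the gradient of frozen latent weights so they stop being updated.
    return grad * (~frozen).to(grad.dtype)
```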
- BiTAT: Neural Network Binarization with Task-dependent Aggregated Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation.
Extreme quantization (1-bit weight/1-bit activations) of compactly-designed backbone architectures results in severe performance degeneration.
This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate performance degeneration.
arXiv Detail & Related papers (2022-07-04T13:25:49Z)
- Gradient $\ell_1$ Regularization for Quantization Robustness [70.39776106458858]
We derive a simple regularization scheme that improves robustness against post-training quantization.
By training quantization-ready networks, our approach enables storing a single set of weights that can be quantized on-demand to different bit-widths.
arXiv Detail & Related papers (2020-02-18T12:31:34Z)
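The last entry describes making a network robust to post-training quantization by penalizing its first-order sensitivity to weight perturbations. Below is a minimal double-backprop sketch of that idea; the exact penalty, where it is applied, and the regularization weight `lam` are illustrative assumptions rather than the paper's precise formulation.

```python
import torch

def loss_with_grad_l1(task_loss, params, lam=1e-4):
    # Add an l1 penalty on d(loss)/d(weights): when this first-order term is
    # small, the loss surface is flat under the small weight perturbations
    # that quantization introduces, so accuracy degrades less after PTQ.
    grads = torch.autograd.grad(task_loss, params, create_graph=True)
    penalty = sum(g.abs().sum() for g in grads)
    return task_loss + lam * penalty

# Hypothetical usage inside a training step:
#   task_loss = criterion(model(x), y)
#   total = loss_with_grad_l1(task_loss, [p for p in model.parameters() if p.requires_grad])
#   total.backward()
```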