Towards Accurate Post-Training Quantization of Vision Transformers via Error Reduction
- URL: http://arxiv.org/abs/2407.06794v2
- Date: Tue, 04 Feb 2025 04:50:20 GMT
- Title: Towards Accurate Post-Training Quantization of Vision Transformers via Error Reduction
- Authors: Yunshan Zhong, You Huang, Jiawei Hu, Yuxin Zhang, Rongrong Ji,
- Abstract summary: Post-training quantization (PTQ) for vision transformers (ViTs) has received increasing attention from both academic and industrial communities.
Current methods fail to account for the complex interactions between quantized weights and activations, resulting in significant quantization errors and suboptimal performance.
This paper presents ERQ, an innovative two-step PTQ method specifically crafted to reduce quantization errors arising from activation and weight quantization sequentially.
- Score: 48.740630807085566
- License:
- Abstract: Post-training quantization (PTQ) for vision transformers (ViTs) has received increasing attention from both academic and industrial communities due to its minimal data needs and high time efficiency. However, many current methods fail to account for the complex interactions between quantized weights and activations, resulting in significant quantization errors and suboptimal performance. This paper presents ERQ, an innovative two-step PTQ method specifically crafted to reduce quantization errors arising from activation and weight quantization sequentially. The first step, Activation quantization error reduction (Aqer), first applies Reparameterization Initialization aimed at mitigating initial quantization errors in high-variance activations. Then, it further mitigates the errors by formulating a Ridge Regression problem, which updates the weights maintained at full-precision using a closed-form solution. The second step, Weight quantization error reduction (Wqer), first applies Dual Uniform Quantization to handle weights with numerous outliers, which arise from adjustments made during Reparameterization Initialization, thereby reducing initial weight quantization errors. Then, it employs an iterative approach to further tackle the errors. In each iteration, it adopts Rounding Refinement that uses an empirically derived, efficient proxy to refine the rounding directions of quantized weights, complemented by a Ridge Regression solver to reduce the errors. Comprehensive experimental results demonstrate ERQ's superior performance across various ViTs variants and tasks. For example, ERQ surpasses the state-of-the-art GPTQ by a notable 36.81% in accuracy for W3A4 ViT-S. Our codes are available at https://github.com/zysxmu/ERQ.
Related papers
- PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models [64.84734437930362]
Large Language Models (LLMs) suffer severe performance degradation when facing extremely low-bit (sub 2-bit) quantization.
We propose an extremely low-bit PTQ method called PTQ1.61, which enables weight quantization to 1.61-bit for the first time.
Experiments indicate our PTQ1.61 achieves state-of-the-art performance in extremely low-bit quantization.
arXiv Detail & Related papers (2025-02-18T08:04:58Z) - RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models [95.32315448601241]
We propose an algorithm named Rotated Straight-Through-Estimator (RoSTE)
RoSTE combines quantization-aware supervised fine-tuning (QA-SFT) with an adaptive rotation strategy to reduce activation outliers.
Our findings reveal that the prediction error is directly proportional to the quantization error of the converged weights, which can be effectively managed through an optimized rotation configuration.
arXiv Detail & Related papers (2025-02-13T06:44:33Z) - GWQ: Gradient-Aware Weight Quantization for Large Language Models [63.89099994367657]
Large language models (LLMs) show impressive performance in solving complex language tasks.
LLMs to low bits can enable them to run on resource-constrained devices, often leading to performance degradation.
We propose gradient-aware weight quantization (GWQ), the first quantization approach for low-bit weight quantization.
arXiv Detail & Related papers (2024-10-30T11:16:04Z) - QERA: an Analytical Framework for Quantization Error Reconstruction [12.110441045050223]
An increasing interest in quantizing weights to extremely low precision while offsetting the resulting error with low-rank, high-precision error reconstruction terms.
The combination of quantization and low-rank approximation is now popular in both adapter-based, parameter-efficient fine-tuning methods.
We formulate an analytical framework, named Quantization Error Reconstruction Analysis (QERA), and offer a closed-form solution to the problem.
arXiv Detail & Related papers (2024-10-08T13:37:34Z) - OAC: Output-adaptive Calibration for Accurate Post-training Quantization [30.115888331426515]
Post-training Quantization (PTQ) techniques have been developed to compress Large Language Models (LLMs)
Most PTQ approaches formulate the quantization error based on a calibrated layer-wise $ell$ loss.
We propose Output-adaptive (OAC) to incorporate the model output in the calibration process.
arXiv Detail & Related papers (2024-05-23T20:01:17Z) - A2Q+: Improving Accumulator-Aware Weight Quantization [45.14832807541816]
Quantization techniques commonly reduce the inference costs of neural networks by restricting the precision of weights and activations.
Recent work proposed accumulator-aware quantization (A2Q), a quantization-aware training method that constrains model weights during training to safely use a target accumulator bit width during inference.
We introduce A2Q+, a new strategy for initializing quantized weights from pre-trained floating-point checkpoints.
arXiv Detail & Related papers (2024-01-19T00:27:34Z) - Post-Training Quantization for Re-parameterization via Coarse & Fine
Weight Splitting [13.270381125055275]
We propose a coarse & fine weight splitting (CFWS) method to reduce quantization error of weight.
We develop an improved KL metric to determine optimal quantization scales for activation.
For example, the quantized RepVGG-A1 model exhibits a mere 0.3% accuracy loss.
arXiv Detail & Related papers (2023-12-17T02:31:20Z) - Towards Accurate Post-Training Quantization for Vision Transformer [48.779346466374406]
Existing post-training quantization methods still cause severe performance drops.
APQ-ViT surpasses the existing post-training quantization methods by convincing margins.
arXiv Detail & Related papers (2023-03-25T03:05:26Z) - Improving Convergence for Quantum Variational Classifiers using Weight
Re-Mapping [60.086820254217336]
In recent years, quantum machine learning has seen a substantial increase in the use of variational quantum circuits (VQCs)
We introduce weight re-mapping for VQCs, to unambiguously map the weights to an interval of length $2pi$.
We demonstrate that weight re-mapping increased test accuracy for the Wine dataset by $10%$ over using unmodified weights.
arXiv Detail & Related papers (2022-12-22T13:23:19Z) - RepQ-ViT: Scale Reparameterization for Post-Training Quantization of
Vision Transformers [2.114921680609289]
We propose RepQ-ViT, a novel PTQ framework for vision transformers (ViTs)
RepQ-ViT decouples the quantization and inference processes.
It can outperform existing strong baselines and encouragingly improve the accuracy of 4-bit PTQ of ViTs to a usable level.
arXiv Detail & Related papers (2022-12-16T02:52:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.