RepQ-ViT: Scale Reparameterization for Post-Training Quantization of
Vision Transformers
- URL: http://arxiv.org/abs/2212.08254v2
- Date: Mon, 7 Aug 2023 03:00:41 GMT
- Title: RepQ-ViT: Scale Reparameterization for Post-Training Quantization of
Vision Transformers
- Authors: Zhikai Li, Junrui Xiao, Lianwei Yang, and Qingyi Gu
- Abstract summary: We propose RepQ-ViT, a novel PTQ framework for vision transformers (ViTs).
RepQ-ViT decouples the quantization and inference processes.
It can outperform existing strong baselines and encouragingly improve the accuracy of 4-bit PTQ of ViTs to a usable level.
- Score: 2.114921680609289
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Post-training quantization (PTQ), which only requires a tiny dataset for
calibration without end-to-end retraining, is a light and practical model
compression technique. Recently, several PTQ schemes for vision transformers
(ViTs) have been presented; unfortunately, they typically suffer from
non-trivial accuracy degradation, especially in low-bit cases. In this paper,
we propose RepQ-ViT, a novel PTQ framework for ViTs based on quantization scale
reparameterization, to address the above issues. RepQ-ViT decouples the
quantization and inference processes, where the former employs complex
quantizers and the latter employs scale-reparameterized simplified quantizers.
This ensures both accurate quantization and efficient inference, which
distinguishes it from existing approaches that sacrifice quantization
performance to meet the target hardware. More specifically, we focus on two
components with extreme distributions: post-LayerNorm activations with severe
inter-channel variation and post-Softmax activations with power-law features,
and initially apply channel-wise quantization and log$\sqrt{2}$ quantization,
respectively. Then, we reparameterize the scales to hardware-friendly
layer-wise quantization and log2 quantization for inference, at only a slight
cost in accuracy or computation. Extensive experiments are conducted on
multiple vision tasks with different model variants, proving that RepQ-ViT,
without hyperparameters and expensive reconstruction procedures, can outperform
existing strong baselines and encouragingly improve the accuracy of 4-bit PTQ
of ViTs to a usable level. Code is available at
https://github.com/zkkli/RepQ-ViT.
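To make the two reparameterizations above concrete, below is a minimal NumPy sketch of the underlying algebra only; it is not the authors' implementation (see the linked repository for that), and the choice of the layer-wise scale/zero-point as channel-wise means, as well as all variable names, are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- 1. Post-LayerNorm: channel-wise -> layer-wise scale reparameterization ---
# Calibration yields a per-channel scale s and zero-point z for the post-LayerNorm
# activation x = gamma * x_norm + beta. To use a single layer-wise scale s_tilde at
# inference, the per-channel variation r = s / s_tilde is folded into the LayerNorm
# affine parameters and the next linear layer, leaving the network output unchanged.
C, D = 8, 16                                   # embedding channels, next-layer width
gamma, beta = rng.normal(size=C), rng.normal(size=C)
W, b = rng.normal(size=(D, C)), rng.normal(size=D)
s = rng.uniform(0.5, 2.0, size=C)              # per-channel scales (from calibration)
z = rng.integers(-4, 5, size=C).astype(float)  # per-channel zero-points

s_tilde, z_tilde = s.mean(), z.mean()          # unified layer-wise scale / zero-point
r = s / s_tilde                                # per-channel variation factor

gamma_rep = gamma / r                          # adjusted LayerNorm weight
beta_rep = beta / r + s_tilde * (z - z_tilde)  # adjusted LayerNorm bias
W_rep = W * r                                  # column i of W scaled by r_i
b_rep = b - W_rep @ (s_tilde * (z - z_tilde))  # compensate the zero-point shift

x_norm = rng.normal(size=C)                    # normalized activation
x = gamma * x_norm + beta                      # original post-LayerNorm output
x_rep = gamma_rep * x_norm + beta_rep          # reparameterized activation, now
                                               # quantizable with (s_tilde, z_tilde)
assert np.allclose(W @ x + b, W_rep @ x_rep + b_rep)   # exact before rounding

# --- 2. Post-Softmax: log-sqrt(2) -> log2 quantizer reparameterization ---
# Post-Softmax values in (0, 1] follow a power-law, so calibration uses a
# log-sqrt(2) quantizer: q = round(-log_{sqrt(2)} x) = round(-2 * log2 x).
# For inference, 2**(-q/2) is rewritten as 2**(-ceil(q/2)) * sqrt(2)**(q % 2),
# i.e. a log2 bit shift plus a sqrt(2) correction applied only to odd codes.
bits = 4
a = rng.uniform(1e-3, 1.0, size=6)                        # toy attention scores
q = np.clip(np.round(-2 * np.log2(a)), 0, 2**bits - 1)    # log-sqrt(2) codes

deq_logsqrt2 = 2.0 ** (-q / 2)                            # direct dequantization
deq_log2 = 2.0 ** (-np.ceil(q / 2)) * np.sqrt(2.0) ** (q % 2)
assert np.allclose(deq_logsqrt2, deq_log2)
print("scale reparameterization checks passed")
```

Both assertions hold exactly because the reparameterization only moves the per-channel factors into the LayerNorm affine parameters and the next layer's weights (for post-LayerNorm activations), or rewrites the dequantization as a power of two with a sqrt(2) correction on odd codes (for post-Softmax activations); the inference-time quantizers then need only a single layer-wise scale or log2 shifts, which is what makes them hardware-friendly.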
Related papers
- PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution [87.89013794655207]
Diffusion-based image super-resolution (SR) models have shown superior performance at the cost of multiple denoising steps.
We propose a novel post-training quantization approach with adaptive scale in one-step diffusion (OSD) image SR, PassionSR.
Our PassionSR achieves significant advantages over recent leading low-bit quantization methods for image SR.
arXiv Detail & Related papers (2024-11-26T04:49:42Z)
- DopQ-ViT: Towards Distribution-Friendly and Outlier-Aware Post-Training Quantization for Vision Transformers [2.0862654518798034]
We propose a Distribution-Friendly and Outlier-Aware Post-training Quantization method for Vision Transformers.
DopQ-ViT analyzes the inefficiencies of current quantizers and introduces a distribution-friendly Tan Quantizer called TanQ.
DopQ-ViT has been extensively validated and significantly improves the performance of quantized models.
arXiv Detail & Related papers (2024-08-06T16:40:04Z)
- AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer [54.713778961605115]
Vision Transformer (ViT) has become one of the most prevailing fundamental backbone networks in the computer vision community.
We propose a novel non-uniform quantizer, dubbed the Adaptive Logarithm (AdaLog) quantizer.
arXiv Detail & Related papers (2024-07-17T18:38:48Z)
- ADFQ-ViT: Activation-Distribution-Friendly Post-Training Quantization for Vision Transformers [7.155242379236052]
Quantization of Vision Transformers (ViTs) has emerged as a promising solution to mitigate their high computational and memory costs.
Existing methods still suffer from significant accuracy loss at low bit-widths.
ADFQ-ViT provides significant improvements over various baselines in image classification, object detection, and instance segmentation tasks at 4-bit.
arXiv Detail & Related papers (2024-07-03T02:41:59Z)
- RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization [8.827794405944637]
Post-training quantization (PTQ) is a promising solution for compressing large transformer models.
Existing PTQ methods typically exhibit non-trivial performance loss.
We propose RepQuant, a novel PTQ framework with quantization-inference decoupling paradigm.
arXiv Detail & Related papers (2024-02-08T12:35:41Z)
- MPTQ-ViT: Mixed-Precision Post-Training Quantization for Vision Transformer [7.041718444626999]
We propose a mixed-precision post-training quantization framework for vision transformers (MPTQ-ViT).
Our experiments on ViT, DeiT, and Swin demonstrate significant accuracy improvements compared with SOTA on the ImageNet dataset.
arXiv Detail & Related papers (2024-01-26T14:25:15Z)
- I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization [49.17407185195788]
We introduce I&S-ViT, a novel method that regulates the PTQ of ViTs in an inclusive and stable fashion.
I&S-ViT elevates the performance of 3-bit ViT-B by an impressive 50.68%.
arXiv Detail & Related papers (2023-11-16T13:07:47Z)
- PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models [52.09865918265002]
We propose a novel "quantize before fine-tuning" framework, PreQuant.
PreQuant is compatible with various quantization strategies, with outlier-aware fine-tuning incorporated to correct the induced quantization error.
We demonstrate the effectiveness of PreQuant on the GLUE benchmark using BERT, RoBERTa, and T5.
arXiv Detail & Related papers (2023-05-30T08:41:33Z)
- Towards Accurate Post-Training Quantization for Vision Transformer [48.779346466374406]
Existing post-training quantization methods still cause severe performance drops.
APQ-ViT surpasses the existing post-training quantization methods by convincing margins.
arXiv Detail & Related papers (2023-03-25T03:05:26Z)
- NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers [53.85087932591237]
NoisyQuant is a quantizer-agnostic enhancement for the post-training activation quantization performance of vision transformers.
Building on a theoretical insight, NoisyQuant achieves the first success in actively altering the heavy-tailed activation distribution.
NoisyQuant largely improves the post-training quantization performance of vision transformers with minimal computation overhead.
arXiv Detail & Related papers (2022-11-29T10:02:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.