MPTQ-ViT: Mixed-Precision Post-Training Quantization for Vision
Transformer
- URL: http://arxiv.org/abs/2401.14895v2
- Date: Thu, 1 Feb 2024 02:05:02 GMT
- Title: MPTQ-ViT: Mixed-Precision Post-Training Quantization for Vision
Transformer
- Authors: Yu-Shan Tai, An-Yeu (Andy) Wu
- Abstract summary: We propose a mixed-precision post-training quantization framework for vision transformers (MPTQ-ViT).
Our experiments on ViT, DeiT, and Swin demonstrate significant accuracy improvements compared with SOTA on the ImageNet dataset.
- Score: 7.041718444626999
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While vision transformers (ViTs) have shown great potential in computer
vision tasks, their intense computation and memory requirements pose challenges
for practical applications. Existing post-training quantization methods
leverage value redistribution or specialized quantizers to address the
non-normal distribution in ViTs. However, without considering the asymmetry in
activations and relying on hand-crafted settings, these methods often struggle
to maintain performance under low-bit quantization. To overcome these
challenges, we introduce SmoothQuant with a bias term (SQ-b) to alleviate the
asymmetry issue and reduce the clamping loss. We also introduce an optimal scaling
factor ratio search (OPT-m) that determines quantization parameters automatically
through a data-dependent mechanism. To further enhance compressibility,
we incorporate the above-mentioned techniques and propose a mixed-precision
post-training quantization framework for vision transformers (MPTQ-ViT). We
develop greedy mixed-precision quantization (Greedy MP) to allocate layer-wise
bit-width considering both model performance and compressibility. Our
experiments on ViT, DeiT, and Swin demonstrate significant accuracy
improvements compared with SOTA on the ImageNet dataset. Specifically, our
proposed methods achieve accuracy improvements ranging from 0.90% to 23.35% on
4-bit ViTs with single-precision and from 3.82% to 78.14% on 5-bit fully
quantized ViTs with mixed-precision.
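The abstract names three concrete mechanisms (SQ-b, OPT-m, and Greedy MP). The two sketches below are minimal reconstructions from the abstract alone, not the authors' implementation; the function names, the fixed alpha hyper-parameter, and the calibration-loss callback are illustrative assumptions.
First, a SmoothQuant-style smoothing step extended with a per-channel bias term: the shift centers each activation channel to remove asymmetry and is folded into the layer's output bias, while the usual SmoothQuant scale migrates range from activations to weights.
```python
# Hedged sketch of SmoothQuant with a bias term (SQ-b), assuming per-channel
# calibration statistics for the activations feeding one linear layer.
# Names and the fixed alpha hyper-parameter are illustrative, not the paper's code.
import numpy as np

def smooth_with_bias(x_calib, w, alpha=0.5):
    """x_calib: (N, C_in) calibration activations; w: (C_in, C_out) weights.

    Returns smoothed activations/weights plus the per-channel shift and the
    correction to fold into the layer's output bias, so that
    x @ w == x_smooth @ w_smooth + bias_correction (up to float error).
    """
    # Per-channel shift that centers each activation channel (removes asymmetry).
    ch_max, ch_min = x_calib.max(axis=0), x_calib.min(axis=0)
    shift = (ch_max + ch_min) / 2.0
    x_centered = x_calib - shift

    # SmoothQuant-style per-channel scale: migrate range from activations to weights.
    act_range = np.abs(x_centered).max(axis=0)
    w_range = np.abs(w).max(axis=1)
    scale = np.maximum(act_range ** alpha, 1e-8) / np.maximum(w_range ** (1 - alpha), 1e-8)

    x_smooth = x_centered / scale          # narrower, symmetric activation range
    w_smooth = w * scale[:, None]          # scale absorbed into the weights
    bias_correction = shift @ w            # fold the shift into the output bias
    return x_smooth, w_smooth, shift, bias_correction
```
Second, a greedy layer-wise bit-width allocation loop in the spirit of Greedy MP: start every layer at the highest candidate bit-width and repeatedly lower the layer whose reduction hurts a calibration score the least, until a size budget is met.
```python
# Hedged sketch of greedy mixed-precision bit-width allocation (Greedy MP),
# assuming per-layer parameter counts, a candidate bit-width list, and a
# calibration-loss callback `score_fn`; all of these names are hypothetical.
def greedy_mixed_precision(param_counts, score_fn, size_budget,
                           candidate_bits=(8, 6, 5, 4)):
    """Assign a bit-width to each layer under a weight-memory budget (in bits)."""
    assign = {name: candidate_bits[0] for name in param_counts}
    size = lambda a: sum(param_counts[n] * a[n] for n in a)  # total weight bits

    while size(assign) > size_budget:
        best = None
        for name, bits in assign.items():
            idx = candidate_bits.index(bits)
            if idx + 1 == len(candidate_bits):
                continue                                     # already at the lowest bit-width
            trial = dict(assign, **{name: candidate_bits[idx + 1]})
            loss = score_fn(trial)                           # e.g., loss on a calibration set
            if best is None or loss < best[0]:
                best = (loss, name, candidate_bits[idx + 1])
        if best is None:
            break                                            # nothing left to lower
        assign[best[1]] = best[2]
    return assign
```
In the paper, OPT-m searches the scaling-factor ratio in a data-dependent way; the first sketch fixes alpha for brevity.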
Related papers
- DopQ-ViT: Towards Distribution-Friendly and Outlier-Aware Post-Training Quantization for Vision Transformers [2.0862654518798034]
We propose a Distribution-Friendly and Outlier-Aware Post-training Quantization method for Vision Transformers.
DopQ-ViT analyzes the inefficiencies of current quantizers and introduces a distribution-friendly Tan Quantizer called TanQ.
DopQ-ViT has been extensively validated and significantly improves the performance of quantized models.
arXiv Detail & Related papers (2024-08-06T16:40:04Z)
- 2DQuant: Low-bit Post-Training Quantization for Image Super-Resolution [83.09117439860607]
Low-bit quantization has become widespread for compressing image super-resolution (SR) models for edge deployment.
Low-bit quantization is known to degrade the accuracy of SR models compared to their full-precision (FP) counterparts.
We present a dual-stage low-bit post-training quantization (PTQ) method for image super-resolution, namely 2DQuant, which achieves efficient and accurate SR under low-bit quantization.
arXiv Detail & Related papers (2024-06-10T06:06:11Z)
- Patch-wise Mixed-Precision Quantization of Vision Transformer [2.3104000011280403]
Vision Transformers (ViTs) require complex self-attention computation to guarantee the learning of powerful feature representations.
We propose a novel patch-wise mixed-precision quantization (PMQ) for efficient inference of ViTs.
arXiv Detail & Related papers (2023-05-11T04:34:10Z)
- Towards Accurate Post-Training Quantization for Vision Transformer [48.779346466374406]
Existing post-training quantization methods still cause severe performance drops.
APQ-ViT surpasses the existing post-training quantization methods by convincing margins.
arXiv Detail & Related papers (2023-03-25T03:05:26Z)
- RepQ-ViT: Scale Reparameterization for Post-Training Quantization of Vision Transformers [2.114921680609289]
We propose RepQ-ViT, a novel PTQ framework for vision transformers (ViTs).
RepQ-ViT decouples the quantization and inference processes.
It can outperform existing strong baselines and encouragingly improve the accuracy of 4-bit PTQ of ViTs to a usable level.
arXiv Detail & Related papers (2022-12-16T02:52:37Z)
- NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers [53.85087932591237]
NoisyQuant is a quantizer-agnostic enhancement for the post-training activation quantization performance of vision transformers.
Building on this theoretical insight, NoisyQuant is the first to succeed in actively altering the heavy-tailed activation distribution.
NoisyQuant largely improves the post-training quantization performance of vision transformers with minimal computation overhead.
arXiv Detail & Related papers (2022-11-29T10:02:09Z)
- Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer [56.87383229709899]
We develop an information rectification module (IRM) and a distribution-guided distillation scheme for fully quantized vision transformers (Q-ViT).
Our method achieves much better performance than the prior arts.
arXiv Detail & Related papers (2022-10-13T04:00:29Z)
- Q-ViT: Fully Differentiable Quantization for Vision Transformer [27.361973340056963]
We propose a fully differentiable quantization method for vision transformers (ViTs), named Q-ViT.
We leverage head-wise bit-width to squeeze the size of Q-ViT while preserving performance.
In particular, our method outperforms the state-of-the-art uniform quantization method by 1.5% on DeiT-Tiny.
arXiv Detail & Related papers (2022-01-19T16:43:17Z)
- AdaViT: Adaptive Tokens for Efficient Vision Transformer [91.88404546243113]
We introduce AdaViT, a method that adaptively adjusts the inference cost of vision transformer (ViT) for images of different complexity.
AdaViT achieves this by automatically reducing the number of tokens processed in the network as inference proceeds.
arXiv Detail & Related papers (2021-12-14T18:56:07Z)
- Post-Training Quantization for Vision Transformer [85.57953732941101]
We present an effective post-training quantization algorithm for reducing the memory storage and computational costs of vision transformers.
We can obtain an 81.29% top-1 accuracy using the DeiT-B model on the ImageNet dataset with about 8-bit quantization.
arXiv Detail & Related papers (2021-06-27T06:27:22Z)