Variation-aware Vision Transformer Quantization
- URL: http://arxiv.org/abs/2307.00331v1
- Date: Sat, 1 Jul 2023 13:01:39 GMT
- Title: Variation-aware Vision Transformer Quantization
- Authors: Xijie Huang, Zhiqiang Shen, Kwang-Ting Cheng
- Abstract summary: We study the difficulty of ViT quantization on its unique variation behaviors.
We find that the variations in ViTs cause training oscillations, bringing instability during quantization-aware training (QAT)
We propose a knowledge-distillation-based variation-aware quantization method.
- Score: 49.741297464791835
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Despite the remarkable performance of Vision Transformers (ViTs) in various
visual tasks, the expanding computation and model size of ViTs have increased
the demand for improved efficiency during training and inference. To address
the heavy computation and parameter drawbacks, quantization is frequently
studied in the community as a representative model compression technique and
has seen extensive use on CNNs. However, owing to the differing properties of
CNNs and ViTs, quantization for ViTs remains limited and underexplored. In this
paper, we identify the difficulty of ViT quantization as stemming from its
unique variation behaviors, which differ from those of traditional CNN
architectures. The variations indicate the magnitude of the parameter
fluctuations and can also serve as a measure of outlier conditions. Moreover,
the variation behaviors reflect each module's differing sensitivity to
quantization. A quantization sensitivity analysis and comparison of ViTs with
CNNs help us
locate the underlying differences in variations. We also find that the
variations in ViTs cause training oscillations, bringing instability during
quantization-aware training (QAT). Correspondingly, we address the variation
problem with an efficient knowledge-distillation-based variation-aware
quantization method. The multi-crop knowledge distillation scheme accelerates
and stabilizes training and alleviates the influence of variations during QAT.
We also propose a module-dependent quantization scheme and a variation-aware
regularization term to suppress weight oscillation. On
ImageNet-1K, we obtain a 77.66% Top-1 accuracy on the extremely low-bit
scenario of 2-bit Swin-T, outperforming the previous state-of-the-art quantized
model by 3.35%.
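To make the oscillation issue concrete, below is a minimal PyTorch sketch pairing a straight-through fake quantizer with a penalty on latent weights that sit near quantization rounding boundaries, which is where bin-flipping (oscillation) happens. The function names and the exact form of the penalty are illustrative assumptions, not the paper's released implementation.

```python
import torch

def fake_quantize(w: torch.Tensor, scale: torch.Tensor, n_bits: int = 2) -> torch.Tensor:
    # Uniform symmetric fake quantization with a straight-through estimator:
    # the forward pass sees quantized weights, while gradients flow to the
    # latent full-precision weights as if rounding were the identity.
    qmax = 2 ** (n_bits - 1) - 1
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return w + (q * scale - w).detach()

def oscillation_penalty(w: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # A latent weight whose value w / scale lies near a half-integer flips
    # between adjacent quantization bins across training steps; this is the
    # oscillation behavior described in the abstract. The penalty grows as
    # weights approach a rounding boundary (an assumed form, not necessarily
    # the paper's exact regularization term).
    frac = torch.remainder(w / scale, 1.0)  # position inside a bin, in [0, 1)
    dist = torch.abs(frac - 0.5)            # zero exactly on the rounding boundary
    return torch.mean(0.5 - dist)

# Illustrative QAT step: a placeholder task loss plus the oscillation penalty.
w = torch.randn(64, 64, requires_grad=True)
scale = torch.tensor(0.05)
x = torch.randn(8, 64)
out = x @ fake_quantize(w, scale).t()
loss = out.pow(2).mean() + 1e-3 * oscillation_penalty(w, scale)
loss.backward()
```

In the paper's actual method, such a penalty would be combined with the multi-crop knowledge distillation loss rather than the squared-output placeholder used here.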
Related papers
- Instance-Aware Group Quantization for Vision Transformers [20.105148326987646]
Post-training quantization (PTQ) is an efficient model compression technique that quantizes a pretrained full-precision model.
PTQ methods for convolutional neural networks (CNNs) achieve accuracy comparable to their full-precision counterparts.
We introduce instance-aware group quantization for ViTs (IGQ-ViT).
arXiv Detail & Related papers (2024-04-01T05:12:30Z)
- QuantTune: Optimizing Model Quantization with Adaptive Outlier-Driven Fine Tuning [16.50084447690437]
The study focuses on uncovering the underlying causes of accuracy drops after quantization and proposes a quantization-friendly fine-tuning method, QuantTune.
Our approach showcases significant improvements in post-training quantization across a range of Transformer-based models, including ViT, Bert-base, and OPT.
arXiv Detail & Related papers (2024-03-11T08:09:30Z)
- MPTQ-ViT: Mixed-Precision Post-Training Quantization for Vision Transformer [7.041718444626999]
We propose a mixed-precision post-training quantization framework for vision transformers (MPTQ-ViT).
Our experiments on ViT, DeiT, and Swin demonstrate significant accuracy improvements compared with SOTA on the ImageNet dataset.
arXiv Detail & Related papers (2024-01-26T14:25:15Z)
- I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization [63.07712842509526]
We introduce I&S-ViT, a novel method that regulates the PTQ of ViTs in an inclusive and stable fashion.
I&S-ViT elevates the performance of 3-bit ViT-B by an impressive 50.68%.
arXiv Detail & Related papers (2023-11-16T13:07:47Z)
- Towards Accurate Post-Training Quantization for Vision Transformer [48.779346466374406]
Existing post-training quantization methods still cause severe performance drops.
APQ-ViT surpasses the existing post-training quantization methods by convincing margins.
arXiv Detail & Related papers (2023-03-25T03:05:26Z)
- NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers [53.85087932591237]
NoisyQuant is a quantizer-agnostic enhancement for the post-training activation quantization performance of vision transformers.
Building on the theoretical insight, NoisyQuant achieves the first success in actively altering the heavy-tailed activation distribution; a simplified sketch appears after this list.
NoisyQuant largely improves the post-training quantization performance of vision transformers with minimal computation overhead.
arXiv Detail & Related papers (2022-11-29T10:02:09Z)
- Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer [56.87383229709899]
We develop an information rectification module (IRM) and a distribution-guided distillation scheme for fully quantized vision transformers (Q-ViT).
Our method achieves much better performance than the prior arts.
arXiv Detail & Related papers (2022-10-13T04:00:29Z)
- Q-ViT: Fully Differentiable Quantization for Vision Transformer [27.361973340056963]
We propose a fully differentiable quantization method for vision transformers (ViT), named Q-ViT.
We leverage head-wise bit-width to squeeze the size of Q-ViT while preserving performance.
In particular, our method outperforms the state-of-the-art uniform quantization method by 1.5% on DeiT-Tiny.
arXiv Detail & Related papers (2022-01-19T16:43:17Z)
- PTQ4ViT: Post-Training Quantization for Vision Transformers with Twin Uniform Quantization [12.136898590792754]
We analyze the problems of quantizing vision transformers.
We propose the twin uniform quantization method to reduce the quantization error on the problematic post-softmax and post-GELU activation values; a simplified sketch appears after this list.
Experiments show the quantized vision transformers achieve near-lossless prediction accuracy (less than 0.5% drop at 8-bit quantization) on the ImageNet classification task.
arXiv Detail & Related papers (2021-11-24T06:23:06Z)
- Post-Training Quantization for Vision Transformer [85.57953732941101]
We present an effective post-training quantization algorithm for reducing the memory storage and computational costs of vision transformers.
We can obtain an 81.29% top-1 accuracy using the DeiT-B model on the ImageNet dataset with about 8-bit quantization.
arXiv Detail & Related papers (2021-06-27T06:27:22Z)
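Two of the techniques referenced above are simple enough to sketch. First, the twin uniform quantization idea behind PTQ4ViT: an activation tensor is quantized with two uniform ranges, a fine scale for the dense region near zero (e.g. post-softmax attention values) and a coarser one for the long tail (e.g. post-GELU outputs). Coupling the scales by a power-of-two factor, as below, is a common hardware-friendly assumption made here for illustration; the paper's exact range-selection rule may differ.

```python
import torch

def twin_uniform_quantize(x: torch.Tensor, s_fine: float, k: int = 4, n_bits: int = 8) -> torch.Tensor:
    # Values that fit in the fine range use scale s_fine; the rest fall back
    # to a coarser scale s_fine * 2**k, so moving between the two ranges is
    # just a bit shift in integer hardware.
    qmax = 2 ** (n_bits - 1) - 1
    s_coarse = s_fine * (2 ** k)
    in_fine_range = x.abs() <= s_fine * qmax
    q_fine = torch.clamp(torch.round(x / s_fine), -qmax, qmax) * s_fine
    q_coarse = torch.clamp(torch.round(x / s_coarse), -qmax, qmax) * s_coarse
    return torch.where(in_fine_range, q_fine, q_coarse)
```

Second, the NoisyQuant idea: a fixed, pre-sampled bias is added to the activations before rounding and removed after dequantization, flattening the heavy-tailed distribution the quantizer actually sees. NoisyQuant fixes the bias at calibration time and pairs it with a denoising term; the sketch assumes a pre-sampled bias via the `noise` argument.

```python
def noisy_bias_quantize(x: torch.Tensor, scale: float, noise: torch.Tensor, n_bits: int = 8) -> torch.Tensor:
    # 'noise' is a fixed bias tensor broadcastable to x, sampled once at
    # calibration (e.g. uniform in [-scale/2, scale/2]), added before
    # quantization and subtracted after dequantization.
    qmax = 2 ** (n_bits - 1) - 1
    q = torch.clamp(torch.round((x + noise) / scale), -qmax - 1, qmax)
    return q * scale - noise
```

For instance, `noise = (torch.rand(x.shape[-1]) - 0.5) * scale` gives a fixed per-channel bias shared across all inputs.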