Q-ViT: Fully Differentiable Quantization for Vision Transformer
- URL: http://arxiv.org/abs/2201.07703v1
- Date: Wed, 19 Jan 2022 16:43:17 GMT
- Title: Q-ViT: Fully Differentiable Quantization for Vision Transformer
- Authors: Zhexin Li, Tong Yang, Peisong Wang, Jian Cheng
- Abstract summary: We propose a fully differentiable quantization method for vision transformers (ViT), named Q-ViT.
We leverage head-wise bit-width to squeeze the size of Q-ViT while preserving performance.
In particular, our method outperforms the state-of-the-art uniform quantization method by 1.5% on DeiT-Tiny.
- Score: 27.361973340056963
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this paper, we propose a fully differentiable quantization method for
vision transformers (ViT), named Q-ViT, in which both the quantization scales
and the bit-widths are learnable parameters. Specifically, based on our
observation that the heads in ViT display different quantization robustness, we
leverage head-wise bit-widths to squeeze the size of Q-ViT while preserving
performance. In addition, we propose a novel technique named switchable scale
to resolve the convergence problem in the joint training of quantization scales
and bit-widths. In this way, Q-ViT pushes the limits of ViT quantization to
3-bit without a heavy performance drop. Moreover, we analyze the quantization
robustness of every architectural component of ViT and show that Multi-head
Self-Attention (MSA) and the Gaussian Error Linear Unit (GELU) are the key
aspects of ViT quantization. This study provides insights for further
research on ViT quantization. Extensive experiments on different ViT models,
such as DeiT and Swin Transformer, show the effectiveness of our quantization
method. In particular, our method outperforms the state-of-the-art uniform
quantization method by 1.5% on DeiT-Tiny.
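As a concrete illustration of the ideas in the abstract, the sketch below shows one way to build a fully differentiable quantizer in PyTorch: each candidate bit-width owns its own learnable step size (one reading of the "switchable scale" idea), the bit-width itself is learned through a soft selection over candidates, and a head-wise wrapper gives every attention head an independent quantizer. This is only a minimal sketch inferred from the abstract, not the authors' implementation; all class, function, and parameter names (round_ste, SwitchableQuantizer, HeadWiseQuantizer, log_scales, bit_logits, expected_bits) are hypothetical, and Q-ViT's actual parameterization of the learnable bit-widths may differ.

```python
# Illustrative sketch (not the authors' code): a fully differentiable uniform
# quantizer with a learnable bit-width and a separate ("switchable") scale per
# candidate bit-width, plus a head-wise wrapper for multi-head attention.
import torch
import torch.nn as nn
import torch.nn.functional as F


def round_ste(x: torch.Tensor) -> torch.Tensor:
    """Round with a straight-through estimator so gradients pass unchanged."""
    return (x.round() - x).detach() + x


class SwitchableQuantizer(nn.Module):
    """Uniform quantizer whose bit-width is chosen softly (hence differentiably)
    among `bit_choices`; each candidate bit-width owns its own learnable scale."""

    def __init__(self, bit_choices=(2, 3, 4, 8)):
        super().__init__()
        self.bit_choices = bit_choices
        # One learnable step size per candidate bit-width ("switchable scale").
        self.log_scales = nn.Parameter(torch.zeros(len(bit_choices)))
        # Logits of the soft bit-width selection (learnable bit-width).
        self.bit_logits = nn.Parameter(torch.zeros(len(bit_choices)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        probs = F.softmax(self.bit_logits, dim=0)          # soft bit-width choice
        out = torch.zeros_like(x)
        for i, bits in enumerate(self.bit_choices):
            scale = self.log_scales[i].exp()                # positive step size
            qmax = 2 ** (bits - 1) - 1                      # symmetric signed range
            q = round_ste(torch.clamp(x / scale, -qmax - 1, qmax))
            out = out + probs[i] * q * scale                # blend candidates
        return out

    def expected_bits(self) -> torch.Tensor:
        """Soft bit cost, usable as a model-size penalty in the training loss."""
        probs = F.softmax(self.bit_logits, dim=0)
        bits = torch.tensor(self.bit_choices, dtype=probs.dtype, device=probs.device)
        return (probs * bits).sum()


class HeadWiseQuantizer(nn.Module):
    """One independent quantizer per attention head, so robust heads can settle
    on lower bit-widths than sensitive ones."""

    def __init__(self, num_heads: int, bit_choices=(2, 3, 4, 8)):
        super().__init__()
        self.quantizers = nn.ModuleList(
            [SwitchableQuantizer(bit_choices) for _ in range(num_heads)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, heads, tokens, dim_per_head)
        return torch.stack(
            [q(x[:, h]) for h, q in enumerate(self.quantizers)], dim=1
        )


# Usage sketch: quantize per-head activations and penalize the average bit-width.
hq = HeadWiseQuantizer(num_heads=3)
x = torch.randn(2, 3, 16, 64)
y = hq(x)
size_penalty = torch.stack([q.expected_bits() for q in hq.quantizers]).mean()
```

Keeping a separate scale per candidate bit-width means no single step size has to jump abruptly when the preferred bit-width changes during joint training, which is one plausible reading of how the switchable scale eases the convergence problem mentioned in the abstract; the expected_bits() term sketches how a size penalty could push quantization-robust heads toward lower bit-widths.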
Related papers
- MPTQ-ViT: Mixed-Precision Post-Training Quantization for Vision Transformer [7.041718444626999]
We propose a mixed-precision post-training quantization framework for vision transformers (MPTQ-ViT).
Our experiments on ViT, DeiT, and Swin demonstrate significant accuracy improvements compared with SOTA on the ImageNet dataset.
arXiv Detail & Related papers (2024-01-26T14:25:15Z)
- I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization [49.17407185195788]
We introduce I&S-ViT, a novel method that regulates the PTQ of ViTs in an inclusive and stable fashion.
I&S-ViT elevates the performance of 3-bit ViT-B by an impressive 50.68%.
arXiv Detail & Related papers (2023-11-16T13:07:47Z)
- Quantization Variation: A New Perspective on Training Transformers with Low-Bit Precision [45.69716658698776]
In this paper, we identify that the difficulty of low-bit quantization-aware training of transformers stems from their unique variation behaviors.
We propose a variation-aware quantization scheme for both vision and language transformers.
Our solution substantially improves 2-bit Swin-T and binary BERT-base, achieving accuracy improvements of 3.35% and 1.4%, respectively.
arXiv Detail & Related papers (2023-07-01T13:01:39Z)
- Towards Accurate Post-Training Quantization for Vision Transformer [48.779346466374406]
Existing post-training quantization methods still cause severe performance drops.
APQ-ViT surpasses the existing post-training quantization methods by convincing margins.
arXiv Detail & Related papers (2023-03-25T03:05:26Z)
- RepQ-ViT: Scale Reparameterization for Post-Training Quantization of Vision Transformers [2.114921680609289]
We propose RepQ-ViT, a novel PTQ framework for vision transformers (ViTs).
RepQ-ViT decouples the quantization and inference processes.
It can outperform existing strong baselines and encouragingly improve the accuracy of 4-bit PTQ of ViTs to a usable level.
arXiv Detail & Related papers (2022-12-16T02:52:37Z)
- NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers [53.85087932591237]
NoisyQuant is a quantizer-agnostic enhancement of post-training activation quantization for vision transformers.
Building on a theoretical insight, NoisyQuant achieves the first success in actively altering the heavy-tailed activation distribution.
NoisyQuant largely improves the post-training quantization performance of vision transformers with minimal computation overhead.
arXiv Detail & Related papers (2022-11-29T10:02:09Z)
- Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer [56.87383229709899]
We develop an information rectification module (IRM) and a distribution-guided distillation scheme for fully quantized vision transformers (Q-ViT).
Our method achieves much better performance than prior arts.
arXiv Detail & Related papers (2022-10-13T04:00:29Z)
- Post-Training Quantization for Vision Transformer [85.57953732941101]
We present an effective post-training quantization algorithm for reducing the memory storage and computational costs of vision transformers.
We obtain 81.29% top-1 accuracy with the DeiT-B model on the ImageNet dataset using about 8-bit quantization.
arXiv Detail & Related papers (2021-06-27T06:27:22Z)