Variation-aware Vision Transformer Quantization
- URL: http://arxiv.org/abs/2307.00331v1
- Date: Sat, 1 Jul 2023 13:01:39 GMT
- Title: Variation-aware Vision Transformer Quantization
- Authors: Xijie Huang, Zhiqiang Shen, Kwang-Ting Cheng
- Abstract summary: We study the difficulty of ViT quantization arising from its unique variation behaviors.
We find that the variations in ViTs cause training oscillations, bringing instability during quantization-aware training (QAT).
We propose a knowledge-distillation-based variation-aware quantization method.
- Score: 49.741297464791835
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Despite the remarkable performance of Vision Transformers (ViTs) in various
visual tasks, the expanding computation and model size of ViTs have increased
the demand for improved efficiency during training and inference. To address
the heavy computation and parameter drawbacks, quantization is frequently
studied in the community as a representative model compression technique and
has seen extensive use on CNNs. However, due to the unique properties of CNNs
and ViTs, the quantization applications on ViTs are still limited and
underexplored. In this paper, we identify the difficulty of ViT quantization on
its unique variation behaviors, which differ from traditional CNN
architectures. The variations indicate the magnitude of the parameter
fluctuations and can also measure outlier conditions. Moreover, the variation
behaviors reflect the various sensitivities to the quantization of each module.
The quantization sensitivity analysis and comparison of ViTs with CNNs help us
locate the underlying differences in variations. We also find that the
variations in ViTs cause training oscillations, bringing instability during
quantization-aware training (QAT). Correspondingly, we solve the variation
problem with an efficient knowledge-distillation-based variation-aware
quantization method. The multi-crop knowledge distillation scheme can
accelerate and stabilize the training and alleviate the variation's influence
during QAT. We also propose a module-dependent quantization scheme and a
variation-aware regularization term to suppress the oscillation of weights. On
ImageNet-1K, we obtain a 77.66% Top-1 accuracy on the extremely low-bit
scenario of 2-bit Swin-T, outperforming the previous state-of-the-art quantized
model by 3.35%.
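To make the recipe above concrete, here is a minimal sketch of how a variation-aware QAT objective of this kind could be assembled in PyTorch: a cross-entropy term, a multi-crop knowledge-distillation term against a full-precision teacher, and a regularizer that discourages latent weights from sitting near quantization decision boundaries, where their rounded values flip between training steps. The function names, the exact form of the regularizer, and the loss weighting are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def variation_aware_qat_loss(student_logits_per_crop,
                             teacher_logits_per_crop,
                             labels,
                             latent_weights,
                             scales,
                             lambda_reg=1e-4,
                             tau=2.0):
    """Illustrative QAT objective: multi-crop distillation plus an
    oscillation-suppressing regularizer on the latent (full-precision)
    weights. A sketch of the general recipe, not the paper's exact losses."""
    # Cross-entropy on the first (full-resolution) crop only.
    ce = F.cross_entropy(student_logits_per_crop[0], labels)

    # Distillation: match the teacher's soft targets on every crop.
    kd = 0.0
    for s, t in zip(student_logits_per_crop, teacher_logits_per_crop):
        kd = kd + F.kl_div(F.log_softmax(s / tau, dim=-1),
                           F.softmax(t / tau, dim=-1),
                           reduction="batchmean") * tau * tau
    kd = kd / len(student_logits_per_crop)

    # Regularizer: push latent weights away from quantization decision
    # boundaries (half-integer positions on the integer grid), so their
    # quantized values stop flipping between consecutive updates.
    reg = 0.0
    for w, s in zip(latent_weights, scales):
        frac = (w / s) - torch.floor(w / s)       # position within a step
        reg = reg + torch.mean(0.5 - torch.abs(frac - 0.5))
    reg = reg / max(len(latent_weights), 1)

    return ce + kd + lambda_reg * reg

# Toy usage with random tensors standing in for model outputs.
s_logits = [torch.randn(8, 1000) for _ in range(3)]
t_logits = [torch.randn(8, 1000) for _ in range(3)]
labels = torch.randint(0, 1000, (8,))
weights = [torch.randn(64, 64)]
scales = [torch.tensor(0.05)]
loss = variation_aware_qat_loss(s_logits, t_logits, labels, weights, scales)
```

Pulling latent weights toward the quantization grid points is one simple way to damp the round-up/round-down flipping the abstract calls training oscillation; the module-dependent quantization scheme, which additionally picks quantizer settings per module, is omitted from this sketch.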
Related papers
- ADFQ-ViT: Activation-Distribution-Friendly Post-Training Quantization for Vision Transformers [7.155242379236052]
Quantization of Vision Transformers (ViTs) has emerged as a promising solution to mitigate their heavy computational and memory costs.
Existing methods still suffer from significant accuracy loss at low bit-widths.
ADFQ-ViT provides significant improvements over various baselines in image classification, object detection, and instance segmentation tasks at 4-bit.
arXiv Detail & Related papers (2024-07-03T02:41:59Z)
- RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization [8.827794405944637]
Post-training quantization (PTQ) is a promising solution for compressing large transformer models.
Existing PTQ methods typically exhibit non-trivial performance loss.
We propose RepQuant, a novel PTQ framework with a quantization-inference decoupling paradigm.
arXiv Detail & Related papers (2024-02-08T12:35:41Z)
- Towards Accurate Post-Training Quantization for Vision Transformer [48.779346466374406]
Existing post-training quantization methods still cause severe performance drops.
APQ-ViT surpasses the existing post-training quantization methods by convincing margins.
arXiv Detail & Related papers (2023-03-25T03:05:26Z)
- NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers [53.85087932591237]
NoisyQuant is a quantizer-agnostic enhancement for the post-training activation quantization performance of vision transformers.
Building on the theoretical insight, NoisyQuant achieves the first success in actively altering the heavy-tailed activation distribution.
NoisyQuant largely improves the post-training quantization performance of vision transformers with minimal computation overhead.
arXiv Detail & Related papers (2022-11-29T10:02:09Z)
- Patch Similarity Aware Data-Free Quantization for Vision Transformers [2.954890575035673]
We propose PSAQ-ViT, a Patch Similarity Aware data-free Quantization framework for Vision Transformers.
We analyze the self-attention module's properties and reveal a general difference (patch similarity) in its processing of Gaussian noise and real images.
Experiments and ablation studies are conducted on various benchmarks to validate the effectiveness of PSAQ-ViT.
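To make the "patch similarity" quantity concrete, the sketch below computes pairwise cosine similarity between patch (token) features, the kind of statistic the abstract refers to. The tensor shapes and the toy usage are assumptions; this is not PSAQ-ViT's full data-free sample-generation pipeline.

```python
import torch
import torch.nn.functional as F

def patch_similarity(features: torch.Tensor) -> torch.Tensor:
    """Pairwise cosine similarity between patch (token) features.

    features: (batch, num_patches, dim), e.g. the output of a ViT
    self-attention block. Returns (batch, num_patches, num_patches).
    """
    f = F.normalize(features, dim=-1)
    return f @ f.transpose(-2, -1)

# Toy usage on dummy features; in the data-free setting one would compare
# the similarity patterns produced by real images against Gaussian noise.
feats = torch.randn(4, 196, 384)      # 14x14 patches, 384-dim tokens
sim = patch_similarity(feats)
print(sim.shape)                      # torch.Size([4, 196, 196])
```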
arXiv Detail & Related papers (2022-03-04T11:47:20Z)
- AdaViT: Adaptive Vision Transformers for Efficient Image Recognition [78.07924262215181]
We introduce AdaViT, an adaptive framework that learns to derive usage policies on which patches, self-attention heads and transformer blocks to use.
Our method obtains more than a 2x improvement in efficiency compared to state-of-the-art vision transformers, with only a 0.8% drop in accuracy.
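As a rough illustration of input-conditioned usage policies, the toy sketch below gates a single transformer block with a tiny policy head. It only conveys the general idea of learned block skipping; the gating granularity, policy network, and training procedure are assumptions, not AdaViT's architecture.

```python
import torch
import torch.nn as nn

class GatedBlock(nn.Module):
    """Toy input-conditioned gate deciding whether to run a block."""
    def __init__(self, block: nn.Module, dim: int):
        super().__init__()
        self.block = block
        self.policy = nn.Linear(dim, 1)   # keep-probability from mean token

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        keep_prob = torch.sigmoid(self.policy(x.mean(dim=1)))   # (batch, 1)
        if self.training:
            # soft gate during training so the policy stays differentiable
            gate = keep_prob.unsqueeze(-1)
            return gate * self.block(x) + (1 - gate) * x
        # hard decision at inference: skip the block for "easy" inputs
        return x if keep_prob.mean() < 0.5 else self.block(x)

# Usage with a stand-in block: one transformer encoder layer.
layer = nn.TransformerEncoderLayer(d_model=192, nhead=3, batch_first=True)
gated = GatedBlock(layer, dim=192)
print(gated(torch.randn(2, 196, 192)).shape)
```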
arXiv Detail & Related papers (2021-11-30T18:57:02Z)
- Understanding and Overcoming the Challenges of Efficient Transformer Quantization [17.05322956052278]
Transformer-based architectures have become the de-facto standard models for a wide range of Natural Language Processing tasks.
However, their memory footprint and high latency are prohibitive for efficient deployment and inference on resource-limited devices.
We show that transformers have unique quantization challenges -- namely, high dynamic activation ranges that are difficult to represent with a low-bit fixed-point format.
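A small numerical example of why a high dynamic range hurts low-bit fixed-point quantization (a toy illustration, not taken from the paper): a single outlier sets the quantization scale, so nearly all typical activations collapse onto the same few levels.

```python
import torch

def uniform_quantize(x: torch.Tensor, n_bits: int) -> torch.Tensor:
    """Symmetric uniform quantization: one scale covers the full range."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = x.abs().max() / qmax              # a single outlier sets the scale
    return torch.round(x / scale).clamp(-qmax, qmax) * scale

torch.manual_seed(0)
# Typical transformer activations: mostly small values plus a rare large outlier.
acts = torch.cat([torch.randn(1000) * 0.1, torch.tensor([20.0])])

q4 = uniform_quantize(acts, n_bits=4)
# With 4 bits the step size is 20 / 7 ~ 2.9, so almost every small
# activation rounds to zero and all resolution is spent on the outlier.
print("step size:", (acts.abs().max() / 7).item())
print("fraction of small activations rounded to zero:",
      (q4[:-1] == 0).float().mean().item())
```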
arXiv Detail & Related papers (2021-09-27T10:57:18Z)
- Post-Training Quantization for Vision Transformer [85.57953732941101]
We present an effective post-training quantization algorithm for reducing the memory storage and computational costs of vision transformers.
We can obtain an 81.29% top-1 accuracy using the DeiT-B model on the ImageNet dataset with about 8-bit quantization.
arXiv Detail & Related papers (2021-06-27T06:27:22Z)
- Variational Transformers for Diverse Response Generation [71.53159402053392]
Variational Transformer (VT) is a variational self-attentive feed-forward sequence model.
VT combines the parallelizability and global receptive field computation of the Transformer with the variational nature of the CVAE.
We explore two types of VT: 1) modeling the discourse-level diversity with a global latent variable; and 2) augmenting the Transformer decoder with a sequence of fine-grained latent variables.
arXiv Detail & Related papers (2020-03-28T07:48:02Z)