NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization
for Vision Transformers
- URL: http://arxiv.org/abs/2211.16056v2
- Date: Wed, 19 Apr 2023 17:30:33 GMT
- Title: NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization
for Vision Transformers
- Authors: Yijiang Liu, Huanrui Yang, Zhen Dong, Kurt Keutzer, Li Du, Shanghang
Zhang
- Abstract summary: NoisyQuant is a quantizer-agnostic enhancement for the post-training activation quantization performance of vision transformers.
Building on the theoretical insight, NoisyQuant achieves the first success in actively altering the heavy-tailed activation distribution.
NoisyQuant largely improves the post-training quantization performance of vision transformers with minimal computation overhead.
- Score: 53.85087932591237
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The complicated architecture and high training cost of vision transformers
urge the exploration of post-training quantization. However, the heavy-tailed
distribution of vision transformer activations hinders the effectiveness of
previous post-training quantization methods, even with advanced quantizer
designs. Instead of tuning the quantizer to better fit the complicated
activation distribution, this paper proposes NoisyQuant, a quantizer-agnostic
enhancement for the post-training activation quantization performance of vision
transformers. We make a surprising theoretical discovery that for a given
quantizer, adding a fixed Uniform noisy bias to the values being quantized can
significantly reduce the quantization error under provable conditions. Building
on the theoretical insight, NoisyQuant achieves the first success in actively
altering the heavy-tailed activation distribution with an additive noisy bias to
fit a given quantizer. Extensive experiments show NoisyQuant largely improves
the post-training quantization performance of vision transformers with minimal
computation overhead. For instance, on linear uniform 6-bit activation
quantization, NoisyQuant improves SOTA top-1 accuracy on ImageNet by up to
1.7%, 1.1%, and 0.5% for ViT, DeiT, and Swin Transformer respectively, achieving
on-par or even higher performance than previous nonlinear, mixed-precision
quantization methods.
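A minimal illustration of the mechanism above, not the authors' implementation: the NumPy sketch below samples one fixed uniform noisy bias, adds it to the activations before a linear uniform 6-bit quantizer, and retracts it after de-quantization (the paper instead folds this retraction into a pre-computed denoising bias of the following linear layer). The quantizer, the bias range of half a quantization step, and the Laplace stand-in for heavy-tailed ViT activations are illustrative assumptions.
```python
# Minimal sketch of the NoisyQuant idea (illustrative, not the authors' code).
import numpy as np

rng = np.random.default_rng(0)

def uniform_quantize(x, n_bits=6, x_min=None, x_max=None):
    """Linear uniform quantization followed by de-quantization."""
    x_min = float(x.min()) if x_min is None else x_min
    x_max = float(x.max()) if x_max is None else x_max
    scale = (x_max - x_min) / (2 ** n_bits - 1)
    q = np.clip(np.round((x - x_min) / scale), 0, 2 ** n_bits - 1)
    return q * scale + x_min, scale

# Heavy-tailed stand-in for a ViT activation tensor (assumption for this demo).
x = rng.laplace(loc=0.0, scale=1.0, size=(128, 384))

# Plain post-training activation quantization.
x_hat, scale = uniform_quantize(x, n_bits=6)
err_plain = np.mean((x_hat - x) ** 2)

# NoisyQuant-style quantization: sample ONE fixed noisy bias per layer, add it
# before quantization, and retract it afterwards. The bias range of half a
# quantization step is an illustrative choice; the paper derives its own range.
noisy_bias = rng.uniform(-scale / 2, scale / 2, size=x.shape[1])
x_noisy_hat, _ = uniform_quantize(x + noisy_bias, n_bits=6,
                                  x_min=float(x.min()), x_max=float(x.max()))
err_noisy = np.mean(((x_noisy_hat - noisy_bias) - x) ** 2)

print(f"MSE without noisy bias: {err_plain:.6f}")
print(f"MSE with noisy bias:    {err_noisy:.6f}")
```
Whether the noisy-bias error comes out lower depends on the activation distribution meeting the conditions derived in the paper; the sketch only compares the two cases empirically.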
Related papers
- DopQ-ViT: Towards Distribution-Friendly and Outlier-Aware Post-Training Quantization for Vision Transformers [2.0862654518798034]
We propose a Distribution-Friendly and Outlier-Aware Post-training Quantization method for Vision Transformers.
DopQ-ViT analyzes the inefficiencies of current quantizers and introduces a distribution-friendly Tan Quantizer called TanQ.
DopQ-ViT has been extensively validated and significantly improves the performance of quantized models.
arXiv Detail & Related papers (2024-08-06T16:40:04Z)
- ERQ: Error Reduction for Post-Training Quantization of Vision Transformers [48.740630807085566]
Post-training quantization (PTQ) for vision transformers (ViTs) has garnered significant attention due to its efficiency in compressing models.
We propose ERQ, a two-step PTQ approach meticulously crafted to sequentially reduce the quantization error arising from activation and weight quantization.
ERQ surpasses the state-of-the-art GPTQ by 22.36% in accuracy for W3A4 ViT-S.
arXiv Detail & Related papers (2024-07-09T12:06:03Z)
- RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization [8.827794405944637]
Post-training quantization (PTQ) is a promising solution for compressing large transformer models.
Existing PTQ methods typically exhibit non-trivial performance loss.
We propose RepQuant, a novel PTQ framework with a quantization-inference decoupling paradigm.
arXiv Detail & Related papers (2024-02-08T12:35:41Z)
- Near-Term Distributed Quantum Computation using Mean-Field Corrections and Auxiliary Qubits [77.04894470683776]
We propose near-term distributed quantum computing that involves limited information transfer and conservative entanglement production.
We build upon these concepts to produce an approximate circuit-cutting technique for the fragmented pre-training of variational quantum algorithms.
arXiv Detail & Related papers (2023-09-11T18:00:00Z)
- Quantization Variation: A New Perspective on Training Transformers with Low-Bit Precision [45.69716658698776]
In this paper, we identify that the difficulty of low-bit quantization-aware training for transformers stems from their unique variation behaviors.
We propose a variation-aware quantization scheme for both vision and language transformers.
Our solution substantially improves the 2-bit Swin-T and binary BERT-base, achieving 3.35% and 1.4% accuracy improvements, respectively.
arXiv Detail & Related papers (2023-07-01T13:01:39Z)
- Towards Accurate Post-Training Quantization for Vision Transformer [48.779346466374406]
Existing post-training quantization methods still cause severe performance drops.
APQ-ViT surpasses the existing post-training quantization methods by convincing margins.
arXiv Detail & Related papers (2023-03-25T03:05:26Z)
- PTQ4ViT: Post-training quantization for vision transformers with twin uniform quantization [12.136898590792754]
We analyze the problems of quantization on vision transformers.
We propose the twin uniform quantization method to reduce the quantization error on these activation values (a generic sketch of the two-range idea appears after this list).
Experiments show the quantized vision transformers achieve near-lossless prediction accuracy (less than 0.5% drop at 8-bit quantization) on the ImageNet classification task.
arXiv Detail & Related papers (2021-11-24T06:23:06Z)
- Post-Training Quantization for Vision Transformer [85.57953732941101]
We present an effective post-training quantization algorithm for reducing the memory storage and computational costs of vision transformers.
We can obtain an 81.29% top-1 accuracy using DeiT-B model on ImageNet dataset with about 8-bit quantization.
arXiv Detail & Related papers (2021-06-27T06:27:22Z)
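The PTQ4ViT entry above mentions twin uniform quantization; as a rough sketch of the generic two-range idea only, the code below quantizes the dense small-magnitude region with a fine scale and the long tail with a coarse scale that is a power-of-two multiple of the fine one. It does not reproduce PTQ4ViT's actual range selection, calibration, or bit layout, and all names and parameter values are illustrative assumptions.
```python
# Loose sketch of a two-range ("twin") uniform quantizer; illustrative only.
import numpy as np

def twin_uniform_quantize(x, n_bits=6, fine_scale=0.01, shift=3):
    """Quantize/de-quantize with two symmetric uniform ranges sharing the bit budget."""
    coarse_scale = fine_scale * (2 ** shift)   # coarse scale: power-of-two multiple of the fine one
    levels = 2 ** (n_bits - 2) - 1             # per-range signed levels (1 bit flags the range)
    fine_limit = fine_scale * levels           # values beyond this fall back to the coarse range

    use_fine = np.abs(x) <= fine_limit
    q_fine = np.clip(np.round(x / fine_scale), -levels, levels) * fine_scale
    q_coarse = np.clip(np.round(x / coarse_scale), -levels, levels) * coarse_scale
    return np.where(use_fine, q_fine, q_coarse)

rng = np.random.default_rng(0)
x = rng.laplace(scale=0.05, size=10_000)       # long-tailed stand-in activations
x_hat = twin_uniform_quantize(x)
print("MSE:", np.mean((x_hat - x) ** 2))
```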
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.