CPT-V: A Contrastive Approach to Post-Training Quantization of Vision
Transformers
- URL: http://arxiv.org/abs/2211.09643v1
- Date: Thu, 17 Nov 2022 16:41:31 GMT
- Title: CPT-V: A Contrastive Approach to Post-Training Quantization of Vision
Transformers
- Authors: Natalia Frumkin, Dibakar Gope, and Diana Marculescu
- Abstract summary: We find a way to improve the accuracy of networks that have already been quantized, simply by perturbing the quantization scales.
CPT-V contrasts the features of quantized and full precision models in a self-supervised fashion.
It improves the top-1 accuracy of a fully quantized ViT-Base by 10.30%, 0.78%, and 0.15% for 3-bit, 4-bit, and 8-bit weight quantization levels.
- Score: 12.987397453149537
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When considering post-training quantization, prior work has typically focused
on developing a mixed precision scheme or learning the best way to partition a
network for quantization. In our work, CPT-V, we look at a general way to
improve the accuracy of networks that have already been quantized, simply by
perturbing the quantization scales. Borrowing the idea of contrastive loss from
self-supervised learning, we find a robust way to jointly minimize a loss
function using just 1,000 calibration images. In order to determine the best
performing quantization scale, CPT-V contrasts the features of quantized and
full precision models in a self-supervised fashion.
Unlike traditional reconstruction-based loss functions, the use of a
contrastive loss function not only rewards similarity between the quantized and
full precision outputs but also helps in distinguishing the quantized output
from other outputs within a given batch. In addition, in contrast to prior
works, CPT-V proposes a block-wise evolutionary search to minimize a global
contrastive loss objective, allowing for accuracy improvement of existing
vision transformer (ViT) quantization schemes. For example, CPT-V improves the
top-1 accuracy of a fully quantized ViT-Base by 10.30%, 0.78%, and 0.15% for
3-bit, 4-bit, and 8-bit weight quantization levels. Extensive experiments on a
variety of other ViT architectures further demonstrate its robustness in
extreme quantization scenarios. Our code is available at <link>.
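To make the two ingredients above concrete, here is a minimal PyTorch sketch of (a) an InfoNCE-style contrastive loss that matches each quantized feature to its own full-precision feature while pushing it away from the other features in the batch, and (b) a random perturbation search over a single quantization scale as a simplified stand-in for the block-wise evolutionary search. The `fake_quantize` helper, the tensor shapes, and the search loop are illustrative assumptions, not the paper's implementation.
```python
import torch
import torch.nn.functional as F

def contrastive_quantization_loss(q_feats, fp_feats, temperature=0.1):
    """InfoNCE-style loss: each quantized feature should match its own
    full-precision feature (the positive) and be pushed away from the
    full-precision features of the other images in the batch (negatives)."""
    q = F.normalize(q_feats, dim=1)            # (B, D)
    p = F.normalize(fp_feats, dim=1)           # (B, D)
    logits = q @ p.t() / temperature           # (B, B) similarity matrix
    targets = torch.arange(q.size(0))          # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

def fake_quantize(w, scale, n_bits=4):
    """Toy symmetric uniform weight quantizer, used only for this sketch."""
    qmax = 2 ** (n_bits - 1) - 1
    return torch.clamp(torch.round(w / scale), -qmax, qmax) * scale

torch.manual_seed(0)
W_fp = torch.randn(64, 128)                    # one "block" of weights
calib = torch.randn(16, 128)                   # a tiny calibration batch
scale = W_fp.abs().max() / (2 ** 3 - 1)        # initial 4-bit scale

best_loss, best_scale = float("inf"), scale
for _ in range(50):                            # random scale perturbations: a simplified
    cand = best_scale * (1 + 0.05 * torch.randn(()))   # stand-in for the evolutionary search
    q_out = calib @ fake_quantize(W_fp, cand).t()
    fp_out = calib @ W_fp.t()
    loss = contrastive_quantization_loss(q_out, fp_out)
    if loss.item() < best_loss:
        best_loss, best_scale = loss.item(), cand

print(f"selected scale {best_scale.item():.4f}, contrastive loss {best_loss:.4f}")
```
In the actual method this kind of search would be run block by block over an already quantized ViT, with the contrastive objective computed against the full-precision model on the calibration set.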
Related papers
- MPTQ-ViT: Mixed-Precision Post-Training Quantization for Vision
Transformer [7.041718444626999]
We propose a mixed-precision post-training quantization framework for vision transformers (MPTQ-ViT).
Our experiments on ViT, DeiT, and Swin demonstrate significant accuracy improvements compared with SOTA on the ImageNet dataset.
arXiv Detail & Related papers (2024-01-26T14:25:15Z)
- On-Chip Hardware-Aware Quantization for Mixed Precision Neural Networks [52.97107229149988]
We propose an On-Chip Hardware-Aware Quantization framework, performing hardware-aware mixed-precision quantization on deployed edge devices.
For efficiency metrics, we build an On-Chip Quantization Aware pipeline, which allows the quantization process to perceive the actual hardware efficiency of the quantization operator.
For accuracy metrics, we propose a Mask-Guided Quantization Estimation technique to effectively estimate the accuracy impact of operators in the on-chip scenario.
arXiv Detail & Related papers (2023-09-05T04:39:34Z)
- Patch-wise Mixed-Precision Quantization of Vision Transformer [2.3104000011280403]
Vision Transformers (ViTs) rely on complex self-attention computation to learn powerful feature representations.
We propose a novel patch-wise mixed-precision quantization (PMQ) for efficient inference of ViTs.
arXiv Detail & Related papers (2023-05-11T04:34:10Z)
- Towards Accurate Post-Training Quantization for Vision Transformer [48.779346466374406]
Existing post-training quantization methods still cause severe performance drops on vision transformers.
The proposed APQ-ViT surpasses existing post-training quantization methods by convincing margins.
arXiv Detail & Related papers (2023-03-25T03:05:26Z)
- NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers [53.85087932591237]
NoisyQuant is a quantizer-agnostic enhancement for the post-training activation quantization performance of vision transformers.
Building on this theoretical insight, NoisyQuant is the first method to actively alter the heavy-tailed activation distribution to fit a given quantizer.
NoisyQuant largely improves the post-training quantization performance of vision transformers with minimal computation overhead.
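As a rough illustration of the noisy-bias idea described above, the sketch below adds a fixed, known bias to heavy-tailed activations before uniform quantization and folds its removal into the following linear layer's bias. The min-max quantizer, shapes, and noise range here are assumptions made for this toy, not NoisyQuant's actual procedure.
```python
import numpy as np

def uniform_quant(x, n_bits=8):
    # simple min-max uniform quantizer, used only for this illustration
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / (2 ** n_bits - 1)
    return np.round((x - lo) / scale) * scale + lo

rng = np.random.default_rng(0)
x = rng.standard_t(df=3, size=(32, 16)).astype(np.float32)  # heavy-tailed activations
W = rng.standard_normal((16, 8)).astype(np.float32)
b = np.zeros(8, dtype=np.float32)

# a fixed "noisy bias", sampled once per layer and reused for every input
noisy_bias = rng.uniform(-1.0, 1.0, size=(1, 16)).astype(np.float32)

y_fp = x @ W + b                                   # full-precision reference
y_plain = uniform_quant(x) @ W + b                 # quantize activations directly
# quantize the shifted activations, then remove the known bias term downstream
y_noisy = uniform_quant(x + noisy_bias) @ W + (b - noisy_bias @ W)

print("plain quantization error:     ", np.abs(y_plain - y_fp).mean())
print("noisy-bias quantization error:", np.abs(y_noisy - y_fp).mean())
```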
arXiv Detail & Related papers (2022-11-29T10:02:09Z)
- Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer [56.87383229709899]
We develop an information rectification module (IRM) and a distribution-guided distillation scheme for fully quantized vision transformers (Q-ViT).
Our method achieves much better performance than prior art.
arXiv Detail & Related papers (2022-10-13T04:00:29Z)
- PTQ4ViT: Post-training quantization for vision transformers with twin uniform quantization [12.136898590792754]
We analyze the problems of quantization on vision transformers.
We propose the twin uniform quantization method to reduce the quantization error on activation values with special distributions, such as post-softmax and post-GELU activations.
Experiments show the quantized vision transformers achieve near-lossless prediction accuracy (less than 0.5% drop at 8-bit quantization) on the ImageNet classification task.
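The general idea of two uniform quantization ranges can be sketched as follows: each value is quantized on whichever of two ranges (a fine one for the many small values, a coarse one for the few large ones) gives the smaller error, with one bit recording the chosen range. This is only an approximation of the idea; the scales, bit-width, and error criterion are illustrative assumptions, not the paper's exact scheme.
```python
import numpy as np

def twin_uniform_quant(x, n_bits=6, range_ratio=8):
    """Two uniform ranges sharing the bit budget: one bit flags the range,
    so each range keeps 2**(n_bits - 1) levels. The fine range covers
    [0, 1/range_ratio] for the many small values; the coarse range covers
    [0, 1] for the few large ones."""
    levels = 2 ** (n_bits - 1) - 1
    fine_scale = (1.0 / range_ratio) / levels
    coarse_scale = 1.0 / levels
    fine = np.clip(np.round(x / fine_scale), 0, levels) * fine_scale
    coarse = np.clip(np.round(x / coarse_scale), 0, levels) * coarse_scale
    use_coarse = np.abs(coarse - x) < np.abs(fine - x)  # per-value range flag
    return np.where(use_coarse, coarse, fine)

rng = np.random.default_rng(0)
# post-softmax-like activations: mostly tiny values plus a few large ones
attn = rng.dirichlet(alpha=np.full(64, 0.1), size=128)

levels = 2 ** 6 - 1                                     # single 6-bit range over [0, 1]
single = np.clip(np.round(attn * levels), 0, levels) / levels
twin = twin_uniform_quant(attn, n_bits=6)
print("single-range MSE:", np.mean((single - attn) ** 2))
print("twin-range MSE:  ", np.mean((twin - attn) ** 2))
```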
arXiv Detail & Related papers (2021-11-24T06:23:06Z)
- Post-Training Quantization for Vision Transformer [85.57953732941101]
We present an effective post-training quantization algorithm for reducing the memory storage and computational costs of vision transformers.
We obtain an 81.29% top-1 accuracy using the DeiT-B model on the ImageNet dataset with about 8-bit quantization.
arXiv Detail & Related papers (2021-06-27T06:27:22Z)
- DAQ: Distribution-Aware Quantization for Deep Image Super-Resolution Networks [49.191062785007006]
Quantizing deep convolutional neural networks for image super-resolution substantially reduces their computational costs.
Existing works either suffer from a severe performance drop at ultra-low precision (4 bits or lower) or require a heavy fine-tuning process to recover the performance.
We propose a novel distribution-aware quantization scheme (DAQ) which facilitates accurate training-free quantization in ultra-low precision.
arXiv Detail & Related papers (2020-12-21T10:19:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.