AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer
- URL: http://arxiv.org/abs/2407.12951v1
- Date: Wed, 17 Jul 2024 18:38:48 GMT
- Title: AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer
- Authors: Zhuguanyu Wu, Jiaxin Chen, Hanwen Zhong, Di Huang, Yunhong Wang,
- Abstract summary: Vision Transformer (ViT) has become one of the most prevailing fundamental backbone networks in the computer vision community.
We propose a novel non-uniform quantizer, dubbed the Adaptive Logarithm AdaLog (AdaLog) quantizer.
- Score: 54.713778961605115
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision Transformer (ViT) has become one of the most prevailing fundamental backbone networks in the computer vision community. Despite the high accuracy, deploying it in real applications raises critical challenges including the high computational cost and inference latency. Recently, the post-training quantization (PTQ) technique has emerged as a promising way to enhance ViT's efficiency. Nevertheless, existing PTQ approaches for ViT suffer from the inflexible quantization on the post-Softmax and post-GELU activations that obey the power-law-like distributions. To address these issues, we propose a novel non-uniform quantizer, dubbed the Adaptive Logarithm AdaLog (AdaLog) quantizer. It optimizes the logarithmic base to accommodate the power-law-like distribution of activations, while simultaneously allowing for hardware-friendly quantization and de-quantization. By employing the bias reparameterization, the AdaLog quantizer is applicable to both the post-Softmax and post-GELU activations. Moreover, we develop an efficient Fast Progressive Combining Search (FPCS) strategy to determine the optimal logarithm base for AdaLog, as well as the scaling factors and zero points for the uniform quantizers. Extensive experimental results on public benchmarks demonstrate the effectiveness of our approach for various ViT-based architectures and vision tasks including classification, object detection, and instance segmentation. Code is available at https://github.com/GoatWu/AdaLog.
Related papers
- DopQ-ViT: Towards Distribution-Friendly and Outlier-Aware Post-Training Quantization for Vision Transformers [2.0862654518798034]
We propose a Distribution-Friendly and Outlier-Aware Post-training Quantization method for Vision Transformers.
DopQ-ViT analyzes the inefficiencies of current quantizers and introduces a distribution-friendly Tan Quantizer called TanQ.
DopQ-ViT has been extensively validated and significantly improves the performance of quantization models.
arXiv Detail & Related papers (2024-08-06T16:40:04Z) - Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers [56.37495946212932]
Vision transformers (ViTs) have demonstrated their superior accuracy for computer vision tasks compared to convolutional neural networks (CNNs)
This work proposes Quasar-ViT, a hardware-oriented quantization-aware architecture search framework for ViTs.
arXiv Detail & Related papers (2024-07-25T16:35:46Z) - ADFQ-ViT: Activation-Distribution-Friendly Post-Training Quantization for Vision Transformers [7.155242379236052]
Quantization of Vision Transformers (ViTs) has emerged as a promising solution to mitigate these challenges.
Existing methods still suffer from significant accuracy loss at low-bit.
ADFQ-ViT provides significant improvements over various baselines in image classification, object detection, and instance segmentation tasks at 4-bit.
arXiv Detail & Related papers (2024-07-03T02:41:59Z) - Model Quantization and Hardware Acceleration for Vision Transformers: A Comprehensive Survey [6.04807281619171]
Vision Transformers (ViTs) have recently garnered considerable attention, emerging as a promising alternative to convolutional neural networks (CNNs) in several vision-related applications.
This article provides a comprehensive survey of ViTs quantization and its hardware acceleration.
arXiv Detail & Related papers (2024-05-01T04:32:07Z) - Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers [10.566264033360282]
Post-training quantization (PTQ) has emerged as a promising solution for deploying hyper-scale models on edge devices such as mobile and TVs.
In this paper, we propose a novel PTQ algorithm that balances accuracy and efficiency.
arXiv Detail & Related papers (2024-02-14T05:58:43Z) - I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization [49.17407185195788]
We introduce I&S-ViT, a novel method that regulates the PTQ of ViTs in an inclusive and stable fashion.
I&S-ViT elevates the performance of 3-bit ViT-B by an impressive 50.68%.
arXiv Detail & Related papers (2023-11-16T13:07:47Z) - Energy-efficient Task Adaptation for NLP Edge Inference Leveraging
Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z) - RepQ-ViT: Scale Reparameterization for Post-Training Quantization of
Vision Transformers [2.114921680609289]
We propose RepQ-ViT, a novel PTQ framework for vision transformers (ViTs)
RepQ-ViT decouples the quantization and inference processes.
It can outperform existing strong baselines and encouragingly improve the accuracy of 4-bit PTQ of ViTs to a usable level.
arXiv Detail & Related papers (2022-12-16T02:52:37Z) - VAQF: Fully Automatic Software-hardware Co-design Framework for Low-bit
Vision Transformer [121.85581713299918]
We propose VAQF, a framework that builds inference accelerators on FPGA platforms for quantized Vision Transformers (ViTs)
Given the model structure and the desired frame rate, VAQF will automatically output the required quantization precision for activations.
This is the first time quantization has been incorporated into ViT acceleration on FPGAs.
arXiv Detail & Related papers (2022-01-17T20:27:52Z) - Scaling Quantum Approximate Optimization on Near-term Hardware [49.94954584453379]
We quantify scaling of the expected resource requirements by optimized circuits for hardware architectures with varying levels of connectivity.
We show the number of measurements, and hence total time to synthesizing solution, grows exponentially in problem size and problem graph degree.
These problems may be alleviated by increasing hardware connectivity or by recently proposed modifications to the QAOA that achieve higher performance with fewer circuit layers.
arXiv Detail & Related papers (2022-01-06T21:02:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.