BinaryViT: Towards Efficient and Accurate Binary Vision Transformers
- URL: http://arxiv.org/abs/2305.14730v3
- Date: Tue, 5 Sep 2023 03:38:17 GMT
- Title: BinaryViT: Towards Efficient and Accurate Binary Vision Transformers
- Authors: Junrui Xiao, Zhikai Li, Lianwei Yang, Qingyi Gu
- Abstract summary: Vision Transformers (ViTs) have emerged as the fundamental architecture for most computer vision fields.
As one of the most powerful compression methods, binarization reduces the computation of the neural network by quantizing the weights and activation values to $\pm$1.
Existing binarization methods have demonstrated excellent performance on CNNs, but the full binarization of ViTs is still under-studied and suffers from a significant performance drop.
- Score: 4.339315098369913
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision Transformers (ViTs) have emerged as the fundamental architecture for
most computer vision fields, but the considerable memory and computation costs
hinder their application on resource-limited devices. As one of the most
powerful compression methods, binarization reduces the computation of the
neural network by quantizing the weights and activation values to $\pm$1.
Although existing binarization methods have demonstrated excellent performance
on Convolutional Neural Networks (CNNs), the full binarization of ViTs is still
under-studied and suffers from a significant performance drop. In this paper, we
first argue empirically that the severe performance degradation is mainly
caused by the weight oscillation in the binarization training and the
information distortion in the activation of ViTs. Based on these analyses, we
propose $\textbf{BinaryViT}$, an accurate full binarization scheme for ViTs,
which pushes the quantization of ViTs to the limit. Specifically, we propose a
novel gradient regularization scheme (GRS) for driving a bimodal distribution
of the weights to reduce oscillation in binarization training. Moreover, we
design an activation shift module (ASM) to adaptively tune the activation
distribution to reduce the information distortion caused by binarization.
Extensive experiments on ImageNet dataset show that our BinaryViT consistently
surpasses the strong baseline by 2.05% and improves the accuracy of fully
binarized ViTs to a usable level. Furthermore, our method achieves impressive
savings of 16.2$\times$ and 17.7$\times$ in model size and OPs compared to the
full-precision DeiT-S.
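For readers who want a concrete picture of what full binarization means in practice, below is a minimal PyTorch sketch of two of the ingredients the abstract refers to: sign quantization of weights and activations to $\pm$1 with a straight-through estimator, and a learnable shift that re-centers activations before binarization. All names here (BinarizeSTE, ActivationShift, BinaryLinear, the scale alpha and shift beta) are illustrative assumptions, not the authors' released implementation of ASM.
```python
# Minimal sketch of weight/activation binarization with a straight-through
# estimator (STE) plus a learnable activation shift. Illustrative only; not
# the paper's actual BinaryViT code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BinarizeSTE(torch.autograd.Function):
    """Sign quantization with a clipped straight-through gradient estimator."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Pass gradients through only where |x| <= 1 (clipped STE).
        return grad_output * (x.abs() <= 1).to(grad_output.dtype)


class ActivationShift(nn.Module):
    """Learnable per-channel shift that re-centers activations before
    binarization, loosely mirroring the activation-shift idea in the abstract."""

    def __init__(self, dim):
        super().__init__()
        self.beta = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        return x + self.beta


class BinaryLinear(nn.Module):
    """Linear layer with binarized weights and activations. A per-output-row
    scale alpha compensates for the magnitude lost when weights collapse to +/-1."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.shift = ActivationShift(in_features)

    def forward(self, x):
        x_b = BinarizeSTE.apply(self.shift(x))               # binarize shifted activations
        w_b = BinarizeSTE.apply(self.weight)                 # binarize latent weights
        alpha = self.weight.abs().mean(dim=1, keepdim=True)  # per-row scaling factor
        return F.linear(x_b, w_b * alpha)


if __name__ == "__main__":
    layer = BinaryLinear(64, 128)
    tokens = torch.randn(2, 16, 64)   # (batch, tokens, dim)
    out = layer(tokens)
    out.sum().backward()              # gradients flow to latent weights via the STE
    print(out.shape)                  # torch.Size([2, 16, 128])
```
During training, the full-precision latent weights are kept and updated; only their signs (and a cheap scaling factor) are used in the forward pass, which is what enables the model-size and OPs savings reported above.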
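The gradient regularization scheme (GRS) in the abstract targets the oscillation problem: latent weights hovering near zero flip sign frequently during binarization training. The snippet below is a hypothetical weight-space analogue that pulls latent weights toward $\pm$1; it is not the paper's GRS, which regularizes gradients to drive a bimodal weight distribution, and the function name and coefficient are assumptions for illustration.
```python
import torch


def bimodal_penalty(weight: torch.Tensor, target: float = 1.0) -> torch.Tensor:
    """Hypothetical penalty: pull latent weight magnitudes toward `target`,
    discouraging mass near zero (where sign flips are most likely) and
    encouraging a bimodal distribution around +/-target."""
    return ((weight.abs() - target) ** 2).mean()


# In binarization-aware training this term would typically be added to the
# task loss, e.g. loss = task_loss + lam * bimodal_penalty(latent_weight).
latent_weight = torch.randn(128, 64, requires_grad=True)
penalty = bimodal_penalty(latent_weight)
penalty.backward()
print(f"penalty = {penalty.item():.4f}")
```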
Related papers
- Bi-ViT: Pushing the Limit of Vision Transformer Quantization [38.24456467950003]
Quantization of vision transformers (ViTs) offers a promising prospect for deploying large pre-trained networks on resource-limited devices.
We introduce a learnable scaling factor to reactivate the vanished gradients and illustrate its effectiveness through theoretical and experimental analyses.
We then propose a ranking-aware distillation method to rectify the disordered ranking in a teacher-student framework.
arXiv Detail & Related papers (2023-05-21T05:24:43Z)
- GSB: Group Superposition Binarization for Vision Transformer with Limited Training Samples [46.025105938192624]
Vision Transformer (ViT) has performed remarkably in various computer vision tasks.
ViT usually suffers from serious overfitting problems with a relatively limited number of training samples.
We propose a novel model binarization technique, called Group Superposition Binarization (GSB).
arXiv Detail & Related papers (2023-05-13T14:48:09Z)
- BiViT: Extremely Compressed Binary Vision Transformer [19.985314022860432]
We propose to solve two fundamental challenges to push the horizon of Binary Vision Transformers (BiViT).
We propose Softmax-aware Binarization, which dynamically adapts to the data distribution and reduces the error caused by binarization.
Our method outperforms state-of-the-art methods by 19.8% on the TinyImageNet dataset.
arXiv Detail & Related papers (2022-11-14T03:36:38Z)
- BiFSMNv2: Pushing Binary Neural Networks for Keyword Spotting to Real-Network Performance [54.214426436283134]
Deep neural networks, such as the Deep-FSMN, have been widely studied for keyword spotting (KWS) applications.
We present a strong yet efficient binary neural network for KWS, namely BiFSMNv2, pushing it to the real-network accuracy performance.
We highlight that benefiting from the compact architecture and optimized hardware kernel, BiFSMNv2 can achieve an impressive 25.1x speedup and 20.2x storage-saving on edge hardware.
arXiv Detail & Related papers (2022-11-13T18:31:45Z)
- Boosting Binary Neural Networks via Dynamic Thresholds Learning [21.835748440099586]
We introduce DySign to reduce information loss and boost the representative capacity of BNNs.
For DCNNs, DyBCNNs based on two backbones achieve 71.2% and 67.4% top-1 accuracy on the ImageNet dataset.
For ViTs, DyCCT demonstrates the superiority of the convolutional embedding layer in fully binarized ViTs and reaches 56.1% accuracy on the ImageNet dataset.
arXiv Detail & Related papers (2022-11-04T07:18:21Z)
- Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer [56.87383229709899]
We develop an information rectification module (IRM) and a distribution-guided distillation scheme for fully quantized vision transformers (Q-ViT).
Our method achieves much better performance than prior arts.
arXiv Detail & Related papers (2022-10-13T04:00:29Z)
- Dynamic Dual Trainable Bounds for Ultra-low Precision Super-Resolution Networks [82.18396309806577]
We propose a novel activation quantizer, referred to as Dynamic Dual Trainable Bounds (DDTB).
Our DDTB exhibits significant performance improvements in ultra-low precision.
For example, our DDTB achieves a 0.70dB PSNR increase on Urban100 benchmark when quantizing EDSR to 2-bit and scaling up output images to x4.
arXiv Detail & Related papers (2022-03-08T04:26:18Z)
- BiFSMN: Binary Neural Network for Keyword Spotting [47.46397208920726]
BiFSMN is an accurate and extremely efficient binary neural network for KWS.
We show that BiFSMN can achieve an impressive 22.3x speedup and 15.5x storage-saving on real-world edge hardware.
arXiv Detail & Related papers (2022-02-14T05:16:53Z)
- Distribution-sensitive Information Retention for Accurate Binary Neural Network [49.971345958676196]
We present a novel Distribution-sensitive Information Retention Network (DIR-Net) to retain the information of the forward activations and backward gradients.
Our DIR-Net consistently outperforms the SOTA binarization approaches under mainstream and compact architectures.
We deploy DIR-Net on real-world resource-limited devices, achieving 11.1x storage saving and 5.4x speedup.
arXiv Detail & Related papers (2021-09-25T10:59:39Z)