GSB: Group Superposition Binarization for Vision Transformer with
Limited Training Samples
- URL: http://arxiv.org/abs/2305.07931v4
- Date: Thu, 18 Jan 2024 08:22:14 GMT
- Title: GSB: Group Superposition Binarization for Vision Transformer with
Limited Training Samples
- Authors: Tian Gao, Cheng-Zhong Xu, Le Zhang, Hui Kong
- Abstract summary: Vision Transformer (ViT) has performed remarkably in various computer vision tasks.
ViT usually suffers from serious overfitting problems with a relatively limited number of training samples.
We propose a novel model binarization technique, called Group Superposition Binarization (GSB).
- Score: 46.025105938192624
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision Transformer (ViT) has performed remarkably in various computer
vision tasks. Nonetheless, owing to its massive number of parameters, ViT
usually suffers from serious overfitting when training samples are relatively
limited. In addition, ViT generally demands heavy computing resources, which
limits its deployment on resource-constrained devices. As a type of
model-compression method, model binarization is potentially a good choice to
solve both problems. Compared with its full-precision counterpart, a binarized
model replaces complex tensor multiplications with simple bit-wise binary
operations and represents full-precision parameters and activations with 1-bit
values, which addresses the problems of model size and computational
complexity, respectively. In this paper, we investigate a binarized ViT model.
Empirically, we observe that existing binarization techniques designed for
Convolutional Neural Networks (CNNs) do not transfer well to the binarization
of ViTs. We also find that the accuracy drop of the binarized ViT model stems
mainly from information loss in the Attention module and the Value vectors.
Therefore, we propose a novel model binarization technique, called Group
Superposition Binarization (GSB), to deal with these issues. To further improve
the performance of the binarized model, we investigate the gradient calculation
procedure in the binarization process and derive more suitable gradient
calculation equations for GSB to reduce the influence of gradient mismatch.
Knowledge distillation is then introduced to alleviate the performance
degradation caused by model binarization. Analytically, model binarization can
limit the parameter search space during parameter updates while training a
model....
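The name "Group Superposition" suggests approximating a full-precision tensor with a superposition (sum) of several scaled binary tensors. The sketch below is an illustrative reading of that idea, not the authors' exact GSB formulation: each binary group is fit to the residual left by the previous ones, and the scale `alpha = mean(|w|)` follows the common XNOR-Net-style choice.

```python
import numpy as np

def binarize(w):
    """1-bit quantization: alpha * sign(w), with alpha = mean(|w|)
    (XNOR-Net-style scaling; an assumption, not the paper's exact scheme)."""
    alpha = np.abs(w).mean()
    return alpha * np.sign(w)

def superposition_binarize(w, n_groups=3):
    """Approximate w as a sum of n_groups scaled binary tensors,
    each one binarizing the residual left by the previous groups."""
    approx = np.zeros_like(w)
    for _ in range(n_groups):
        approx += binarize(w - approx)  # binarize the remaining residual
    return approx

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
for k in (1, 2, 3):
    err = np.linalg.norm(w - superposition_binarize(w, k)) / np.linalg.norm(w)
    print(f"groups={k}  relative error={err:.3f}")
```

Each additional group strictly reduces the approximation error, which is one way a grouped scheme can recover accuracy lost to plain 1-bit binarization.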
Related papers
- Research on Personalized Compression Algorithm for Pre-trained Models Based on Homomorphic Entropy Increase [2.6513322539118582]
We explore the challenges and evolution of two key technologies in the current field of AI: the Vision Transformer model and the Large Language Model (LLM).
The Vision Transformer captures global information by splitting images into small patches, but its high parameter count and compute overhead limit deployment on mobile devices.
LLMs have revolutionized natural language processing, but they also face huge deployment challenges.
arXiv Detail & Related papers (2024-08-16T11:56:49Z) - LORTSAR: Low-Rank Transformer for Skeleton-based Action Recognition [4.375744277719009]
LORTSAR is applied to two leading Transformer-based models, "Hyperformer" and "STEP-CATFormer"
Our method substantially reduces the number of model parameters with negligible degradation, or even an improvement, in recognition accuracy.
This confirms that SVD combined with post-compression fine-tuning can boost model efficiency, paving the way for more sustainable, lightweight, and high-performance technologies in human action recognition.
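The SVD-based compression LORTSAR relies on can be sketched in a few lines: factor a weight matrix into two low-rank factors via truncated SVD, so one dense layer becomes two smaller ones. This is a generic sketch of the standard truncated-SVD formulation; the rank and matrix shape here are illustrative, not taken from the paper.

```python
import numpy as np

def low_rank_compress(w, rank):
    """Split w (m x n) into two rank-r factors via truncated SVD:
    w ~= a @ b, with a: m x r and b: r x n."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # absorb singular values into the left factor
    b = vt[:rank, :]
    return a, b

rng = np.random.default_rng(1)
w = rng.normal(size=(256, 256))
a, b = low_rank_compress(w, rank=32)
print(w.size, a.size + b.size)  # 65536 -> 16384: a 4x parameter reduction
```

By the Eckart-Young theorem the truncated SVD is the best rank-r approximation in the Frobenius norm, which is why a short post-compression fine-tune often recovers the remaining accuracy.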
arXiv Detail & Related papers (2024-07-19T20:19:41Z) - Binarized Diffusion Model for Image Super-Resolution [61.963833405167875]
Binarization, an ultra-compression algorithm, offers the potential for effectively accelerating advanced diffusion models (DMs)
Existing binarization methods result in significant performance degradation.
We introduce a novel binarized diffusion model, BI-DiffSR, for image SR.
arXiv Detail & Related papers (2024-06-09T10:30:25Z) - FBPT: A Fully Binary Point Transformer [12.373066597900127]
This paper presents a novel Fully Binary Point Cloud Transformer (FBPT) model which has the potential to be widely applied and expanded in the fields of robotics and mobile devices.
By compressing the weights and activations of a 32-bit full-precision network to 1-bit binary values, the proposed binary point cloud Transformer network significantly reduces the storage footprint and computational resource requirements.
The primary focus of this paper is on addressing the performance degradation issue caused by the use of binary point cloud Transformer modules.
arXiv Detail & Related papers (2024-03-15T03:45:10Z) - VST++: Efficient and Stronger Visual Saliency Transformer [74.26078624363274]
We develop an efficient and stronger VST++ model to explore global long-range dependencies.
We evaluate our model across various transformer-based backbones on RGB, RGB-D, and RGB-T SOD benchmark datasets.
arXiv Detail & Related papers (2023-10-18T05:44:49Z) - End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z) - BinaryViT: Towards Efficient and Accurate Binary Vision Transformers [4.339315098369913]
Vision Transformers (ViTs) have emerged as the fundamental architecture for most computer vision fields.
As one of the most powerful compression methods, binarization reduces the computation of a neural network by quantizing its weights and activation values to ±1.
Existing binarization methods have demonstrated excellent performance on CNNs, but the full binarization of ViTs remains under-studied and suffers a significant performance drop.
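The ±1 quantization used by these binary ViTs is what enables the cheap bit-wise arithmetic mentioned in the GSB abstract: a dot product of two ±1 vectors reduces to an XNOR followed by a popcount. A small self-contained sketch, with plain Python ints standing in for hardware bit vectors (the names here are illustrative):

```python
import random

def binary_dot(x_bits, w_bits, n):
    """Dot product of two ±1 vectors packed into ints (+1 -> bit 1, -1 -> bit 0):
    agreements = popcount(XNOR), dot = 2 * agreements - n."""
    xnor = ~(x_bits ^ w_bits) & ((1 << n) - 1)  # bit set where x_i == w_i
    agreements = bin(xnor).count("1")
    return 2 * agreements - n

def pack(v):
    """Pack a ±1 list into an int, bit i = 1 iff v[i] == +1."""
    return sum(1 << i for i, b in enumerate(v) if b == 1)

# reference check against the ordinary ±1 dot product
random.seed(0)
n = 16
x = [random.choice([-1, 1]) for _ in range(n)]
w = [random.choice([-1, 1]) for _ in range(n)]
assert binary_dot(pack(x), pack(w), n) == sum(a * b for a, b in zip(x, w))
```

On real hardware the XNOR and popcount each process an entire machine word per instruction, which is the source of the speedups binarization papers report.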
arXiv Detail & Related papers (2023-05-24T05:06:59Z) - BiViT: Extremely Compressed Binary Vision Transformer [19.985314022860432]
We propose to solve two fundamental challenges to push the horizon of Binary Vision Transformers (BiViT)
We propose Softmax-aware Binarization, which dynamically adapts to the data distribution and reduces the error caused by binarization.
Our method outperforms the state of the art by 19.8% on the TinyImageNet dataset.
arXiv Detail & Related papers (2022-11-14T03:36:38Z) - Learning Bounded Context-Free-Grammar via LSTM and the
Transformer: Difference and Explanations [51.77000472945441]
Long Short-Term Memory (LSTM) and Transformers are two popular neural architectures used for natural language processing tasks.
In practice, it is often observed that Transformer models have better representation power than LSTM.
We study such practical differences between LSTM and Transformer and propose an explanation based on their latent space decomposition patterns.
arXiv Detail & Related papers (2021-12-16T19:56:44Z) - Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance on a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.