Pruning Attention Heads of Transformer Models Using A* Search: A Novel
Approach to Compress Big NLP Architectures
- URL: http://arxiv.org/abs/2110.15225v1
- Date: Thu, 28 Oct 2021 15:39:11 GMT
- Title: Pruning Attention Heads of Transformer Models Using A* Search: A Novel
Approach to Compress Big NLP Architectures
- Authors: Archit Parnami, Rahul Singh, Tarun Joshi
- Abstract summary: We propose novel pruning algorithms to compress transformer models by eliminating redundant Attention Heads.
Our results indicate that the method could eliminate as much as 40% of the attention heads in the BERT transformer model with almost no loss in accuracy.
- Score: 2.8768884210003605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent years have seen a growing adoption of Transformer models such as BERT
in Natural Language Processing and even in Computer Vision. However, due to their
size, there has been limited adoption of such models within
resource-constrained computing environments. This paper proposes novel pruning
algorithms to compress transformer models by eliminating redundant Attention
Heads. We apply the A* search algorithm to obtain a pruned model with minimal
accuracy guarantees. Our results indicate that the method could eliminate as
much as 40% of the attention heads in the BERT transformer model with almost no
loss in accuracy.
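
The abstract does not spell out the search formulation, so the following is only a minimal sketch of how an A*-style search over attention-head subsets might look, not the authors' algorithm. Everything named here is an assumption made for illustration: the state (a set of pruned heads), the cost (accuracy lost relative to the unpruned model), the goal (exactly `k` heads pruned), the default zero heuristic (which reduces A* to uniform-cost search), and the toy `toy_accuracy` function that stands in for evaluating a real model.

```python
import heapq
from itertools import count

def a_star_prune(heads, evaluate, k, heuristic=lambda pruned: 0.0):
    """Search for a set of k heads whose removal loses the least accuracy.

    heads     -- iterable of hashable head identifiers (hypothetical naming)
    evaluate  -- callable: frozenset of pruned heads -> accuracy (assumed API)
    heuristic -- optimistic estimate of the remaining loss; the default 0.0
                 turns the search into uniform-cost search
    """
    heads = list(heads)
    base = evaluate(frozenset())           # accuracy of the unpruned model
    tie = count()                          # tie-breaker so sets are never compared
    start = frozenset()
    best_loss = {start: 0.0}               # lowest loss recorded per visited state
    frontier = [(heuristic(start), next(tie), 0.0, start)]
    while frontier:
        _, _, loss, pruned = heapq.heappop(frontier)
        if len(pruned) == k:               # goal: k heads have been removed
            return pruned, base - loss
        for head in heads:
            if head in pruned:
                continue
            nxt = pruned | {head}
            nxt_loss = base - evaluate(nxt)
            if nxt_loss < best_loss.get(nxt, float("inf")):
                best_loss[nxt] = nxt_loss
                priority = nxt_loss + heuristic(nxt)
                heapq.heappush(frontier, (priority, next(tie), nxt_loss, nxt))
    return None                            # fewer than k heads were available

# Toy stand-in for a real evaluation: each (layer, head) pair has a made-up
# importance, and pruning it subtracts that amount from a base accuracy of 0.90.
IMPORTANCE = {
    (0, 0): 0.05, (0, 1): 0.001, (0, 2): 0.02,
    (1, 0): 0.002, (1, 1): 0.03, (1, 2): 0.04,
}

def toy_accuracy(pruned):
    return 0.90 - sum(IMPORTANCE[h] for h in pruned)

pruned, accuracy = a_star_prune(IMPORTANCE, toy_accuracy, k=2)
print(sorted(pruned), round(accuracy, 4))
```

In this toy setting the loss is additive and never decreases as more heads are pruned, which is what makes the first goal state popped from the queue the cheapest one. A real model offers no such guarantee, and each call to `evaluate` would be a full validation pass, so an informative heuristic and a cap on expanded states would matter in practice.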
Related papers
- LORTSAR: Low-Rank Transformer for Skeleton-based Action Recognition [4.375744277719009]
LORTSAR is applied to two leading Transformer-based models, "Hyperformer" and "STEP-CATFormer".
Our method can reduce the number of model parameters substantially with negligible degradation or even performance increase in recognition accuracy.
This confirms that SVD combined with post-compression fine-tuning can boost model efficiency, paving the way for more sustainable, lightweight, and high-performance technologies in human action recognition.
arXiv Detail & Related papers (2024-07-19T20:19:41Z) - SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation [53.675725490807615]
We introduce SDPose, a new self-distillation method for improving the performance of small transformer-based models.
SDPose-T obtains 69.7% mAP with 4.4M parameters and 1.8 GFLOPs, while SDPose-S-V2 obtains 73.5% mAP on the MSCOCO validation dataset.
arXiv Detail & Related papers (2024-04-04T15:23:14Z) - Adaptive Point Transformer [88.28498667506165]
Adaptive Point Cloud Transformer (AdaPT) is a standard PT model augmented by an adaptive token selection mechanism.
AdaPT dynamically reduces the number of tokens during inference, enabling efficient processing of large point clouds.
arXiv Detail & Related papers (2024-01-26T13:24:45Z) - Transformer-based approaches to Sentiment Detection [55.41644538483948]
We examined the performance of four different types of state-of-the-art transformer models for text classification.
The RoBERTa transformer model performs best on the test dataset with a score of 82.6% and is highly recommended for quality predictions.
arXiv Detail & Related papers (2023-03-13T17:12:03Z) - T4PdM: a Deep Neural Network based on the Transformer Architecture for
Fault Diagnosis of Rotating Machinery [0.0]
This paper develops an automatic fault classifier model for predictive maintenance based on a modified version of the Transformer architecture, namely T4PdM.
T4PdM achieved overall accuracies of 99.98% and 98% on the two datasets, respectively.
It has demonstrated the superiority of the model in detecting and classifying faults in rotating industrial machinery.
arXiv Detail & Related papers (2022-04-07T20:31:45Z) - Greenformers: Improving Computation and Memory Efficiency in Transformer
Models via Low-Rank Approximation [3.3576886095389296]
We introduce Greenformers, a collection of methods to improve the efficiency of transformer models.
We propose a low-rank factorization approach to improve the efficiency of the transformer model called Low-Rank Transformer.
We show that Low-Rank Transformer is more suitable for on-device deployment, as it significantly reduces the model size.
arXiv Detail & Related papers (2021-08-24T15:51:40Z) - Efficient pre-training objectives for Transformers [84.64393460397471]
We study several efficient pre-training objectives for Transformer-based models.
We prove that eliminating the MASK token and considering the whole output during the loss computation are essential choices to improve performance.
arXiv Detail & Related papers (2021-04-20T00:09:37Z) - Pre-trained Summarization Distillation [121.14806854092672]
Recent work on distilling BERT for classification and regression tasks shows strong performance using direct knowledge distillation.
Alternatively, machine translation practitioners distill using pseudo-labeling, where a small model is trained on the translations of a larger model.
A third, simpler approach is to 'shrink and fine-tune' (SFT), which avoids any explicit distillation by copying parameters to a smaller student model and then fine-tuning.
arXiv Detail & Related papers (2020-10-24T23:15:43Z) - The Cascade Transformer: an Application for Efficient Answer Sentence
Selection [116.09532365093659]
We introduce the Cascade Transformer, a technique to adapt transformer-based models into a cascade of rankers.
When compared to a state-of-the-art transformer model, our approach reduces computation by 37% with almost no impact on accuracy.
arXiv Detail & Related papers (2020-05-05T23:32:01Z) - Compressing Large-Scale Transformer-Based Models: A Case Study on BERT [41.04066537294312]
Pre-trained Transformer-based models have achieved state-of-the-art performance for various Natural Language Processing (NLP) tasks.
These models often have billions of parameters, and, thus, are too resource-hungry and computation-intensive to suit low-capability devices or applications.
One potential remedy for this is model compression, which has attracted a lot of research attention.
arXiv Detail & Related papers (2020-02-27T09:20:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.