Elbert: Fast Albert with Confidence-Window Based Early Exit
- URL: http://arxiv.org/abs/2107.00175v1
- Date: Thu, 1 Jul 2021 02:02:39 GMT
- Title: Elbert: Fast Albert with Confidence-Window Based Early Exit
- Authors: Keli Xie, Siyuan Lu, Meiqi Wang, Zhongfeng Wang
- Abstract summary: Large pre-trained language models like BERT are not well-suited for resource-constrained or real-time applications.
We propose ELBERT, which significantly improves average inference speed over ALBERT via a confidence-window based early exit mechanism.
- Score: 8.956309416589232
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the great success in Natural Language Processing (NLP) area, large
pre-trained language models like BERT are not well-suited for
resource-constrained or real-time applications owing to the large number of
parameters and slow inference speed. Recently, compressing and accelerating
BERT have become important topics. By incorporating a parameter-sharing
strategy, ALBERT greatly reduces the number of parameters while achieving
competitive performance. Nevertheless, ALBERT still suffers from a long
inference time. In this work, we propose ELBERT, which significantly
improves the average inference speed over ALBERT through the proposed
confidence-window based early exit mechanism, without introducing additional
parameters or extra training overhead. Experimental results show that ELBERT
achieves an adaptive inference speedup varying from 2$\times$ to 10$\times$
with negligible accuracy degradation compared to ALBERT on various datasets.
Besides, ELBERT achieves higher accuracy than existing early exit methods used
for accelerating BERT under the same computation cost. Furthermore, to
understand the principle of the early exit mechanism, we also visualize its
decision-making process in ELBERT.
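To make the confidence-window idea from the abstract concrete, below is a minimal, hedged sketch of one plausible confidence-window exit rule for a parameter-shared (ALBERT-style) encoder: exit once the top-class confidence stays above a threshold for a window of consecutive layers. The module names `shared_layer` and `classifier`, the threshold, and the window size are illustrative assumptions, not ELBERT's exact criterion or implementation.

```python
import torch.nn.functional as F

def confidence_window_early_exit(hidden, shared_layer, classifier,
                                 num_layers=12, threshold=0.9, window=2):
    """Illustrative confidence-window early exit for an ALBERT-style encoder.

    `shared_layer` and `classifier` are assumed stand-ins for the model's
    shared transformer block and internal classifier; this is a sketch of
    the general technique, not ELBERT's published algorithm.
    """
    consecutive = 0            # layers in a row that were confident enough
    logits = None
    for layer_idx in range(num_layers):
        hidden = shared_layer(hidden)        # same weights reused at every step
        logits = classifier(hidden[:, 0])    # predict from the [CLS] position
        confidence = F.softmax(logits, dim=-1).max(dim=-1).values
        if bool((confidence > threshold).all()):
            consecutive += 1
            if consecutive >= window:        # confident and stable -> exit early
                return logits, layer_idx + 1
        else:
            consecutive = 0                  # confidence dipped, reset the window
    return logits, num_layers                # fell through: used all layers
```

Because the encoder shares one set of layer weights, such a rule adds no parameters; only the number of times the shared block is applied changes per input, which is consistent with the adaptive 2$\times$ to 10$\times$ speedups reported in the abstract.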
Related papers
- CEEBERT: Cross-Domain Inference in Early Exit BERT [5.402030962296633]
CeeBERT learns optimal exit thresholds on the fly from the domain-specific confidence observed at intermediate layers.
CeeBERT can speed up the BERT/ALBERT models by $2\times$-$3.5\times$ with minimal drop in accuracy.
arXiv Detail & Related papers (2024-05-23T20:36:10Z)
- oBERTa: Improving Sparse Transfer Learning via improved initialization, distillation, and pruning regimes [82.99830498937729]
oBERTa is an easy-to-use set of language models for Natural Language Processing.
It allows NLP practitioners to obtain models that are 3.8 to 24.3 times faster without expertise in model compression.
We explore the use of oBERTa on seven representative NLP tasks.
arXiv Detail & Related papers (2023-03-30T01:37:19Z)
- SmartBERT: A Promotion of Dynamic Early Exiting Mechanism for Accelerating BERT Inference [18.456002674399244]
We propose SmartBERT, a novel dynamic early-exiting mechanism combined with layer skipping for BERT inference.
SmartBERT can adaptively skip some layers and adaptively choose whether to exit.
We conduct experiments on eight classification datasets of the GLUE benchmark.
arXiv Detail & Related papers (2023-03-16T12:44:16Z)
- HuBERT-EE: Early Exiting HuBERT for Efficient Speech Recognition [11.243855639847514]
We introduce an early exit scheme for ASR, namely HuBERT-EE, that allows the model to stop inference dynamically.
Experimental results on LibriSpeech show that HuBERT-EE can accelerate HuBERT inference while balancing the trade-off between performance and latency.
arXiv Detail & Related papers (2022-04-13T12:11:44Z)
- BiBERT: Accurate Fully Binarized BERT [69.35727280997617]
BiBERT is an accurate, fully binarized BERT designed to eliminate the performance bottlenecks of binarization.
Our method yields an impressive 56.3 times saving on FLOPs and 31.2 times saving on model size.
arXiv Detail & Related papers (2022-03-12T09:46:13Z)
- TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference [54.791572981834435]
Existing pre-trained language models (PLMs) are often computationally expensive in inference.
We propose a dynamic token reduction approach to accelerate PLMs' inference, named TR-BERT.
TR-BERT formulates the token reduction process as a multi-step token selection problem and automatically learns the selection strategy via reinforcement learning.
arXiv Detail & Related papers (2021-05-25T02:28:51Z)
- TernaryBERT: Distillation-aware Ultra-low Bit BERT [53.06741585060951]
We propose TernaryBERT, which ternarizes the weights in a fine-tuned BERT model.
Experiments on the GLUE benchmark and SQuAD show that our proposed TernaryBERT outperforms the other BERT quantization methods.
arXiv Detail & Related papers (2020-09-27T10:17:28Z)
- BERT Loses Patience: Fast and Robust Inference with Early Exit [91.26199404912019]
We propose Patience-based Early Exit as a plug-and-play technique to improve the efficiency and robustness of a pretrained language model (a minimal sketch of this patience-style criterion appears after this list).
Our approach improves inference efficiency as it allows the model to make a prediction with fewer layers.
arXiv Detail & Related papers (2020-06-07T13:38:32Z)
- DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference [69.93692147242284]
Large-scale pre-trained language models such as BERT have brought significant improvements to NLP applications.
We propose a simple but effective method, DeeBERT, to accelerate BERT inference.
Experiments show that DeeBERT is able to save up to 40% inference time with minimal degradation in model quality.
arXiv Detail & Related papers (2020-04-27T17:58:05Z)
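For contrast with a confidence-window rule, below is a minimal sketch of the patience-style criterion summarized in the "BERT Loses Patience" entry above: exit once several consecutive internal classifiers agree on the predicted label. The per-layer modules `layers` and `classifiers` and the `patience` default are placeholders under my own assumptions, not the original implementation.

```python
import torch

def patience_based_early_exit(hidden, layers, classifiers, patience=3):
    """Patience-style early exit in the spirit of PABEE ("BERT Loses Patience"):
    stop as soon as `patience` consecutive internal classifiers agree on the
    predicted label. `layers` and `classifiers` are placeholder per-layer
    modules used only to illustrate the control flow.
    """
    prev_pred, streak = None, 0
    logits = None
    for layer, clf in zip(layers, classifiers):
        hidden = layer(hidden)
        logits = clf(hidden[:, 0])                  # per-layer prediction head
        pred = logits.argmax(dim=-1)
        if prev_pred is not None and torch.equal(pred, prev_pred):
            streak += 1                             # same label as previous layer
        else:
            streak = 0                              # prediction changed, reset
        prev_pred = pred
        if streak >= patience:                      # stable for `patience` layers
            break                                   # exit without running the rest
    return logits
```

The key difference from the confidence-window sketch shown after the abstract is the exit signal: agreement between consecutive layer predictions rather than sustained softmax confidence, which is why the two families of methods can behave differently under the same computation budget.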