BEBERT: Efficient and robust binary ensemble BERT
- URL: http://arxiv.org/abs/2210.15976v1
- Date: Fri, 28 Oct 2022 08:15:26 GMT
- Title: BEBERT: Efficient and robust binary ensemble BERT
- Authors: Jiayi Tian, Chao Fang, Haonan Wang and Zhongfeng Wang
- Abstract summary: Binarization of pre-trained BERT models can alleviate their heavy deployment cost on edge devices, but comes with a severe accuracy drop compared with the full-precision counterparts.
We propose an efficient and robust binary ensemble BERT (BEBERT) to bridge the accuracy gap.
- Score: 12.109371576500928
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained BERT models have achieved impressive accuracy on natural language
processing (NLP) tasks. However, their excessive number of parameters hinders
efficient deployment on edge devices. Binarization of the BERT models
can significantly alleviate this issue but comes with a severe accuracy drop
compared with their full-precision counterparts. In this paper, we propose an
efficient and robust binary ensemble BERT (BEBERT) to bridge the accuracy gap.
To the best of our knowledge, this is the first work employing ensemble
techniques on binary BERTs, yielding BEBERT, which achieves superior accuracy
while retaining computational efficiency. Furthermore, we remove the knowledge
distillation procedures during ensembling to speed up the training process
without compromising accuracy. Experimental results on the GLUE benchmark show
that the proposed BEBERT significantly outperforms the existing binary BERT
models in accuracy and robustness with a 2x speedup in training time. Moreover,
our BEBERT has only a negligible accuracy loss of 0.3% compared to the
full-precision baseline while saving 15x and 13x in FLOPs and model size,
respectively. In addition, BEBERT also outperforms other compressed BERTs in
accuracy by up to 6.7%.
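The abstract combines two ingredients: 1-bit (binary) weight quantization of BERT and an ensemble that averages the predictions of several independently trained binary models. The snippet below is a minimal sketch of those two steps, assuming PyTorch; the names BinaryLinear and ensemble_logits are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch (not the authors' code) of the two ideas combined in BEBERT:
# (1) binarize a weight matrix to 1 bit with a per-tensor scaling factor, and
# (2) ensemble several independently trained binary models by averaging logits.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BinaryLinear(nn.Linear):
    """Linear layer whose weights are binarized to {-alpha, +alpha} on the fly."""

    def forward(self, x):
        alpha = self.weight.abs().mean()          # per-tensor scaling factor
        w_bin = alpha * torch.sign(self.weight)   # 1-bit weights, rescaled
        return F.linear(x, w_bin, self.bias)


@torch.no_grad()
def ensemble_logits(models, inputs):
    """Average the logits of several binarized models (a simple ensemble rule)."""
    return torch.stack([m(inputs) for m in models]).mean(dim=0)


# Toy usage: three tiny "binary" classifiers ensembled on random features.
models = [BinaryLinear(768, 2) for _ in range(3)]
print(ensemble_logits(models, torch.randn(4, 768)).shape)  # torch.Size([4, 2])
```

In BEBERT's setting each ensemble member would be a fully binarized BERT fine-tuned independently (without the extra knowledge-distillation passes), and the logit averaging is the ensemble step the abstract credits with bridging the accuracy gap; the sketch only shows the shape of that computation.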
Related papers
- oBERTa: Improving Sparse Transfer Learning via improved initialization, distillation, and pruning regimes [82.99830498937729]
oBERTa is an easy-to-use set of language models for Natural Language Processing.
It allows NLP practitioners to obtain models that are 3.8 to 24.3 times faster without expertise in model compression.
We explore the use of oBERTa on seven representative NLP tasks.
arXiv Detail & Related papers (2023-03-30T01:37:19Z)
- Efficient Uncertainty Estimation with Gaussian Process for Reliable Dialog Response Retrieval [31.32746943236811]
We propose an efficient uncertainty calibration framework GPF-BERT for BERT-based conversational search.
In comparison with basic calibration methods, GPF-BERT achieves the lowest empirical calibration error (ECE) in three in-domain datasets.
In terms of time consumption, our GPF-BERT has an 8x speedup.
arXiv Detail & Related papers (2023-03-15T13:12:16Z)
- BiBERT: Accurate Fully Binarized BERT [69.35727280997617]
BiBERT is an accurate fully binarized BERT designed to eliminate the performance bottlenecks of full binarization.
Our method yields impressive savings of 56.3x in FLOPs and 31.2x in model size.
arXiv Detail & Related papers (2022-03-12T09:46:13Z)
- BinaryBERT: Pushing the Limit of BERT Quantization [74.65543496761553]
We propose BinaryBERT, which pushes BERT quantization to the limit with weight binarization.
We find that a binary BERT is harder to train directly than a ternary counterpart due to its complex and irregular loss landscape.
Empirical results show that BinaryBERT has negligible performance drop compared to the full-precision BERT-base.
arXiv Detail & Related papers (2020-12-31T16:34:54Z)
- TernaryBERT: Distillation-aware Ultra-low Bit BERT [53.06741585060951]
We propose TernaryBERT, which ternarizes the weights in a fine-tuned BERT model.
Experiments on the GLUE benchmark and SQuAD show that our proposed TernaryBERT outperforms the other BERT quantization methods.
arXiv Detail & Related papers (2020-09-27T10:17:28Z)
- DeBERTa: Decoding-enhanced BERT with Disentangled Attention [119.77305080520718]
We propose a new model architecture DeBERTa that improves the BERT and RoBERTa models using two novel techniques.
We show that these techniques significantly improve the efficiency of model pre-training and the performance of both natural language understanding (NLU) and natural language generation (NLG) downstream tasks.
arXiv Detail & Related papers (2020-06-05T19:54:34Z)
- DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference [69.93692147242284]
Large-scale pre-trained language models such as BERT have brought significant improvements to NLP applications.
We propose a simple but effective method, DeeBERT, to accelerate BERT inference.
Experiments show that DeeBERT can save up to 40% of inference time with minimal degradation in model quality (a brief early-exit sketch follows this list).
arXiv Detail & Related papers (2020-04-27T17:58:05Z)
- PoWER-BERT: Accelerating BERT Inference via Progressive Word-vector Elimination [4.965114253725414]
We develop a novel method, called PoWER-BERT, for improving the inference time of the popular BERT model.
We demonstrate that our method attains up to 6.8x reduction in inference time with 1% loss in accuracy when applied over ALBERT.
arXiv Detail & Related papers (2020-01-24T11:36:12Z)
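The DeeBERT entry above takes a different acceleration route from quantization: dynamic early exiting, where a small classifier is attached after each transformer layer and inference stops as soon as an intermediate prediction looks confident. The sketch below illustrates that control flow under assumed names (early_exit_forward, an entropy-based confidence test with a hand-picked threshold); it is not the paper's code.

```python
# A minimal sketch of dynamic early exiting in the spirit of DeeBERT.
# The layer/head structure, pooling, and entropy threshold are assumptions.
import torch
import torch.nn as nn


def entropy(logits):
    p = torch.softmax(logits, dim=-1)
    return -(p * torch.log(p.clamp_min(1e-9))).sum(dim=-1)


@torch.no_grad()
def early_exit_forward(layers, heads, hidden, threshold=0.3):
    """Run layers one by one; exit once the intermediate prediction is confident."""
    for i, (layer, head) in enumerate(zip(layers, heads)):
        hidden = layer(hidden)
        logits = head(hidden.mean(dim=1))       # pool over tokens, then classify
        if entropy(logits).max() < threshold:   # confident enough for the whole batch
            return logits, i + 1                # prediction and number of layers used
    return logits, len(layers)


# Toy usage with 12 stand-in "layers" (plain Linear instead of transformer blocks).
layers = nn.ModuleList(nn.Linear(768, 768) for _ in range(12))
heads = nn.ModuleList(nn.Linear(768, 2) for _ in range(12))
logits, used = early_exit_forward(layers, heads, torch.randn(4, 16, 768))
print(logits.shape, used)
```

With untrained stand-in layers the entropy stays near the uniform value, so the toy run typically uses all 12 layers; on a trained model with a calibrated threshold, easy inputs exit early and skip the remaining layers' computation.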
This list is automatically generated from the titles and abstracts of the papers in this site.