BEBERT: Efficient and robust binary ensemble BERT
- URL: http://arxiv.org/abs/2210.15976v1
- Date: Fri, 28 Oct 2022 08:15:26 GMT
- Title: BEBERT: Efficient and robust binary ensemble BERT
- Authors: Jiayi Tian, Chao Fang, Haonan Wang and Zhongfeng Wang
- Abstract summary: Binarization of pre-trained BERT models can alleviate their heavy deployment cost on edge devices, but comes with a severe accuracy drop compared with the full-precision counterparts.
We propose an efficient and robust binary ensemble BERT (BEBERT) to bridge the accuracy gap.
- Score: 12.109371576500928
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained BERT models have achieved impressive accuracy on natural language
processing (NLP) tasks. However, their excessive number of parameters hinders
efficient deployment on edge devices. Binarization of the BERT models
can significantly alleviate this issue but comes with a severe accuracy drop
compared with their full-precision counterparts. In this paper, we propose an
efficient and robust binary ensemble BERT (BEBERT) to bridge the accuracy gap.
To the best of our knowledge, this is the first work employing ensemble
techniques on binary BERTs, yielding BEBERT, which achieves superior accuracy
while retaining computational efficiency. Furthermore, we remove the knowledge
distillation procedures during ensembling to speed up the training process
without compromising accuracy. Experimental results on the GLUE benchmark show
that the proposed BEBERT significantly outperforms the existing binary BERT
models in accuracy and robustness with a 2x speedup in training time. Moreover,
our BEBERT has only a negligible accuracy loss of 0.3% compared to the
full-precision baseline while saving 15x and 13x in FLOPs and model size,
respectively. In addition, BEBERT also outperforms other compressed BERTs in
accuracy by up to 6.7%.
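The abstract combines two ingredients: 1-bit (binary) weight quantization of BERT and an ensemble that averages the predictions of several independently trained binary models. The snippet below is a minimal sketch of those two steps, assuming PyTorch; the names BinaryLinear and ensemble_logits are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch (not the authors' code) of the two ideas combined in BEBERT:
# (1) binarize a weight matrix to 1 bit with a per-tensor scaling factor, and
# (2) ensemble several independently trained binary models by averaging logits.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BinaryLinear(nn.Linear):
    """Linear layer whose weights are binarized to {-alpha, +alpha} on the fly."""

    def forward(self, x):
        alpha = self.weight.abs().mean()          # per-tensor scaling factor
        w_bin = alpha * torch.sign(self.weight)   # 1-bit weights, rescaled
        return F.linear(x, w_bin, self.bias)


@torch.no_grad()
def ensemble_logits(models, inputs):
    """Average the logits of several binarized models (a simple ensemble rule)."""
    return torch.stack([m(inputs) for m in models]).mean(dim=0)


# Toy usage: three tiny "binary" classifiers ensembled on random features.
models = [BinaryLinear(768, 2) for _ in range(3)]
print(ensemble_logits(models, torch.randn(4, 768)).shape)  # torch.Size([4, 2])
```

In BEBERT's setting each ensemble member would be a fully binarized BERT fine-tuned independently (without the extra knowledge-distillation passes), and the logit averaging is the ensemble step the abstract credits with bridging the accuracy gap; the sketch only shows the shape of that computation.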
Related papers
- oBERTa: Improving Sparse Transfer Learning via improved initialization, distillation, and pruning regimes [82.99830498937729]
oBERTa is an easy-to-use set of language models for Natural Language Processing.
It allows NLP practitioners to obtain models that are 3.8 to 24.3 times faster without expertise in model compression.
We explore the use of oBERTa on seven representative NLP tasks.
arXiv Detail & Related papers (2023-03-30T01:37:19Z)
- Efficient Uncertainty Estimation with Gaussian Process for Reliable Dialog Response Retrieval [31.32746943236811]
We propose an efficient uncertainty calibration framework GPF-BERT for BERT-based conversational search.
In comparison with basic calibration methods, GPF-BERT achieves the lowest empirical calibration error (ECE) in three in-domain datasets.
In terms of time consumption, our GPF-BERT has an 8x speedup.
arXiv Detail & Related papers (2023-03-15T13:12:16Z)
- BiBERT: Accurate Fully Binarized BERT [69.35727280997617]
BiBERT is an accurate fully binarized BERT designed to eliminate the performance bottlenecks of full binarization.
Our method yields impressive savings of 56.3x in FLOPs and 31.2x in model size.
arXiv Detail & Related papers (2022-03-12T09:46:13Z)
- BinaryBERT: Pushing the Limit of BERT Quantization [74.65543496761553]
We propose BinaryBERT, which pushes BERT quantization to the limit with weight binarization.
We find that a binary BERT is harder to train directly than a ternary counterpart due to its complex and irregular loss landscape.
Empirical results show that BinaryBERT has negligible performance drop compared to the full-precision BERT-base.
arXiv Detail & Related papers (2020-12-31T16:34:54Z)
- TernaryBERT: Distillation-aware Ultra-low Bit BERT [53.06741585060951]
We propose TernaryBERT, which ternarizes the weights in a fine-tuned BERT model.
Experiments on the GLUE benchmark and SQuAD show that our proposed TernaryBERT outperforms the other BERT quantization methods.
arXiv Detail & Related papers (2020-09-27T10:17:28Z)
- DeBERTa: Decoding-enhanced BERT with Disentangled Attention [119.77305080520718]
We propose a new model architecture DeBERTa that improves the BERT and RoBERTa models using two novel techniques.
We show that these techniques significantly improve the efficiency of model pre-training and the performance of both natural language understanding (NLU) and natural language generation (NLG) downstream tasks.
arXiv Detail & Related papers (2020-06-05T19:54:34Z)
- DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference [69.93692147242284]
Large-scale pre-trained language models such as BERT have brought significant improvements to NLP applications.
We propose a simple but effective method, DeeBERT, to accelerate BERT inference.
Experiments show that DeeBERT can save up to 40% of inference time with minimal degradation in model quality (a brief early-exit sketch follows this list).
arXiv Detail & Related papers (2020-04-27T17:58:05Z)
- PoWER-BERT: Accelerating BERT Inference via Progressive Word-vector Elimination [4.965114253725414]
We develop a novel method, called PoWER-BERT, for improving the inference time of the popular BERT model.
We demonstrate that our method attains up to 6.8x reduction in inference time with 1% loss in accuracy when applied over ALBERT.
arXiv Detail & Related papers (2020-01-24T11:36:12Z)
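The DeeBERT entry above takes a different acceleration route from quantization: dynamic early exiting, where a small classifier is attached after each transformer layer and inference stops as soon as an intermediate prediction looks confident. The sketch below illustrates that control flow under assumed names (early_exit_forward, an entropy-based confidence test with a hand-picked threshold); it is not the paper's code.

```python
# A minimal sketch of dynamic early exiting in the spirit of DeeBERT.
# The layer/head structure, pooling, and entropy threshold are assumptions.
import torch
import torch.nn as nn


def entropy(logits):
    p = torch.softmax(logits, dim=-1)
    return -(p * torch.log(p.clamp_min(1e-9))).sum(dim=-1)


@torch.no_grad()
def early_exit_forward(layers, heads, hidden, threshold=0.3):
    """Run layers one by one; exit once the intermediate prediction is confident."""
    for i, (layer, head) in enumerate(zip(layers, heads)):
        hidden = layer(hidden)
        logits = head(hidden.mean(dim=1))       # pool over tokens, then classify
        if entropy(logits).max() < threshold:   # confident enough for the whole batch
            return logits, i + 1                # prediction and number of layers used
    return logits, len(layers)


# Toy usage with 12 stand-in "layers" (plain Linear instead of transformer blocks).
layers = nn.ModuleList(nn.Linear(768, 768) for _ in range(12))
heads = nn.ModuleList(nn.Linear(768, 2) for _ in range(12))
logits, used = early_exit_forward(layers, heads, torch.randn(4, 16, 768))
print(logits.shape, used)
```

With untrained stand-in layers the entropy stays near the uniform value, so the toy run typically uses all 12 layers; on a trained model with a calibrated threshold, easy inputs exit early and skip the remaining layers' computation.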
This list is automatically generated from the titles and abstracts of the papers in this site.