FastBERT: a Self-distilling BERT with Adaptive Inference Time
- URL: http://arxiv.org/abs/2004.02178v2
- Date: Wed, 29 Apr 2020 15:46:34 GMT
- Title: FastBERT: a Self-distilling BERT with Adaptive Inference Time
- Authors: Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, Qi Ju
- Abstract summary: Our model achieves promising results on twelve English and Chinese datasets.
It can be sped up by a factor of 1 to 12 over BERT, depending on the chosen speedup threshold.
- Score: 14.207970599841218
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained language models like BERT have proven to be highly performant.
However, they are often computationally expensive in many practical scenarios,
since such heavy models can hardly be deployed with limited resources.
To improve their efficiency with an assured model performance, we propose a
novel speed-tunable FastBERT with adaptive inference time. The speed at
inference can be flexibly adjusted under varying demands, while redundant
calculation of samples is avoided. Moreover, this model adopts a unique
self-distillation mechanism during fine-tuning, further enabling greater
computational efficiency with minimal loss in performance. Our model achieves
promising results on twelve English and Chinese datasets. It can be sped up by
a factor of 1 to 12 over BERT, depending on the speedup threshold chosen for
the speed-performance tradeoff.
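As a rough illustration of the mechanism the abstract describes, the sketch below shows an entropy-gated early-exit loop with per-layer student classifiers plus a KL-based self-distillation loss. The layer and classifier modules, the `speed` threshold, and the batch-size-1 assumption are illustrative choices, not the released FastBERT implementation.

```python
import torch
import torch.nn.functional as F

def normalized_entropy(probs: torch.Tensor) -> torch.Tensor:
    """Prediction uncertainty scaled to [0, 1]."""
    n = probs.size(-1)
    entropy = -(probs * torch.log(probs + 1e-12)).sum(-1)
    return entropy / torch.log(torch.tensor(float(n)))

@torch.no_grad()
def adaptive_forward(hidden, encoder_layers, student_classifiers, speed=0.5):
    """Run encoder layers one at a time (hidden: (1, seq_len, hidden_size));
    exit as soon as the current layer's student classifier is confident enough."""
    for layer, classifier in zip(encoder_layers, student_classifiers):
        hidden = layer(hidden)
        probs = F.softmax(classifier(hidden[:, 0]), dim=-1)  # predict from the [CLS] position
        if normalized_entropy(probs).item() < speed:
            return probs          # confident enough: skip the remaining layers
    return probs                  # fell through to the final (teacher) classifier

def self_distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Fine-tuning signal: each student head mimics the soft output of the
    final teacher classifier via KL divergence (the self-distillation idea)."""
    teacher = F.softmax(teacher_logits.detach() / temperature, dim=-1)
    student = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(student, teacher, reduction="batchmean")
```

A larger `speed` value lets more samples exit early (faster, slightly less accurate), while a smaller one forces more layers to run, which is the speed-performance tradeoff the abstract refers to.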
Related papers
- Sensi-BERT: Towards Sensitivity Driven Fine-Tuning for Parameter-Efficient BERT [6.029590006321152]
We present Sensi-BERT, a sensitivity-driven, parameter-efficient fine-tuning approach for BERT models on downstream tasks.
Our experiments show the efficacy of Sensi-BERT across different downstream tasks including MNLI, QQP, QNLI, SST-2 and SQuAD.
arXiv Detail & Related papers (2023-07-14T17:24:15Z)
- Improving Non-autoregressive Translation Quality with Pretrained Language Model, Embedding Distillation and Upsampling Strategy for CTC [51.34222224728979]
This paper introduces a series of innovative techniques to enhance the translation quality of Non-Autoregressive Translation (NAT) models.
We propose fine-tuning Pretrained Multilingual Language Models (PMLMs) with the CTC loss to train NAT models effectively.
Our model exhibits a remarkable speed improvement of 16.35 times compared to the autoregressive model.
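A hedged sketch of the CTC-based training signal mentioned above; the upsampling factor, tensor shapes, vocabulary size, and blank index are assumptions for illustration, not the paper's exact setup.

```python
import torch
import torch.nn as nn

# Hypothetical vocabulary with a dedicated CTC blank token at index 0.
VOCAB_SIZE, BLANK_ID = 32_000, 0
ctc = nn.CTCLoss(blank=BLANK_ID, zero_infinity=True)

def nat_ctc_loss(encoder_logits, target_ids, src_lengths, tgt_lengths, upsample=2):
    """Non-autoregressive training with CTC: the source is upsampled so the
    emitted sequence can be longer than the reference, and CTC marginalises
    over all monotonic alignments between the two."""
    # encoder_logits: (batch, upsample * src_len, VOCAB_SIZE) from the fine-tuned PMLM
    # target_ids: (batch, max_tgt_len) padded reference token ids
    log_probs = encoder_logits.log_softmax(-1).transpose(0, 1)  # (T, batch, vocab)
    return ctc(log_probs, target_ids, src_lengths * upsample, tgt_lengths)
```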
arXiv Detail & Related papers (2023-06-10T05:24:29Z)
- Dynamic Transformers Provide a False Sense of Efficiency [75.39702559746533]
Multi-exit models trade off efficiency against accuracy, with the computational savings coming from early exits.
We propose a simple yet effective attacking framework, SAME, which is specially tailored to reduce the efficiency of the multi-exit models.
Experiments on the GLUE benchmark show that SAME can effectively diminish the efficiency gain of various multi-exit models by 80% on average.
arXiv Detail & Related papers (2023-05-20T16:41:48Z)
- MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation [68.30497162547768]
We propose MoEBERT, which uses a Mixture-of-Experts structure to increase model capacity and inference speed.
We validate the efficiency and effectiveness of MoEBERT on natural language understanding and question answering tasks.
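A minimal sketch of the kind of token-level Mixture-of-Experts feed-forward block this describes; the layer sizes, expert count, and top-1 routing are illustrative assumptions, and MoEBERT's importance-guided adaptation and distillation are omitted.

```python
import torch
import torch.nn as nn

class MoEFeedForward(nn.Module):
    """Illustrative MoE feed-forward block: a router picks one expert FFN per
    token, so only a fraction of the weights is used for each token."""
    def __init__(self, hidden=768, inner=3072, num_experts=4):
        super().__init__()
        self.router = nn.Linear(hidden, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, inner), nn.GELU(), nn.Linear(inner, hidden))
            for _ in range(num_experts)
        )

    def forward(self, x):                        # x: (batch, seq, hidden)
        expert_idx = self.router(x).argmax(-1)   # top-1 routing per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                out[mask] = expert(x[mask])      # only the selected tokens run this expert
        return out
```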
arXiv Detail & Related papers (2022-04-15T23:19:37Z)
- TangoBERT: Reducing Inference Cost by using Cascaded Architecture [9.496399437260678]
We present TangoBERT, a cascaded model architecture in which instances are first processed by an efficient but less accurate first tier model.
The decision of whether to apply the second tier model is based on a confidence score produced by the first tier model.
We report TangoBERT inference CPU speedup on four text classification GLUE tasks and on one reading comprehension task.
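A minimal sketch of such a confidence-gated cascade, assuming two Hugging-Face-style sequence classifiers that expose `.logits`; the threshold and model interfaces are illustrative, not the paper's configuration.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def cascaded_classify(text, small_model, large_model, tokenizer, threshold=0.9):
    """Two-tier cascade: the cheap first-tier model answers when it is
    confident; otherwise the example is escalated to the accurate but
    slower second-tier model."""
    inputs = tokenizer(text, return_tensors="pt")
    probs = F.softmax(small_model(**inputs).logits, dim=-1)
    confidence, label = probs.max(-1)
    if confidence.item() >= threshold:
        return label.item()                                    # fast path: first tier only
    return large_model(**inputs).logits.argmax(-1).item()      # slow path: second tier
```

Raising the threshold sends more examples to the second tier (higher accuracy, less speedup); lowering it does the opposite.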
arXiv Detail & Related papers (2022-04-13T09:45:08Z)
- DACT-BERT: Differentiable Adaptive Computation Time for an Efficient BERT Inference [3.375478015832455]
We propose DACT-BERT, a differentiable adaptive computation time strategy for BERT-like models.
DACT-BERT adds an adaptive computational mechanism to BERT's regular processing pipeline, which controls the number of Transformer blocks that need to be executed at inference time.
Our experiments demonstrate that our approach, when compared to the baselines, excels on a reduced computational regime and is competitive in other less restrictive ones.
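DACT-BERT's mechanism is differentiable and trained end to end; the sketch below only illustrates the general adaptive-computation-time idea of accumulating a per-block halting signal until a threshold is reached. The halting units, modules, and threshold are assumptions, not the paper's formulation.

```python
import torch

@torch.no_grad()
def act_style_forward(hidden, layers, halting_units, threshold=0.99):
    """ACT-style halting: each block also predicts a halting probability;
    computation stops once the accumulated probability crosses a threshold."""
    halt_accum = torch.zeros(hidden.size(0))
    for layer, halt in zip(layers, halting_units):
        hidden = layer(hidden)
        halt_accum += torch.sigmoid(halt(hidden[:, 0])).squeeze(-1)
        if bool((halt_accum >= threshold).all()):
            break  # enough blocks executed for every example in the batch
    return hidden
```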
arXiv Detail & Related papers (2021-09-24T04:45:55Z)
- TernaryBERT: Distillation-aware Ultra-low Bit BERT [53.06741585060951]
We propose TernaryBERT, which ternarizes the weights in a fine-tuned BERT model.
Experiments on the GLUE benchmark and SQuAD show that our proposed TernaryBERT outperforms the other BERT quantization methods.
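A hedged sketch of weight ternarization in the Ternary Weight Networks style; the 0.7 threshold rule is a common heuristic used here as an assumption, and TernaryBERT's distillation-aware training is not shown.

```python
import torch

def ternarize(weight: torch.Tensor) -> torch.Tensor:
    """Map each weight to {-alpha, 0, +alpha} using a magnitude threshold."""
    delta = 0.7 * weight.abs().mean()            # weights below this become 0
    mask = weight.abs() > delta
    alpha = weight[mask].abs().mean() if mask.any() else weight.new_tensor(0.0)
    return alpha * torch.sign(weight) * mask

# e.g. replace a fine-tuned layer's weights in place:
# layer.weight.data.copy_(ternarize(layer.weight.data))
```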
arXiv Detail & Related papers (2020-09-27T10:17:28Z)
- BERT Loses Patience: Fast and Robust Inference with Early Exit [91.26199404912019]
We propose Patience-based Early Exit as a plug-and-play technique to improve the efficiency and robustness of a pretrained language model.
Our approach improves inference efficiency as it allows the model to make a prediction with fewer layers.
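A minimal sketch of patience-based exiting as described above: stop as soon as a fixed number of consecutive internal classifiers agree. The layer and classifier modules and the patience value are illustrative assumptions.

```python
import torch

@torch.no_grad()
def patience_exit(hidden, layers, classifiers, patience=3):
    """Patience-based early exit: return the prediction once `patience`
    consecutive internal classifiers produce the same label."""
    prev_pred, streak = None, 0
    for layer, classifier in zip(layers, classifiers):
        hidden = layer(hidden)
        pred = classifier(hidden[:, 0]).argmax(-1)
        streak = streak + 1 if prev_pred is not None and torch.equal(pred, prev_pred) else 1
        if streak >= patience:
            return pred            # exit without running the remaining layers
        prev_pred = pred
    return pred
```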
arXiv Detail & Related papers (2020-06-07T13:38:32Z)
- DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference [69.93692147242284]
Large-scale pre-trained language models such as BERT have brought significant improvements to NLP applications.
We propose a simple but effective method, DeeBERT, to accelerate BERT inference.
Experiments show that DeeBERT is able to save up to 40% inference time with minimal degradation in model quality.
arXiv Detail & Related papers (2020-04-27T17:58:05Z)