DPBERT: Efficient Inference for BERT based on Dynamic Planning
- URL: http://arxiv.org/abs/2308.00108v1
- Date: Wed, 26 Jul 2023 07:18:50 GMT
- Title: DPBERT: Efficient Inference for BERT based on Dynamic Planning
- Authors: Weixin Wu and Hankz Hankui Zhuo
- Abstract summary: Existing input-adaptive inference methods fail to take full advantage of the structure of BERT.
We propose Dynamic Planning in BERT, a novel fine-tuning strategy that can accelerate the inference process of BERT.
Our method reduces latency to 75% while maintaining 98% accuracy, yielding a better accuracy-speed trade-off compared to state-of-the-art input-adaptive methods.
- Score: 11.680840266488884
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large-scale pre-trained language models such as BERT have contributed
significantly to the development of NLP. However, those models require large
computational resources, making them difficult to deploy on mobile devices
where computing power is limited. In this paper we aim to address the weakness
of existing input-adaptive inference methods which fail to take full advantage
of the structure of BERT. We propose Dynamic Planning in BERT, a novel
fine-tuning strategy that can accelerate the inference process of BERT through
selecting a subsequence of the backbone's transformer layers as a
computational path for an input sample. To do this, our approach adds a
planning module to the original BERT model to determine whether a layer is
included or bypassed during inference. Experimental results on the GLUE
benchmark show that our method reduces latency to 75% while maintaining
98% accuracy, yielding a better accuracy-speed trade-off compared to
state-of-the-art input-adaptive methods.
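As a rough picture of the idea, a small planning head can score each backbone layer for a given input, and the encoder then bypasses the layers that score low. The PyTorch sketch below is only an illustration under assumed names (`PlanningModule`, a generic `nn.TransformerEncoderLayer` stack standing in for BERT's encoder); it is not the authors' implementation.

```python
import torch
import torch.nn as nn

class PlanningModule(nn.Module):
    """Scores each backbone layer for a given input; above 0.5 means 'execute'.
    (Hypothetical stand-in for the paper's planning module.)"""
    def __init__(self, hidden_size: int, num_layers: int):
        super().__init__()
        self.scorer = nn.Linear(hidden_size, num_layers)

    def forward(self, cls_state: torch.Tensor) -> torch.Tensor:
        # cls_state: (batch, hidden) -> per-layer keep probabilities (batch, num_layers)
        return torch.sigmoid(self.scorer(cls_state))

class LayerSkippingEncoder(nn.Module):
    """Transformer stack that bypasses layers the planner scores below 0.5."""
    def __init__(self, hidden_size=768, num_layers=12, num_heads=12):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(hidden_size, num_heads, batch_first=True)
            for _ in range(num_layers)
        )
        self.planner = PlanningModule(hidden_size, num_layers)

    @torch.no_grad()
    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, hidden); plan from the initial [CLS] position
        keep = self.planner(hidden[:, 0]) > 0.5        # (batch, num_layers) bool
        for i, layer in enumerate(self.layers):
            if keep[:, i].any():                       # run layer i if any sample in the batch selects it (batch-level simplification)
                hidden = layer(hidden)
        return hidden

encoder = LayerSkippingEncoder()
out = encoder(torch.randn(2, 16, 768))
print(out.shape)  # torch.Size([2, 16, 768])
```

In practice the layer-selection decisions would be trained jointly with the task objective (for example through a straight-through or Gumbel-softmax relaxation), which this inference-only sketch omits.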
Related papers
- Sparse is Enough in Fine-tuning Pre-trained Large Language Models [98.46493578509039]
We propose a gradient-based sparse fine-tuning algorithm, named Sparse Increment Fine-Tuning (SIFT)
We validate its effectiveness on a range of tasks including the GLUE Benchmark and Instruction-tuning.
arXiv Detail & Related papers (2023-12-19T06:06:30Z)
- Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z)
- BiBERT: Accurate Fully Binarized BERT [69.35727280997617]
BiBERT is an accurate fully binarized BERT designed to eliminate performance bottlenecks.
Our method yields impressive 56.3x and 31.2x savings in FLOPs and model size, respectively.
arXiv Detail & Related papers (2022-03-12T09:46:13Z)
- BERTVision -- A Parameter-Efficient Approach for Question Answering [0.0]
We present a highly parameter efficient approach for Question Answering that significantly reduces the need for extended BERT fine-tuning.
Our method uses information from the hidden state activations of each BERT transformer layer, which is discarded during typical BERT inference.
Our experiments show that this approach works well not only for span QA but also for classification, suggesting that it may be applicable to a wider range of tasks.
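Collecting those per-layer activations is straightforward with Hugging Face `transformers`; the tiny layer-mixing head below is an illustrative assumption rather than BERTVision's actual architecture.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
bert.eval()

class AllLayerHead(nn.Module):
    """Tiny classifier over the [CLS] state of every encoder layer (illustrative)."""
    def __init__(self, hidden_size=768, num_layers=13, num_labels=2):
        super().__init__()
        self.layer_weights = nn.Parameter(torch.zeros(num_layers))  # learned layer mix
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, hidden_states):
        # hidden_states: tuple of (batch, seq, hidden): embeddings + 12 layers for bert-base
        cls_stack = torch.stack([h[:, 0] for h in hidden_states], dim=1)  # (batch, 13, hidden)
        mix = torch.softmax(self.layer_weights, dim=0)                    # (13,)
        pooled = (mix[None, :, None] * cls_stack).sum(dim=1)              # (batch, hidden)
        return self.classifier(pooled)

head = AllLayerHead()
with torch.no_grad():
    enc = tokenizer("An example sentence.", return_tensors="pt")
    hidden_states = bert(**enc).hidden_states   # frozen backbone; only the small head is trained
    logits = head(hidden_states)
print(logits.shape)  # torch.Size([1, 2])
```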
arXiv Detail & Related papers (2022-02-24T17:16:25Z)
- DACT-BERT: Differentiable Adaptive Computation Time for an Efficient BERT Inference [3.375478015832455]
We propose DACT-BERT, a differentiable adaptive computation time strategy for BERT-like models.
DACT-BERT adds an adaptive computational mechanism to BERT's regular processing pipeline, which controls the number of Transformer blocks that need to be executed at inference time.
Our experiments demonstrate that our approach, when compared to the baselines, excels on a reduced computational regime and is competitive in other less restrictive ones.
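Schematically, this resembles adaptive computation time: each block emits a halting score, and inference stops once the accumulated score passes a threshold. The sketch below (halting heads, threshold value, and generic layer stack are assumptions, not the DACT-BERT code) shows the inference-time control flow.

```python
import torch
import torch.nn as nn

class ACTStyleEncoder(nn.Module):
    """Runs transformer blocks until the accumulated halting probability reaches a threshold."""
    def __init__(self, hidden_size=768, num_layers=12, num_heads=12, threshold=0.99):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(hidden_size, num_heads, batch_first=True)
            for _ in range(num_layers)
        )
        self.halt_heads = nn.ModuleList(nn.Linear(hidden_size, 1) for _ in range(num_layers))
        self.threshold = threshold

    @torch.no_grad()
    def forward(self, hidden: torch.Tensor):
        halt_sum = torch.zeros(hidden.size(0))
        executed = 0
        for layer, head in zip(self.layers, self.halt_heads):
            hidden = layer(hidden)
            executed += 1
            halt_sum += torch.sigmoid(head(hidden[:, 0])).squeeze(-1)   # per-sample halting score
            if bool((halt_sum >= self.threshold).all()):                # whole batch may stop early
                break
        return hidden, executed

encoder = ACTStyleEncoder()
out, n_blocks = encoder(torch.randn(2, 16, 768))
print(n_blocks, out.shape)
```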
arXiv Detail & Related papers (2021-09-24T04:45:55Z)
- BERT-based Chinese Text Classification for Emergency Domain with a Novel Loss Function [9.028459232146474]
This paper proposes an automatic Chinese text categorization method for solving the emergency event report classification problem.
To overcome the data imbalance problem in the distribution of emergency event categories, a novel loss function is proposed to improve the performance of the BERT-based model.
The proposed method has achieved the best performance in terms of accuracy, weighted-precision, weighted-recall, and weighted-F1 values.
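The summary does not state the loss itself, so the snippet below only illustrates the general idea with a standard stand-in: reweighting cross-entropy by inverse class frequency so that rare emergency categories contribute more to the gradient.

```python
import torch
import torch.nn as nn

# Hypothetical label counts for an imbalanced emergency-category dataset.
class_counts = torch.tensor([5000.0, 800.0, 120.0, 30.0])

# Inverse-frequency weights, normalized so they average to 1.
weights = class_counts.sum() / (len(class_counts) * class_counts)
criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, 4)                    # e.g. outputs of a BERT [CLS] classification head
labels = torch.randint(0, 4, (8,))
loss = criterion(logits, labels)
print(weights, loss.item())
```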
arXiv Detail & Related papers (2021-04-09T05:25:00Z)
- Learning Dynamic BERT via Trainable Gate Variables and a Bi-modal Regularizer [36.74058297640735]
The BERT model has shown significant success on various natural language processing tasks.
Due to the heavy model size and high computational cost, the model suffers from high latency, which is fatal to its deployments on resource-limited devices.
We propose a dynamic inference method on BERT via trainable gate variables applied on input tokens and a regularizer that has a bi-modal property.
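A token-level gate can be pictured as a small scorer that decides which token representations continue into later blocks, shrinking the sequence the remaining layers must process. The `TokenGate` module and the 0.5 cut-off below are illustrative assumptions, not the paper's formulation.

```python
import torch
import torch.nn as nn

class TokenGate(nn.Module):
    """Keeps only tokens whose learned gate score exceeds a threshold (inference sketch, batch size 1)."""
    def __init__(self, hidden_size=768, threshold=0.5):
        super().__init__()
        self.scorer = nn.Linear(hidden_size, 1)
        self.threshold = threshold

    @torch.no_grad()
    def forward(self, hidden: torch.Tensor):
        # hidden: (1, seq_len, hidden); always keep position 0 ([CLS])
        gates = torch.sigmoid(self.scorer(hidden)).squeeze(-1)   # (1, seq_len)
        keep = gates > self.threshold
        keep[:, 0] = True
        return hidden[:, keep[0]]                                # (1, kept_len, hidden)

gate = TokenGate()
pruned = gate(torch.randn(1, 32, 768))
print(pruned.shape)   # e.g. torch.Size([1, 17, 768]) -- fewer tokens flow into later layers
```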
arXiv Detail & Related papers (2021-02-19T03:59:23Z)
- TernaryBERT: Distillation-aware Ultra-low Bit BERT [53.06741585060951]
We propose TernaryBERT, which ternarizes the weights in a fine-tuned BERT model.
Experiments on the GLUE benchmark and SQuAD show that our proposed TernaryBERT outperforms the other BERT quantization methods.
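Ternarization maps every weight to one of {-α, 0, +α}. A TWN-style ternarizer, shown below as one common choice (TernaryBERT additionally relies on distillation-aware training, which is omitted here), looks like this:

```python
import torch

def ternarize(w: torch.Tensor) -> torch.Tensor:
    """TWN-style ternarization: weights below a magnitude threshold become 0,
    the rest become +/- alpha, where alpha is the mean magnitude of the kept weights."""
    delta = 0.7 * w.abs().mean()                     # magnitude threshold
    mask = (w.abs() > delta).float()
    alpha = (w.abs() * mask).sum() / mask.sum().clamp(min=1.0)
    return alpha * torch.sign(w) * mask

w = torch.randn(768, 768) * 0.02                     # stand-in for a BERT weight matrix
w_t = ternarize(w)
print(torch.unique(w_t).numel())                     # 3 distinct values: -alpha, 0, +alpha
```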
arXiv Detail & Related papers (2020-09-27T10:17:28Z)
- BERT Loses Patience: Fast and Robust Inference with Early Exit [91.26199404912019]
We propose Patience-based Early Exit as a plug-and-play technique to improve the efficiency and robustness of a pretrained language model.
Our approach improves inference efficiency as it allows the model to make a prediction with fewer layers.
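In the spirit of the paper, an internal classifier is attached to each layer and inference stops once the prediction has stayed the same for a fixed number of consecutive layers. The schematic sketch below uses a generic layer stack and linear exit heads as stand-ins for the BERT backbone.

```python
import torch
import torch.nn as nn

class PatienceEarlyExitEncoder(nn.Module):
    """Stops once `patience` consecutive internal classifiers agree (schematic early-exit)."""
    def __init__(self, hidden_size=768, num_layers=12, num_heads=12, num_labels=2, patience=3):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(hidden_size, num_heads, batch_first=True)
            for _ in range(num_layers)
        )
        self.exits = nn.ModuleList(nn.Linear(hidden_size, num_labels) for _ in range(num_layers))
        self.patience = patience

    @torch.no_grad()
    def forward(self, hidden: torch.Tensor):
        prev_pred, streak = None, 0
        for i, (layer, exit_head) in enumerate(zip(self.layers, self.exits)):
            hidden = layer(hidden)
            pred = exit_head(hidden[:, 0]).argmax(dim=-1)      # per-layer prediction
            streak = streak + 1 if prev_pred is not None and torch.equal(pred, prev_pred) else 0
            prev_pred = pred
            if streak >= self.patience:                        # prediction stable long enough: exit
                return pred, i + 1
        return prev_pred, len(self.layers)

model = PatienceEarlyExitEncoder()
pred, layers_used = model(torch.randn(1, 16, 768))
print(pred, layers_used)
```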
arXiv Detail & Related papers (2020-06-07T13:38:32Z)
- BERT-ATTACK: Adversarial Attack Against BERT Using BERT [77.82947768158132]
Adversarial attacks for discrete data (such as text) are more challenging than those for continuous data (such as images).
We propose BERT-Attack, a high-quality and effective method to generate adversarial samples.
Our method outperforms state-of-the-art attack strategies in both success rate and perturb percentage.
arXiv Detail & Related papers (2020-04-21T13:30:02Z)
- TwinBERT: Distilling Knowledge to Twin-Structured BERT Models for Efficient Retrieval [11.923682816611716]
We present the TwinBERT model for effective and efficient retrieval.
It uses twin-structured BERT-like encoders to represent the query and the document respectively.
It allows document embeddings to be pre-computed offline and cached in memory.
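The efficiency comes from the dual-encoder layout: document vectors are computed once offline and cached, and only the query encoder runs at request time, with scoring reduced to a dot product. The `TinyEncoder` below is a generic stand-in for the BERT-like encoders.

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Stand-in for a BERT-like encoder that maps token embeddings to one vector."""
    def __init__(self, hidden_size=128):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(hidden_size, nhead=4, batch_first=True)

    def forward(self, x):                      # x: (batch, seq, hidden)
        return self.layer(x)[:, 0]             # (batch, hidden), first-token pooling

query_encoder, doc_encoder = TinyEncoder(), TinyEncoder()

# Offline: encode the document collection once and cache the vectors.
with torch.no_grad():
    doc_inputs = torch.randn(1000, 32, 128)             # 1000 cached documents
    doc_cache = doc_encoder(doc_inputs)                  # (1000, 128), kept in memory

# Online: only the query is encoded at request time; scoring is a matrix product.
with torch.no_grad():
    query_vec = query_encoder(torch.randn(1, 32, 128))   # (1, 128)
    scores = query_vec @ doc_cache.T                      # (1, 1000)
    top = scores.topk(5, dim=-1)
print(top.indices)
```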
arXiv Detail & Related papers (2020-02-14T22:44:36Z)