BERT Loses Patience: Fast and Robust Inference with Early Exit
- URL: http://arxiv.org/abs/2006.04152v3
- Date: Thu, 22 Oct 2020 06:37:36 GMT
- Title: BERT Loses Patience: Fast and Robust Inference with Early Exit
- Authors: Wangchunshu Zhou, Canwen Xu, Tao Ge, Julian McAuley, Ke Xu, and Furu Wei
- Abstract summary: We propose Patience-based Early Exit as a plug-and-play technique to improve the efficiency and robustness of a pretrained language model.
Our approach improves inference efficiency as it allows the model to make a prediction with fewer layers.
- Score: 91.26199404912019
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose Patience-based Early Exit, a straightforward yet
effective inference method that can be used as a plug-and-play technique to
simultaneously improve the efficiency and robustness of a pretrained language
model (PLM). To achieve this, our approach couples an internal classifier with
each layer of a PLM and dynamically stops inference when the intermediate
predictions of the internal classifiers remain unchanged for a pre-defined
number of steps. Our approach improves inference efficiency as it allows the
model to make a prediction with fewer layers. Meanwhile, experimental results
with an ALBERT model show that our method can improve the accuracy and
robustness of the model by preventing it from overthinking and by exploiting
multiple classifiers for prediction, yielding a better accuracy-speed trade-off
compared to existing early exit methods.
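Below is a minimal sketch of the patience mechanism just described, assuming stand-in layer and classifier modules (the shapes, pooling, and module names are illustrative, not the authors' implementation): inference halts as soon as the intermediate prediction has stayed unchanged for `patience` consecutive layers.

```python
import torch
import torch.nn as nn

def pabee_forward(layers, classifiers, hidden, patience=3):
    """Run layers in sequence; exit once the intermediate prediction
    has stayed unchanged for `patience` consecutive internal classifiers."""
    prev, streak = None, 0
    for i, (layer, clf) in enumerate(zip(layers, classifiers)):
        hidden = layer(hidden)                         # one encoder layer
        pred = clf(hidden.mean(dim=1)).argmax(dim=-1)  # pool tokens, classify
        streak = streak + 1 if (prev is not None and torch.equal(pred, prev)) else 0
        prev = pred
        if streak >= patience:                         # prediction is stable: exit early
            return pred, i + 1
    return prev, len(layers)                           # no exit: used all layers

# Toy usage with stand-in modules (batch size 1, as in early-exit inference).
d, n_layers, n_classes = 16, 12, 3
layers = [nn.Sequential(nn.Linear(d, d), nn.Tanh()) for _ in range(n_layers)]
classifiers = [nn.Linear(d, n_classes) for _ in range(n_layers)]
hidden = torch.randn(1, 8, d)                          # (batch, tokens, hidden)
pred, used = pabee_forward(layers, classifiers, hidden)
print(f"prediction {pred.tolist()} after {used} of {n_layers} layers")
```

A larger `patience` trades speed for closer agreement with the full model; the abstract attributes the accuracy and robustness gains to this same stability test, which prevents overthinking while exploiting multiple classifiers for the final prediction.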
Related papers
- Denoising Pre-Training and Customized Prompt Learning for Efficient Multi-Behavior Sequential Recommendation [69.60321475454843]
We propose DPCPL, the first pre-training and prompt-tuning paradigm tailored for Multi-Behavior Sequential Recommendation.
In the pre-training stage, we propose a novel Efficient Behavior Miner (EBM) to filter out the noise at multiple time scales.
Subsequently, we propose to tune the pre-trained model in a highly efficient manner with the proposed Customized Prompt Learning (CPL) module.
arXiv Detail & Related papers (2024-08-21T06:48:38Z) - DE$^3$-BERT: Distance-Enhanced Early Exiting for BERT based on
Prototypical Networks [43.967626080432275]
We propose a novel Distance-Enhanced Early Exiting framework for BERT (DE$^3$-BERT).
We implement a hybrid exiting strategy that supplements classic entropy-based local information with distance-based global information (see the sketch after this entry).
Experiments on the GLUE benchmark demonstrate that DE$3$-BERT consistently outperforms state-of-the-art models.
arXiv Detail & Related papers (2024-02-03T15:51:17Z) - Observation-Guided Diffusion Probabilistic Models [41.749374023639156]
- Observation-Guided Diffusion Probabilistic Models [41.749374023639156]
We propose a novel diffusion-based image generation method called the observation-guided diffusion probabilistic model (OGDM).
Our approach reestablishes the training objective by integrating the guidance of the observation process with the Markov chain.
We demonstrate the effectiveness of our training algorithm using diverse inference techniques on strong diffusion model baselines.
arXiv Detail & Related papers (2023-10-06T06:29:06Z) - Consensus-Adaptive RANSAC [104.87576373187426]
We propose a new RANSAC framework that learns to explore the parameter space by considering the residuals seen so far via a novel attention layer.
The attention mechanism operates on a batch of point-to-model residuals, and updates a per-point estimation state to take into account the consensus found through a lightweight one-step transformer.
arXiv Detail & Related papers (2023-07-26T08:25:46Z) - Dynamic Transformers Provide a False Sense of Efficiency [75.39702559746533]
Multi-exit models make a trade-off between efficiency and accuracy, where computational savings come from early exits.
We propose a simple yet effective attacking framework, SAME, which is specially tailored to reduce the efficiency of the multi-exit models.
Experiments on the GLUE benchmark show that SAME can effectively diminish the efficiency gain of various multi-exit models by 80% on average.
arXiv Detail & Related papers (2023-05-20T16:41:48Z) - Consistent Accelerated Inference via Confident Adaptive Transformers [29.034390810078172]
We develop a novel approach for confidently accelerating inference in large and expensive multilayer Transformers.
We simultaneously increase computational efficiency while guaranteeing a specifiable degree of consistency with the original model with high confidence.
We demonstrate the effectiveness of this approach on four classification and regression tasks.
arXiv Detail & Related papers (2021-04-18T10:22:28Z) - Accelerating Pre-trained Language Models via Calibrated Cascade [37.00619245086208]
We analyze the working mechanism of dynamic early exiting and find that it cannot achieve a satisfactory trade-off between inference speed and performance.
We propose CascadeBERT, which dynamically selects a proper-sized, complete model in a cascading manner; a minimal sketch follows this entry.
arXiv Detail & Related papers (2020-12-29T09:43:50Z) - Joint Stochastic Approximation and Its Application to Learning Discrete
- Joint Stochastic Approximation and Its Application to Learning Discrete Latent Variable Models [19.07718284287928]
We show that the difficulty of obtaining reliable gradients for the inference model and the drawback of indirectly optimizing the target log-likelihood can be gracefully addressed.
We propose to directly maximize the target log-likelihood and simultaneously minimize the inclusive divergence between the posterior and the inference model.
The resulting learning algorithm is called joint SA (JSA); its objective is restated after this entry.
arXiv Detail & Related papers (2020-05-28T13:50:08Z) - Efficient Ensemble Model Generation for Uncertainty Estimation with
- Efficient Ensemble Model Generation for Uncertainty Estimation with Bayesian Approximation in Segmentation [74.06904875527556]
We propose a generic and efficient segmentation framework to construct ensemble segmentation models.
In the proposed method, ensemble models can be efficiently generated by using the layer selection method.
We also devise a new pixel-wise uncertainty loss, which improves the predictive performance.
arXiv Detail & Related papers (2020-05-21T16:08:38Z) - Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample to assign optimal weights to unlabeled queries; a toy version is sketched after this entry.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.