DE$^3$-BERT: Distance-Enhanced Early Exiting for BERT based on
Prototypical Networks
- URL: http://arxiv.org/abs/2402.05948v1
- Date: Sat, 3 Feb 2024 15:51:17 GMT
- Title: DE$^3$-BERT: Distance-Enhanced Early Exiting for BERT based on
Prototypical Networks
- Authors: Jianing He, Qi Zhang, Weiping Ding, Duoqian Miao, Jun Zhao, Liang Hu,
Longbing Cao
- Abstract summary: We propose a novel Distance-Enhanced Early Exiting framework for BERT (DE$^3$-BERT).
We implement a hybrid exiting strategy that supplements classic entropy-based local information with distance-based global information.
Experiments on the GLUE benchmark demonstrate that DE$^3$-BERT consistently outperforms state-of-the-art models.
- Score: 43.967626080432275
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Early exiting has demonstrated its effectiveness in accelerating the
inference of pre-trained language models like BERT by dynamically adjusting the
number of layers executed. However, most existing early exiting methods only
consider local information from an individual test sample to determine their
exiting indicators, failing to leverage the global information offered by the
sample population. This leads to suboptimal estimation of prediction
correctness, resulting in erroneous exiting decisions. To bridge this gap, we
explore the necessity of effectively combining both local and global
information to ensure reliable early exiting during inference. To this end, we
leverage prototypical networks to learn class prototypes and devise a distance
metric between samples and class prototypes. This enables us to utilize global
information for estimating the correctness of early predictions. On this basis,
we propose a novel Distance-Enhanced Early Exiting framework for BERT
(DE$^3$-BERT). DE$^3$-BERT implements a hybrid exiting strategy that
supplements classic entropy-based local information with distance-based global
information to enhance the estimation of prediction correctness for more
reliable early exiting decisions. Extensive experiments on the GLUE benchmark
demonstrate that DE$^3$-BERT consistently outperforms state-of-the-art models
under different speed-up ratios with minimal storage or computational overhead,
yielding a better trade-off between model performance and inference efficiency.
Additionally, an in-depth analysis further validates the generality and
interpretability of our method.
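The abstract describes the mechanism only at a high level; below is a minimal sketch of a hybrid exit criterion in the spirit of DE$^3$-BERT, assuming an internal classifier's logits and a hidden representation at each exit layer, learned class prototypes, and illustrative hyperparameters (`alpha`, `threshold`). The relative-distance score and the linear fusion are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def normalized_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Local information: prediction entropy scaled to [0, 1] (low = confident)."""
    probs = F.softmax(logits, dim=-1)
    ent = -(probs * torch.log(probs + 1e-12)).sum(dim=-1)
    return ent / torch.log(torch.tensor(float(logits.size(-1))))

def relative_prototype_distance(hidden: torch.Tensor, prototypes: torch.Tensor) -> torch.Tensor:
    """Global information: distance to the nearest class prototype, normalized by
    the mean distance to all prototypes (low = the sample sits close to one class)."""
    dists = torch.cdist(hidden, prototypes)             # (batch, num_classes)
    return dists.min(dim=-1).values / (dists.mean(dim=-1) + 1e-12)

def should_exit(logits: torch.Tensor, hidden: torch.Tensor, prototypes: torch.Tensor,
                alpha: float = 0.5, threshold: float = 0.4) -> torch.Tensor:
    """Hybrid exiting: fuse the entropy-based local score with the distance-based
    global score and exit at this layer when the combined score is low enough.
    `alpha` and `threshold` are illustrative and would be tuned per speed-up ratio."""
    score = alpha * normalized_entropy(logits) \
            + (1 - alpha) * relative_prototype_distance(hidden, prototypes)
    return score < threshold                             # boolean exit mask per sample
```

At inference, such a check would run after each intermediate classifier; a sample stops at the first layer for which `should_exit` is true.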
Related papers
- Editable Fairness: Fine-Grained Bias Mitigation in Language Models [52.66450426729818]
We propose a novel debiasing approach, Fairness Stamp (FAST), which enables fine-grained calibration of individual social biases.
FAST surpasses state-of-the-art baselines with superior debiasing performance.
This highlights the potential of fine-grained debiasing strategies to achieve fairness in large language models.
arXiv Detail & Related papers (2024-08-07T17:14:58Z) - ASPEST: Bridging the Gap Between Active Learning and Selective
Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv Detail & Related papers (2023-04-07T23:51:07Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled target examples whose confidence exceeds that threshold.
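A minimal NumPy sketch of this thresholded-confidence idea follows, assuming max-softmax probability as the confidence score (the paper also considers other scores); the names are illustrative, not taken from the paper's code.

```python
import numpy as np

def learn_threshold(source_conf: np.ndarray, source_correct: np.ndarray) -> float:
    """Choose a threshold so that the fraction of held-out source examples with
    confidence above it matches the measured source accuracy."""
    accuracy = source_correct.mean()
    candidates = np.sort(source_conf)
    fractions = np.array([(source_conf >= t).mean() for t in candidates])
    return float(candidates[np.argmin(np.abs(fractions - accuracy))])

def estimate_target_accuracy(target_conf: np.ndarray, threshold: float) -> float:
    """Predicted target accuracy = fraction of unlabeled target examples whose
    confidence exceeds the learned threshold."""
    return float((target_conf >= threshold).mean())
```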
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - Bayesian Optimization Meets Laplace Approximation for Robotic
Introspection [41.117361086267806]
We introduce a scalable Laplace Approximation (LA) technique to make Deep Neural Networks (DNNs) more introspective.
In particular, we propose a novel Bayesian Optimization (BO) algorithm to mitigate their tendency of under-fitting the true weight posterior.
We show that the proposed framework can be scaled up to large datasets and architectures.
arXiv Detail & Related papers (2020-10-30T09:28:10Z) - Robust Validation: Confident Predictions Even When Distributions Shift [19.327409270934474]
We describe procedures for robust predictive inference, where a model provides uncertainty estimates on its predictions rather than point predictions.
We present a method that produces prediction sets (almost exactly) giving the right coverage level for any test distribution in an $f$-divergence ball around the training population.
An essential component of our methodology is to estimate the amount of expected future data shift and build robustness to it.
arXiv Detail & Related papers (2020-08-10T17:09:16Z) - BERT Loses Patience: Fast and Robust Inference with Early Exit [91.26199404912019]
We propose Patience-based Early Exit as a plug-and-play technique to improve the efficiency and robustness of a pretrained language model.
Our approach improves inference efficiency as it allows the model to make a prediction with fewer layers.
arXiv Detail & Related papers (2020-06-07T13:38:32Z) - Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, assigning optimal weights to unlabeled queries (a minimal sketch of this prototype update appears after the list).
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
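Below is a minimal sketch of the confidence-weighted transductive prototype update mentioned in the last entry, using a fixed distance-based softmax in place of the paper's meta-learned confidence; all names and the blending weight are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def refine_prototypes(prototypes: torch.Tensor, queries: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    """Refine class prototypes with confidence-weighted unlabeled query embeddings.
    prototypes: (num_classes, dim) initial per-class means of the support set
    queries:    (num_queries, dim) embeddings of unlabeled query examples"""
    dists = torch.cdist(queries, prototypes)               # (Q, C)
    confidence = F.softmax(-dists / temperature, dim=-1)    # soft class assignment per query
    weighted_sum = confidence.t() @ queries                 # (C, dim)
    mass = confidence.sum(dim=0).unsqueeze(-1)              # (C, 1)
    # Blend the original prototypes with the confidence-weighted query means;
    # the 0.5/0.5 mix is an arbitrary illustrative choice.
    return 0.5 * prototypes + 0.5 * (weighted_sum / (mass + 1e-12))
```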