CEEBERT: Cross-Domain Inference in Early Exit BERT
- URL: http://arxiv.org/abs/2405.15039v1
- Date: Thu, 23 May 2024 20:36:10 GMT
- Title: CEEBERT: Cross-Domain Inference in Early Exit BERT
- Authors: Divya Jyoti Bajpai, Manjesh Kumar Hanawal
- Abstract summary: CeeBERT learns optimal thresholds from domain-specific confidence observed at intermediate layers on the fly.
CeeBERT can speed up the BERT/ALBERT models by $2\times$ - $3.5\times$ with minimal drop in accuracy.
- Score: 5.402030962296633
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pre-trained Language Models (PLMs), like BERT, trained with self-supervision objectives exhibit remarkable performance and generalization across various tasks. However, they suffer from high inference latency due to their large size. To address this issue, side branches are attached at intermediate layers, enabling early inference of samples without requiring them to pass through all layers. The challenge, however, is to decide at which layer each sample should exit so that accuracy and latency are balanced. Moreover, the distribution of the samples to be inferred may differ from that used for training, necessitating cross-domain adaptation. We propose an online learning algorithm named Cross-Domain Inference in Early Exit BERT (CeeBERT) that dynamically determines early exits of samples based on the level of confidence at each exit point. CeeBERT learns optimal thresholds from domain-specific confidence observed at intermediate layers on the fly, eliminating the need for labeled data. Experimental results on five distinct datasets with BERT and ALBERT models demonstrate CeeBERT's ability to improve latency by reducing unnecessary computations with minimal drop in performance. By adapting the threshold values, CeeBERT can speed up the BERT/ALBERT models by $2\times$ - $3.5\times$ with minimal drop in accuracy.
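The confidence-based early-exit idea in the abstract can be illustrated with a minimal sketch. This is not the paper's online threshold-learning algorithm; the logits and per-layer thresholds below are hypothetical, fixed constants chosen purely for illustration.

```python
import numpy as np

def softmax(logits):
    """Convert raw classifier logits to a probability distribution."""
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

def early_exit_predict(layer_logits, thresholds):
    """Return (prediction, exit_layer) for one sample.

    layer_logits: one logit vector per intermediate exit classifier.
    thresholds: per-layer confidence thresholds (CeeBERT learns these
    online from unlabeled data; here they are fixed for illustration).
    """
    for i, logits in enumerate(layer_logits):
        probs = softmax(logits)
        if probs.max() >= thresholds[i]:
            return int(probs.argmax()), i  # confident enough: exit here
    # fall through: use the final layer's prediction
    return int(softmax(layer_logits[-1]).argmax()), len(layer_logits) - 1

# Hypothetical logits from three exit points for a single sample
layer_logits = [np.array([0.2, 0.3]),   # low confidence -> keep going
                np.array([2.5, -1.0]),  # confident -> exit here
                np.array([3.0, -2.0])]
pred, layer = early_exit_predict(layer_logits, thresholds=[0.9, 0.9, 0.0])
```

Raising a layer's threshold pushes more samples deeper into the network (higher accuracy, higher latency); lowering it does the opposite, which is the accuracy/latency trade-off the learned thresholds balance.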
Related papers
- DE$^3$-BERT: Distance-Enhanced Early Exiting for BERT based on Prototypical Networks [43.967626080432275]
We propose DE$^3$-BERT, a novel Distance-Enhanced Early Exiting framework for BERT.
We implement a hybrid exiting strategy that supplements classic entropy-based local information with distance-based global information.
Experiments on the GLUE benchmark demonstrate that DE$3$-BERT consistently outperforms state-of-the-art models.
arXiv Detail & Related papers (2024-02-03T15:51:17Z)
- Breaking the Token Barrier: Chunking and Convolution for Efficient Long Text Classification with BERT [0.0]
Transformer-based models, specifically BERT, have propelled research in various NLP tasks.
BERT models are limited to a maximum input length of 512 tokens, which makes them non-trivial to apply in practical settings with long inputs.
We propose ChunkBERT, a relatively simple extension to the vanilla BERT architecture that allows fine-tuning of any pretrained model to perform inference on arbitrarily long text.
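The chunking step this entry describes can be sketched as splitting a long token sequence into windows that fit the 512-token limit. This is only an illustration of the general idea; chunk size and the pooling step (ChunkBERT pools per-chunk outputs with a small convolution, not shown here) are assumptions.

```python
def chunk_tokens(token_ids, chunk_size=512, stride=512):
    """Split a long token sequence into fixed-size chunks.

    BERT-style models accept at most 512 tokens, so longer inputs are
    broken into windows; each chunk would then be encoded independently
    and the per-chunk outputs pooled into one document representation.
    A stride smaller than chunk_size would give overlapping windows.
    """
    chunks = []
    for start in range(0, len(token_ids), stride):
        chunks.append(token_ids[start:start + chunk_size])
    return chunks

long_doc = list(range(1200))      # stand-in for 1200 token ids
chunks = chunk_tokens(long_doc)   # three chunks: 512, 512, and 176 tokens
```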
arXiv Detail & Related papers (2023-10-31T15:41:08Z)
- Towards Real-World Test-Time Adaptation: Tri-Net Self-Training with Balanced Normalization [52.03927261909813]
Existing works mainly consider real-world test-time adaptation under non-i.i.d. data stream and continual domain shift.
We argue that the failure of state-of-the-art methods is primarily caused by indiscriminately adapting normalization layers to imbalanced testing data.
The final TTA model, termed TRIBE, is built upon a tri-net architecture with balanced batchnorm layers.
arXiv Detail & Related papers (2023-09-26T14:06:26Z)
- SmartBERT: A Promotion of Dynamic Early Exiting Mechanism for Accelerating BERT Inference [18.456002674399244]
We propose SmartBERT, a novel dynamic early-exiting mechanism combined with layer skipping for BERT inference.
SmartBERT can adaptively skip some layers and adaptively choose whether to exit.
We conduct experiments on eight classification datasets of the GLUE benchmark.
arXiv Detail & Related papers (2023-03-16T12:44:16Z)
- Gradient-Free Structured Pruning with Unlabeled Data [57.999191898036706]
We propose a gradient-free structured pruning framework that uses only unlabeled data.
Up to 40% of the original FLOP count can be reduced with less than a 4% accuracy loss across all tasks considered.
arXiv Detail & Related papers (2023-03-07T19:12:31Z)
- Pretraining Without Attention [114.99187017618408]
This work explores pretraining without attention by using recent advances in sequence routing based on state-space models (SSMs).
BiGS is able to match BERT pretraining accuracy on GLUE and can be extended to long-form pretraining of 4096 tokens without approximation.
arXiv Detail & Related papers (2022-12-20T18:50:08Z)
- TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference [54.791572981834435]
Existing pre-trained language models (PLMs) are often computationally expensive in inference.
We propose a dynamic token reduction approach to accelerate PLMs' inference, named TR-BERT.
TR-BERT formulates the token reduction process as a multi-step token selection problem and automatically learns the selection strategy via reinforcement learning.
arXiv Detail & Related papers (2021-05-25T02:28:51Z)
- DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference [69.93692147242284]
Large-scale pre-trained language models such as BERT have brought significant improvements to NLP applications.
We propose a simple but effective method, DeeBERT, to accelerate BERT inference.
Experiments show that DeeBERT is able to save up to 40% inference time with minimal degradation in model quality.
arXiv Detail & Related papers (2020-04-27T17:58:05Z)
- The Right Tool for the Job: Matching Model and Instance Complexities [62.95183777679024]
As NLP models become larger, executing a trained model requires significant computational resources incurring monetary and environmental costs.
We propose a modification to contextual representation fine-tuning which, during inference, allows for an early (and fast) "exit".
We test our proposed modification on five different datasets in two tasks: three text classification datasets and two natural language inference benchmarks.
arXiv Detail & Related papers (2020-04-16T04:28:08Z)
- TwinBERT: Distilling Knowledge to Twin-Structured BERT Models for Efficient Retrieval [11.923682816611716]
We present the TwinBERT model for effective and efficient retrieval.
It has twin-structured BERT-like encoders to represent query and document respectively.
It allows document embeddings to be pre-computed offline and cached in memory.
arXiv Detail & Related papers (2020-02-14T22:44:36Z)
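The TwinBERT entry above describes a twin-encoder setup where document embeddings are pre-computed offline and only the query is encoded at serving time. A minimal sketch of that caching pattern follows; the toy `embed` function stands in for a BERT-like encoder tower, and the documents and dimensions are invented for illustration.

```python
import numpy as np

def embed(text, dim=8):
    """Stand-in encoder: a real system would run a BERT-like tower.
    Seeded from the text so the same input always maps to the same vector."""
    rng = np.random.default_rng(sum(ord(c) for c in text))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)  # unit-normalize for dot-product scoring

# Offline: pre-compute and cache document embeddings (the expensive step)
docs = ["doc about cats", "doc about finance", "doc about bert models"]
doc_cache = {d: embed(d) for d in docs}

# Online: encode only the query, then score it against the cached vectors
def retrieve(query, cache, top_k=1):
    q = embed(query)
    ranked = sorted(cache, key=lambda d: float(q @ cache[d]), reverse=True)
    return ranked[:top_k]

best = retrieve("which doc covers bert?", doc_cache)
```

The design point is that the per-document encoder cost is paid once offline, so online latency is one query encoding plus cheap dot products against the cache.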
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.