TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference
- URL: http://arxiv.org/abs/2105.11618v1
- Date: Tue, 25 May 2021 02:28:51 GMT
- Title: TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference
- Authors: Deming Ye, Yankai Lin, Yufei Huang, Maosong Sun
- Abstract summary: Existing pre-trained language models (PLMs) are often computationally expensive at inference time.
We propose a dynamic token reduction approach to accelerate PLMs' inference, named TR-BERT.
TR-BERT formulates the token reduction process as a multi-step token selection problem and automatically learns the selection strategy via reinforcement learning.
- Score: 54.791572981834435
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing pre-trained language models (PLMs) are often computationally
expensive at inference time, making them impractical in various resource-limited
real-world applications. To address this issue, we propose a dynamic token
reduction approach, named TR-BERT, to accelerate PLMs' inference, which can
flexibly adapt the number of layers each token passes through during inference
to avoid redundant computation. Specifically, TR-BERT formulates the token
reduction process as a multi-step token selection problem and automatically
learns the selection strategy via reinforcement learning. Experimental results
on several downstream NLP tasks show that TR-BERT is able to speed up BERT by
2-5 times to satisfy various performance demands. Moreover, TR-BERT also
achieves better performance with less computation on a suite of long-text tasks,
since its token-level layer-number adaptation greatly accelerates the
self-attention operation in PLMs. The source code and experiment details of this
paper can be obtained from https://github.com/thunlp/TR-BERT.
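
The abstract frames token reduction as multi-step token selection: at chosen layers, a learned policy keeps only the tokens that still need deeper processing, so later self-attention runs over a shorter sequence. The following is a minimal PyTorch sketch of that idea, not the authors' implementation: the single selection step, the linear scoring head standing in for the RL-trained policy, and the `TokenReductionEncoder` module are illustrative assumptions (the actual code is at the GitHub link above).

```python
# Minimal sketch of dynamic token reduction (illustrative, not the TR-BERT code).
# Assumption: a linear scoring head picks the top-k tokens to keep after an
# early layer; the remaining layers then attend only over the kept tokens.
import torch
import torch.nn as nn


class TokenReductionEncoder(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=4, reduce_at=2, keep_ratio=0.5):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        )
        self.scorer = nn.Linear(d_model, 1)  # stands in for the RL-trained selection policy
        self.reduce_at = reduce_at
        self.keep_ratio = keep_ratio

    def forward(self, x):  # x: (batch, seq_len, d_model)
        for i, layer in enumerate(self.layers):
            if i == self.reduce_at:
                # Multi-step selection collapsed into one step for brevity:
                # keep the highest-scoring tokens, drop the rest.
                scores = self.scorer(x).squeeze(-1)            # (batch, seq_len)
                k = max(1, int(x.size(1) * self.keep_ratio))
                keep = scores.topk(k, dim=1).indices.sort(dim=1).values
                x = x.gather(1, keep.unsqueeze(-1).expand(-1, -1, x.size(-1)))
            x = layer(x)
        return x  # shorter sequence => cheaper self-attention in later layers


if __name__ == "__main__":
    model = TokenReductionEncoder()
    hidden = torch.randn(2, 128, 256)
    print(model(hidden).shape)  # torch.Size([2, 64, 256])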
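```

Because self-attention cost grows quadratically with sequence length, keeping half of the tokens cuts the attention cost of each remaining layer by roughly a factor of four, which is consistent with the long-text speedups reported in the abstract.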
Related papers
- CEEBERT: Cross-Domain Inference in Early Exit BERT [5.402030962296633]
CeeBERT learns optimal exit thresholds on the fly from domain-specific confidence observed at intermediate layers.
CeeBERT can speed up the BERT/ALBERT models by $2\times$-$3.5\times$ with minimal drop in accuracy (a generic early-exit sketch appears after this list).
arXiv Detail & Related papers (2024-05-23T20:36:10Z)
- MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer [66.71930982549028]
Vision-Language Transformers (VLTs) have shown great success recently, but are accompanied by heavy computation costs.
We propose a novel framework named Multimodal Alignment-Guided Dynamic Token Pruning (MADTP) for accelerating various VLTs.
arXiv Detail & Related papers (2024-03-05T14:13:50Z)
- Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control [66.78146440275093]
Learned sparse retrieval (LSR) is a family of neural methods that encode queries and documents into sparse lexical vectors.
We explore the application of LSR to the multi-modal domain, with a focus on text-image retrieval.
Current approaches like LexLIP and STAIR require complex multi-step training on massive datasets.
Our proposed approach efficiently transforms dense vectors from a frozen dense model into sparse lexical vectors.
arXiv Detail & Related papers (2024-02-27T14:21:56Z)
- Breaking the Token Barrier: Chunking and Convolution for Efficient Long Text Classification with BERT [0.0]
Transformer-based models, specifically BERT, have propelled research in various NLP tasks.
BERT models are limited to a maximum input length of 512 tokens, which makes them non-trivial to apply in practical settings with long inputs.
We propose ChunkBERT, a relatively simple extension to the vanilla BERT architecture that allows any pretrained BERT model to be fine-tuned for inference on arbitrarily long text.
arXiv Detail & Related papers (2023-10-31T15:41:08Z)
- Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z)
- SmartBERT: A Promotion of Dynamic Early Exiting Mechanism for Accelerating BERT Inference [18.456002674399244]
We propose SmartBERT, a novel dynamic early-exiting mechanism combined with layer skipping for BERT inference.
SmartBERT can adaptively skip some layers and adaptively choose whether to exit.
We conduct experiments on eight classification datasets of the GLUE benchmark.
arXiv Detail & Related papers (2023-03-16T12:44:16Z)
- Dynamic-TinyBERT: Boost TinyBERT's Inference Efficiency by Dynamic Sequence Length [2.8770761243361593]
TinyBERT addresses computational efficiency by self-distilling BERT into a smaller transformer representation.
Dynamic-TinyBERT is trained only once, performing on par with BERT and achieving an accuracy-speedup trade-off superior to other efficient approaches.
arXiv Detail & Related papers (2021-11-18T11:58:19Z)
- Learning to Ask Conversational Questions by Optimizing Levenshtein Distance [83.53855889592734]
We introduce a Reinforcement Iterative Sequence Editing (RISE) framework that optimizes the minimum Levenshtein distance (MLD) through explicit editing actions.
RISE is able to pay attention to tokens that are related to conversational characteristics.
Experimental results on two benchmark datasets show that RISE significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2021-06-30T08:44:19Z)
- DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference [69.93692147242284]
Large-scale pre-trained language models such as BERT have brought significant improvements to NLP applications.
We propose a simple but effective method, DeeBERT, to accelerate BERT inference.
Experiments show that DeeBERT is able to save up to 40% inference time with minimal degradation in model quality.
arXiv Detail & Related papers (2020-04-27T17:58:05Z)
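
Several of the related papers above (DeeBERT, CeeBERT, SmartBERT) rely on the same basic mechanism: attach a lightweight classifier to intermediate layers and stop inference as soon as the intermediate prediction looks confident. Below is a minimal, generic PyTorch sketch of that confidence-threshold idea; the per-layer linear heads, the entropy criterion, and the `EarlyExitEncoder` name are assumptions for illustration and do not reproduce any specific paper's method.

```python
# Minimal sketch of confidence-based early exiting (generic; not tied to any
# specific paper's implementation). Assumption: a classifier head after every
# layer; inference stops once the prediction entropy falls below a threshold.
import torch
import torch.nn as nn


class EarlyExitEncoder(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=6, n_classes=2, threshold=0.3):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        )
        self.exit_heads = nn.ModuleList(nn.Linear(d_model, n_classes) for _ in range(n_layers))
        self.threshold = threshold

    @torch.no_grad()
    def forward(self, x):  # x: (1, seq_len, d_model); single example for clarity
        for layer, head in zip(self.layers, self.exit_heads):
            x = layer(x)
            logits = head(x[:, 0])               # classify from the first ([CLS]-like) token
            probs = logits.softmax(dim=-1)
            entropy = -(probs * probs.log()).sum(dim=-1)
            if entropy.item() < self.threshold:  # confident enough: exit early
                return logits
        return logits                            # fell through: use the final layer


if __name__ == "__main__":
    model = EarlyExitEncoder().eval()
    print(model(torch.randn(1, 32, 256)).shape)  # torch.Size([1, 2])
```

Lowering the threshold makes the model exit later (more computation, usually higher accuracy), while raising it makes it exit earlier; tuning or learning this threshold per domain is essentially what the papers above differ on.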
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.