Accelerating BERT Inference for Sequence Labeling via Early-Exit
- URL: http://arxiv.org/abs/2105.13878v1
- Date: Fri, 28 May 2021 14:39:26 GMT
- Title: Accelerating BERT Inference for Sequence Labeling via Early-Exit
- Authors: Xiaonan Li, Yunfan Shao, Tianxiang Sun, Hang Yan, Xipeng Qiu, Xuanjing
Huang
- Abstract summary: We extend the recent successful early-exit mechanism to accelerate the inference of PTMs for sequence labeling tasks.
We also propose a token-level early-exit mechanism that allows some tokens to exit early at different layers.
Our approach can save up to 66%-75% inference cost with minimal performance degradation.
- Score: 65.7292767360083
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Both performance and efficiency are crucial factors for sequence labeling
tasks in many real-world scenarios. Although pre-trained models (PTMs) have
significantly improved the performance of various sequence labeling tasks,
their computational cost is high. To alleviate this problem, we extend the
recent successful early-exit mechanism to accelerate the inference of PTMs for
sequence labeling tasks. However, existing early-exit mechanisms are
specifically designed for sequence-level tasks, rather than sequence labeling.
In this paper, we first propose a simple extension of sentence-level early-exit
for sequence labeling tasks. To further reduce the computational cost, we also
propose a token-level early-exit mechanism that allows some tokens to exit
early at different layers. Considering the local dependency inherent in
sequence labeling, we employ a window-based criterion to decide whether a
token should exit. Token-level early-exit introduces a gap between training
and inference, so we introduce an extra self-sampling fine-tuning stage to
alleviate it. Extensive experiments on three popular sequence
labeling tasks show that our approach can save up to 66%-75% inference cost
with minimal performance degradation. Compared with competitive compressed
models such as DistilBERT, our approach can achieve better performance under
the same speed-up ratios of 2X, 3X, and 4X.
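
The token-level exit with a window-based criterion described in the abstract can be pictured with a short sketch. The snippet below is a minimal, illustrative PyTorch-style implementation: the per-layer exit classifiers, window size, and confidence threshold are assumptions made for illustration, not the authors' released code.

```python
import torch


def window_confidence(token_probs: torch.Tensor, window: int = 2) -> torch.Tensor:
    """For each token, average the max softmax probability over a local window
    of neighbouring tokens (illustrative stand-in for a window-based exit criterion)."""
    # token_probs: [seq_len, num_labels], softmax output of one exit classifier
    conf = token_probs.max(dim=-1).values                       # [seq_len]
    padded = torch.nn.functional.pad(conf, (window, window), value=float(conf.mean()))
    # sliding-window average of confidence around each token
    return padded.unfold(0, 2 * window + 1, 1).mean(dim=-1)     # [seq_len]


def token_level_early_exit(hidden, layers, classifiers, threshold=0.95, window=2):
    """Run encoder layers one by one; tokens whose windowed confidence exceeds
    the threshold keep the label predicted at that layer (hypothetical sketch)."""
    seq_len = hidden.size(0)
    labels = torch.full((seq_len,), -1, dtype=torch.long)
    active = torch.ones(seq_len, dtype=torch.bool)
    for layer, clf in zip(layers, classifiers):
        # in this sketch every layer still processes all tokens;
        # exited tokens simply keep their earlier prediction
        hidden = layer(hidden)
        probs = torch.softmax(clf(hidden), dim=-1)               # [seq_len, num_labels]
        conf = window_confidence(probs, window)
        exit_now = active & (conf >= threshold)
        labels[exit_now] = probs[exit_now].argmax(dim=-1)
        active &= ~exit_now
        if not active.any():
            break
    # any token that never exited takes the final layer's prediction
    labels[active] = probs[active].argmax(dim=-1)
    return labels
```

Lowering the threshold or shrinking the window lets more tokens exit at shallow layers, trading accuracy for speed; this mirrors the speed-up/accuracy trade-off reported in the abstract, though the exact criterion used by the authors may differ.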
Related papers
- Incremental Self-training for Semi-supervised Learning [56.57057576885672]
IST is simple yet effective and fits existing self-training-based semi-supervised learning methods.
We verify the proposed IST on five datasets and two types of backbone, effectively improving the recognition accuracy and learning speed.
arXiv Detail & Related papers (2024-04-14T05:02:00Z) - Label Delay in Online Continual Learning [77.05325581370893]
A critical aspect often overlooked is the label delay, where new data may not be labeled due to slow and costly annotation processes.
We introduce a new continual learning framework with explicit modeling of the label delay between data and label streams over time steps.
We show experimentally that our method is the least affected by the label delay factor and in some cases successfully recovers the accuracy of the non-delayed counterpart.
arXiv Detail & Related papers (2023-12-01T20:52:10Z) - Unifying Token and Span Level Supervisions for Few-Shot Sequence
Labeling [18.24907067631541]
Few-shot sequence labeling aims to identify novel classes based on only a few labeled samples.
We propose a Consistent Dual Adaptive Prototypical (CDAP) network for few-shot sequence labeling.
Our model achieves new state-of-the-art results on three benchmark datasets.
arXiv Detail & Related papers (2023-07-16T04:50:52Z) - SkipDecode: Autoregressive Skip Decoding with Batching and Caching for
Efficient LLM Inference [17.947904697850433]
We present SkipDecode, a token-level early-exit method for batch inferencing and key-value (KV) caching.
It overcomes prior constraints by setting up a single exit point for every token in a batch at each sequence position.
It also guarantees a monotonic decrease in exit points, thereby eliminating the need to recompute KV caches for preceding tokens (a toy illustration of such a schedule appears after this list).
arXiv Detail & Related papers (2023-07-05T19:59:09Z) - LESS: Label-Efficient Semantic Segmentation for LiDAR Point Clouds [62.49198183539889]
We propose a label-efficient semantic segmentation pipeline for outdoor scenes with LiDAR point clouds.
Our method co-designs an efficient labeling process with semi/weakly supervised learning.
Our proposed method is even highly competitive compared to the fully supervised counterpart with 100% labels.
arXiv Detail & Related papers (2022-10-14T19:13:36Z) - Modeling sequential annotations for sequence labeling with crowds [8.239028141030621]
Crowd sequential annotations can be an efficient and cost-effective way to build large datasets for sequence labeling.
We propose Modeling sequential annotation for sequence labeling with crowds (SA-SLC)
A valid label sequence inference (VLSE) method is proposed to derive the valid ground-truth label sequences from crowd sequential annotations.
arXiv Detail & Related papers (2022-09-20T02:51:23Z) - Pyramid-BERT: Reducing Complexity via Successive Core-set based Token
Selection [23.39962989492527]
Transformer-based language models such as BERT have achieved the state-of-the-art on various NLP tasks, but are computationally prohibitive.
We present Pyramid-BERT, where we replace previously used heuristics with a core-set based token selection method justified by theoretical results.
The core-set based token selection technique allows us to avoid expensive pre-training, enables space-efficient fine-tuning, and thus makes it suitable for handling longer sequence lengths.
arXiv Detail & Related papers (2022-03-27T19:52:01Z) - Uncertainty-Aware Label Refinement for Sequence Labeling [47.67853514765981]
We introduce a novel two-stage label decoding framework to model long-term label dependencies.
A base model first predicts draft labels, and then a novel two-stream self-attention model makes refinements on these draft predictions.
arXiv Detail & Related papers (2020-12-19T06:56:59Z) - Semantic Label Smoothing for Sequence to Sequence Problems [54.758974840974425]
We propose a technique that smooths over well-formed relevant sequences that have sufficient n-gram overlap with the target sequence.
Our method shows a consistent and significant improvement over the state-of-the-art techniques on different datasets.
arXiv Detail & Related papers (2020-10-15T00:31:15Z)