Accelerating BERT Inference for Sequence Labeling via Early-Exit
- URL: http://arxiv.org/abs/2105.13878v1
- Date: Fri, 28 May 2021 14:39:26 GMT
- Title: Accelerating BERT Inference for Sequence Labeling via Early-Exit
- Authors: Xiaonan Li, Yunfan Shao, Tianxiang Sun, Hang Yan, Xipeng Qiu, Xuanjing
Huang
- Abstract summary: We extend the recent successful early-exit mechanism to accelerate the inference of PTMs for sequence labeling tasks.
We also propose a token-level early-exit mechanism that allows some tokens to exit early at different layers.
Our approach can save up to 66%-75% inference cost with minimal performance degradation.
- Score: 65.7292767360083
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Both performance and efficiency are crucial factors for sequence labeling
tasks in many real-world scenarios. Although pre-trained models (PTMs) have
significantly improved the performance of various sequence labeling tasks,
their computational cost is high. To alleviate this problem, we extend the
recent successful early-exit mechanism to accelerate the inference of PTMs for
sequence labeling tasks. However, existing early-exit mechanisms are
specifically designed for sequence-level tasks, rather than sequence labeling.
In this paper, we first propose a simple extension of sentence-level early-exit
for sequence labeling tasks. To further reduce the computational cost, we also
propose a token-level early-exit mechanism that allows some tokens to exit
early at different layers. Considering the local dependency inherent in
sequence labeling, we employ a window-based criterion to decide whether a
token should exit. Token-level early-exit introduces a gap between training
and inference, so we introduce an extra self-sampling fine-tuning stage to
alleviate it. Extensive experiments on three popular sequence
labeling tasks show that our approach can save up to 66%-75% inference cost
with minimal performance degradation. Compared with competitive compressed
models such as DistilBERT, our approach can achieve better performance under
the same speed-up ratios of 2X, 3X, and 4X.
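
The token-level exit with a window-based criterion described in the abstract can be pictured with a short sketch. The snippet below is a minimal, illustrative PyTorch-style implementation: the per-layer exit classifiers, window size, and confidence threshold are assumptions made for illustration, not the authors' released code.

```python
import torch


def window_confidence(token_probs: torch.Tensor, window: int = 2) -> torch.Tensor:
    """For each token, average the max softmax probability over a local window
    of neighbouring tokens (illustrative stand-in for a window-based exit criterion)."""
    # token_probs: [seq_len, num_labels], softmax output of one exit classifier
    conf = token_probs.max(dim=-1).values                       # [seq_len]
    padded = torch.nn.functional.pad(conf, (window, window), value=float(conf.mean()))
    # sliding-window average of confidence around each token
    return padded.unfold(0, 2 * window + 1, 1).mean(dim=-1)     # [seq_len]


def token_level_early_exit(hidden, layers, classifiers, threshold=0.95, window=2):
    """Run encoder layers one by one; tokens whose windowed confidence exceeds
    the threshold keep the label predicted at that layer (hypothetical sketch)."""
    seq_len = hidden.size(0)
    labels = torch.full((seq_len,), -1, dtype=torch.long)
    active = torch.ones(seq_len, dtype=torch.bool)
    for layer, clf in zip(layers, classifiers):
        # in this sketch every layer still processes all tokens;
        # exited tokens simply keep their earlier prediction
        hidden = layer(hidden)
        probs = torch.softmax(clf(hidden), dim=-1)               # [seq_len, num_labels]
        conf = window_confidence(probs, window)
        exit_now = active & (conf >= threshold)
        labels[exit_now] = probs[exit_now].argmax(dim=-1)
        active &= ~exit_now
        if not active.any():
            break
    # any token that never exited takes the final layer's prediction
    labels[active] = probs[active].argmax(dim=-1)
    return labels
```

Lowering the threshold or shrinking the window lets more tokens exit at shallow layers, trading accuracy for speed; this mirrors the speed-up/accuracy trade-off reported in the abstract, though the exact criterion used by the authors may differ.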
Related papers
- Incremental Self-training for Semi-supervised Learning [56.57057576885672]
IST is simple yet effective and fits existing self-training-based semi-supervised learning methods.
We verify the proposed IST on five datasets and two types of backbone, effectively improving the recognition accuracy and learning speed.
arXiv Detail & Related papers (2024-04-14T05:02:00Z) - Label Delay in Online Continual Learning [77.05325581370893]
A critical aspect often overlooked is the label delay, where new data may not be labeled due to slow and costly annotation processes.
We introduce a new continual learning framework with explicit modeling of the label delay between data and label streams over time steps.
We show experimentally that our method is the least affected by the label delay factor and in some cases successfully recovers the accuracy of the non-delayed counterpart.
arXiv Detail & Related papers (2023-12-01T20:52:10Z) - Unifying Token and Span Level Supervisions for Few-Shot Sequence
Labeling [18.24907067631541]
Few-shot sequence labeling aims to identify novel classes based on only a few labeled samples.
We propose a Consistent Dual Adaptive Prototypical (CDAP) network for few-shot sequence labeling.
Our model achieves new state-of-the-art results on three benchmark datasets.
arXiv Detail & Related papers (2023-07-16T04:50:52Z) - SkipDecode: Autoregressive Skip Decoding with Batching and Caching for
Efficient LLM Inference [17.947904697850433]
We present SkipDecode, a token-level early-exit method for batch inferencing and key-value (KV) caching.
It overcomes prior constraints by setting up a single exit point for every token in a batch at each sequence position.
It also guarantees a monotonic decrease in exit points, thereby eliminating the need to recompute KV caches for preceding tokens (a toy illustration of such a schedule appears after this list).
arXiv Detail & Related papers (2023-07-05T19:59:09Z) - LESS: Label-Efficient Semantic Segmentation for LiDAR Point Clouds [62.49198183539889]
We propose a label-efficient semantic segmentation pipeline for outdoor scenes with LiDAR point clouds.
Our method co-designs an efficient labeling process with semi/weakly supervised learning.
Our proposed method is even highly competitive compared to the fully supervised counterpart with 100% labels.
arXiv Detail & Related papers (2022-10-14T19:13:36Z) - Modeling sequential annotations for sequence labeling with crowds [8.239028141030621]
Crowd sequential annotations can be an efficient and cost-effective way to build large datasets for sequence labeling.
We propose Modeling sequential annotation for sequence labeling with crowds (SA-SLC)
A valid label sequence inference (VLSE) method is proposed to derive the valid ground-truth label sequences from crowd sequential annotations.
arXiv Detail & Related papers (2022-09-20T02:51:23Z) - Pyramid-BERT: Reducing Complexity via Successive Core-set based Token
Selection [23.39962989492527]
Transformer-based language models such as BERT have achieved the state-of-the-art on various NLP tasks, but are computationally prohibitive.
We present Pyramid-BERT, where we replace previously used heuristics with a core-set based token selection method justified by theoretical results.
The core-set based token selection technique allows us to avoid expensive pre-training, enables space-efficient fine-tuning, and thus makes it suitable for handling longer sequence lengths.
arXiv Detail & Related papers (2022-03-27T19:52:01Z) - Uncertainty-Aware Label Refinement for Sequence Labeling [47.67853514765981]
We introduce a novel two-stage label decoding framework to model long-term label dependencies.
A base model first predicts draft labels, and then a novel two-stream self-attention model makes refinements on these draft predictions.
arXiv Detail & Related papers (2020-12-19T06:56:59Z) - Semantic Label Smoothing for Sequence to Sequence Problems [54.758974840974425]
We propose a technique that smooths over well-formed relevant sequences that have sufficient n-gram overlap with the target sequence.
Our method shows a consistent and significant improvement over the state-of-the-art techniques on different datasets.
arXiv Detail & Related papers (2020-10-15T00:31:15Z)