Span Fine-tuning for Pre-trained Language Models
- URL: http://arxiv.org/abs/2108.12848v1
- Date: Sun, 29 Aug 2021 14:11:38 GMT
- Title: Span Fine-tuning for Pre-trained Language Models
- Authors: Rongzhou Bao, Zhuosheng Zhang, Hai Zhao
- Abstract summary: This paper presents a novel span fine-tuning method for PrLMs.
Any sentence processed by the PrLM is segmented into multiple spans according to a pre-sampled dictionary.
Experiments on the GLUE benchmark show that the proposed span fine-tuning method significantly enhances the PrLM.
- Score: 43.352833140317486
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pre-trained language models (PrLMs) have to carefully manage input units when training on very large texts with vocabularies of millions of words. Previous work has shown that incorporating span-level information over consecutive words during pre-training can further improve the performance of PrLMs. However, because such span-level clues are introduced and fixed during pre-training, previous methods are time-consuming and lack flexibility. To alleviate this inconvenience, this paper presents a novel span fine-tuning method for PrLMs, which allows the span setting to be adaptively determined by the specific downstream task during the fine-tuning phase. In detail, any sentence processed by the PrLM is segmented into multiple spans according to a pre-sampled dictionary. The segmentation information is then passed through a hierarchical CNN module together with the representation outputs of the PrLM to produce a span-enhanced representation. Experiments on the GLUE benchmark show that the proposed span fine-tuning method significantly enhances the PrLM while offering more flexibility in an efficient way.
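To make the pipeline concrete, here is a minimal sketch of the idea in PyTorch, not the authors' implementation: a sentence is segmented by greedy longest match against a span dictionary, each span's PrLM token representations are pooled with a single 1-D convolution (standing in for the paper's hierarchical CNN), and the span features are fused back into the token-level outputs. All names, the pooling rule, and the fusion layer are assumptions for illustration.

```python
import torch
import torch.nn as nn


def segment_into_spans(tokens, span_dict, max_len=5):
    """Greedy longest-match segmentation of a token list against a span dictionary."""
    spans, i = [], 0
    while i < len(tokens):
        end = i + 1
        for j in range(min(len(tokens), i + max_len), i + 1, -1):
            if " ".join(tokens[i:j]) in span_dict:
                end = j
                break
        spans.append((i, end))
        i = end
    return spans


class SpanEnhancer(nn.Module):
    """Pools the PrLM's token representations inside each span with a 1-D CNN
    and fuses the span features back into the token-level representations."""

    def __init__(self, hidden=768):
        super().__init__()
        self.conv = nn.Conv1d(hidden, hidden, kernel_size=3, padding=1)
        self.fuse = nn.Linear(2 * hidden, hidden)

    def forward(self, token_reps, spans):
        # token_reps: (seq_len, hidden) -- PrLM output for one sentence
        span_broadcast = torch.zeros_like(token_reps)
        for start, end in spans:
            h = token_reps[start:end].transpose(0, 1).unsqueeze(0)   # (1, hidden, span_len)
            pooled = torch.relu(self.conv(h)).max(dim=-1).values[0]  # (hidden,)
            span_broadcast[start:end] = pooled
        # span-enhanced representation: each token's feature plus its span's feature
        return self.fuse(torch.cat([token_reps, span_broadcast], dim=-1))
```

For example, segment_into_spans(["new", "york", "is", "big"], {"new york"}) returns [(0, 2), (2, 3), (3, 4)]. Because segmentation happens at fine-tuning time, the span dictionary can be swapped per downstream task without re-running pre-training, which is the flexibility the abstract emphasizes.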
Related papers
- SentenceVAE: Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context [49.9628075245959]
We present Sentence Variational Autoencoder (SentenceVAE), which includes a Sentence Encoder to compress the multiple tokens of a sentence into a single token, and a Sentence Decoder to reconstruct it.
The proposed method can accelerate inference speed by 204-365%, reduce perplexity (PPL) to 46-75% of its original value, and decrease memory overhead by 86-91% for the equivalent context length.
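As a rough illustration of the compress-then-reconstruct idea (not the paper's architecture), the sketch below squeezes a sentence's token embeddings into one vector with a GRU encoder and reconstructs per-token embeddings with a GRU decoder; the class name, layer choices, and shapes are assumptions.

```python
import torch.nn as nn


class SentenceCompressor(nn.Module):
    """Illustrative only: squeeze a sentence's token embeddings into a single
    sentence-level vector and reconstruct per-token embeddings from it."""

    def __init__(self, hidden=768):
        super().__init__()
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)

    def forward(self, token_embs):                 # (batch, seq_len, hidden)
        _, sent_vec = self.encoder(token_embs)     # (1, batch, hidden): the "sentence token"
        query = sent_vec.transpose(0, 1).repeat(1, token_embs.size(1), 1)
        reconstructed, _ = self.decoder(query)     # (batch, seq_len, hidden)
        return sent_vec.squeeze(0), reconstructed
```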
arXiv Detail & Related papers (2024-08-01T15:45:19Z)
- Bucket Pre-training is All You Need [9.332544709626875]
Large language models (LLMs) have demonstrated exceptional performance across various natural language processing tasks.
The conventional fixed-length data composition strategy for pretraining, which involves concatenating and splitting documents, can introduce noise and limit the model's ability to capture long-range dependencies.
We propose a multi-bucket data composition method that moves beyond the fixed-length paradigm, offering a more flexible and efficient approach to pretraining.
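A toy sketch of length-bucketed data composition follows, assuming documents are simply grouped by token count into a handful of bucket sizes rather than concatenated and split to one fixed length; the boundaries and the truncation rule are illustrative, not the paper's actual algorithm.

```python
def bucket_documents(docs, boundaries=(512, 1024, 2048, 4096)):
    """Group tokenized documents into length buckets instead of concatenating
    and splitting everything to one fixed length (illustrative only)."""
    buckets = {b: [] for b in boundaries}
    for doc in docs:
        for b in boundaries:
            if len(doc) <= b:
                buckets[b].append(doc)
                break
        else:
            # documents longer than the largest bucket are truncated to fit it
            buckets[boundaries[-1]].append(doc[: boundaries[-1]])
    return buckets
```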
arXiv Detail & Related papers (2024-07-10T09:27:23Z)
- Instruction Position Matters in Sequence Generation with Large Language Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions after the input sentences.
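The positional shift can be illustrated with a trivial prompt-construction snippet (the example sentence and instruction wording are made up):

```python
source = "Der schnelle braune Fuchs springt über den faulen Hund."

# conventional format: the task instruction precedes the input
pre_prompt = f"Translate the following German sentence into English.\n{source}"

# shifted format: the instruction follows the input, sitting next to the
# position where the model begins generating the translation
post_prompt = f"{source}\nTranslate the German sentence above into English."
```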
arXiv Detail & Related papers (2023-08-23T12:36:57Z)
- Assessing Phrase Break of ESL Speech with Pre-trained Language Models and Large Language Models [7.782346535009883]
This work introduces approaches to assessing phrase breaks in ESL learners' speech using pre-trained language models (PLMs) and large language models (LLMs).
arXiv Detail & Related papers (2023-06-08T07:10:39Z)
- Prompt Tuning for Discriminative Pre-trained Language Models [96.04765512463415]
Recent works have shown promising results of prompt tuning in stimulating pre-trained language models (PLMs) for natural language processing (NLP) tasks.
It is still unknown whether and how discriminative PLMs, e.g., ELECTRA, can be effectively prompt-tuned.
We present DPT, the first prompt tuning framework for discriminative PLMs, which reformulates NLP tasks into a discriminative language modeling problem.
arXiv Detail & Related papers (2022-05-23T10:11:50Z)
- COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining [59.169836983883656]
COCO-LM is a new self-supervised learning framework that pretrains Language Models by COrrecting challenging errors and COntrasting text sequences.
COCO-LM employs an auxiliary language model to mask-and-predict tokens in original text sequences.
Our analyses reveal that COCO-LM's advantages come from its challenging training signals, more contextualized token representations, and regularized sequence representations.
arXiv Detail & Related papers (2021-02-16T22:24:29Z)
- CSS-LM: A Contrastive Framework for Semi-supervised Fine-tuning of Pre-trained Language Models [59.49705076369856]
We introduce a novel framework to improve the fine-tuning phase of pre-trained language models (PLMs).
We retrieve positive and negative instances from large-scale unlabeled corpora according to their domain-level and class-level semantic relatedness to a task.
We then perform contrastive semi-supervised learning on both the retrieved unlabeled and original labeled instances to help PLMs capture crucial task-related semantic features.
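A minimal sketch of the retrieval step follows, assuming unlabeled sentence embeddings are ranked by cosine similarity against a task- or class-level prototype embedding; the function name and the top-k/bottom-k selection rule are assumptions rather than the paper's exact procedure.

```python
import torch
import torch.nn.functional as F


def retrieve_contrastive_instances(prototype_emb, unlabeled_embs, k=16):
    """Rank unlabeled sentence embeddings (N, d) by cosine similarity to a
    task- or class-level prototype (d,); the closest serve as positives and
    the most distant as negatives for contrastive semi-supervised fine-tuning."""
    sims = F.cosine_similarity(unlabeled_embs, prototype_emb.unsqueeze(0), dim=-1)
    order = torch.argsort(sims, descending=True)
    return order[:k], order[-k:]   # indices of positives, negatives
```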
arXiv Detail & Related papers (2021-02-07T09:27:26Z)
- Enhancing Pre-trained Language Model with Lexical Simplification [41.34550924004487]
Lexical simplification (LS) is a recognized method for reducing lexical diversity in input text.
We propose a novel approach which can effectively improve the performance of PrLMs in text classification.
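As a rough illustration (not the paper's method), lexical simplification can be approximated by swapping low-frequency words for simpler synonyms before the sentence reaches the PrLM; the synonym and frequency dictionaries and the threshold below are hypothetical.

```python
def simplify(tokens, simpler_synonym, word_freq, threshold=1e-5):
    """Replace low-frequency words with a known simpler synonym so the input
    the PrLM sees is less lexically diverse (hypothetical dictionaries)."""
    return [
        simpler_synonym.get(tok, tok) if word_freq.get(tok, 0.0) < threshold else tok
        for tok in tokens
    ]
```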
arXiv Detail & Related papers (2020-12-30T07:49:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.