Span Fine-tuning for Pre-trained Language Models
- URL: http://arxiv.org/abs/2108.12848v1
- Date: Sun, 29 Aug 2021 14:11:38 GMT
- Title: Span Fine-tuning for Pre-trained Language Models
- Authors: Rongzhou Bao, Zhuosheng Zhang, Hai Zhao
- Abstract summary: This paper presents a novel span fine-tuning method for PrLMs.
Any sentence processed by the PrLM is segmented into multiple spans according to a pre-sampled dictionary.
Experiments on the GLUE benchmark show that the proposed span fine-tuning method significantly enhances the PrLM.
- Score: 43.352833140317486
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pre-trained language models (PrLMs) have to carefully manage input units when training on very large texts with vocabularies of millions of words. Previous work has shown that incorporating span-level information over consecutive words during pre-training can further improve the performance of PrLMs. However, because such span-level clues are introduced and fixed during pre-training, previous methods are time-consuming and lack flexibility. To alleviate this inconvenience, this paper presents a novel span fine-tuning method for PrLMs, which allows the span setting to be adaptively determined by the specific downstream task during the fine-tuning phase. In detail, any sentence processed by the PrLM is segmented into multiple spans according to a pre-sampled dictionary. The segmentation information is then passed through a hierarchical CNN module together with the representation outputs of the PrLM to produce a span-enhanced representation. Experiments on the GLUE benchmark show that the proposed span fine-tuning method significantly enhances the PrLM while offering more flexibility in an efficient way.
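To make the pipeline concrete, here is a minimal sketch of the idea in PyTorch, not the authors' implementation: a sentence is segmented by greedy longest match against a span dictionary, each span's PrLM token representations are pooled with a single 1-D convolution (standing in for the paper's hierarchical CNN), and the span features are fused back into the token-level outputs. All names, the pooling rule, and the fusion layer are assumptions for illustration.

```python
import torch
import torch.nn as nn


def segment_into_spans(tokens, span_dict, max_len=5):
    """Greedy longest-match segmentation of a token list against a span dictionary."""
    spans, i = [], 0
    while i < len(tokens):
        end = i + 1
        for j in range(min(len(tokens), i + max_len), i + 1, -1):
            if " ".join(tokens[i:j]) in span_dict:
                end = j
                break
        spans.append((i, end))
        i = end
    return spans


class SpanEnhancer(nn.Module):
    """Pools the PrLM's token representations inside each span with a 1-D CNN
    and fuses the span features back into the token-level representations."""

    def __init__(self, hidden=768):
        super().__init__()
        self.conv = nn.Conv1d(hidden, hidden, kernel_size=3, padding=1)
        self.fuse = nn.Linear(2 * hidden, hidden)

    def forward(self, token_reps, spans):
        # token_reps: (seq_len, hidden) -- PrLM output for one sentence
        span_broadcast = torch.zeros_like(token_reps)
        for start, end in spans:
            h = token_reps[start:end].transpose(0, 1).unsqueeze(0)   # (1, hidden, span_len)
            pooled = torch.relu(self.conv(h)).max(dim=-1).values[0]  # (hidden,)
            span_broadcast[start:end] = pooled
        # span-enhanced representation: each token's feature plus its span's feature
        return self.fuse(torch.cat([token_reps, span_broadcast], dim=-1))
```

For example, segment_into_spans(["new", "york", "is", "big"], {"new york"}) returns [(0, 2), (2, 3), (3, 4)]. Because segmentation happens at fine-tuning time, the span dictionary can be swapped per downstream task without re-running pre-training, which is the flexibility the abstract emphasizes.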
Related papers
- SentenceVAE: Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context [49.9628075245959]
We present Sentence Variational Autoencoder (SentenceVAE), which includes a Sentence Encoder to compress the multiple tokens of a sentence into a single token, and a Sentence Decoder to reconstruct it.
The proposed method can accelerate inference speed by 204-365%, reduce perplexity (PPL) to 46-75% of its original value, and decrease memory overhead by 86-91% for the equivalent context length.
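As a rough illustration of the compress-then-reconstruct idea (not the paper's architecture), the sketch below squeezes a sentence's token embeddings into one vector with a GRU encoder and reconstructs per-token embeddings with a GRU decoder; the class name, layer choices, and shapes are assumptions.

```python
import torch.nn as nn


class SentenceCompressor(nn.Module):
    """Illustrative only: squeeze a sentence's token embeddings into a single
    sentence-level vector and reconstruct per-token embeddings from it."""

    def __init__(self, hidden=768):
        super().__init__()
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)

    def forward(self, token_embs):                 # (batch, seq_len, hidden)
        _, sent_vec = self.encoder(token_embs)     # (1, batch, hidden): the "sentence token"
        query = sent_vec.transpose(0, 1).repeat(1, token_embs.size(1), 1)
        reconstructed, _ = self.decoder(query)     # (batch, seq_len, hidden)
        return sent_vec.squeeze(0), reconstructed
```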
arXiv Detail & Related papers (2024-08-01T15:45:19Z)
- Bucket Pre-training is All You Need [9.332544709626875]
Large language models (LLMs) have demonstrated exceptional performance across various natural language processing tasks.
The conventional fixed-length data composition strategy for pretraining, which involves concatenating and splitting documents, can introduce noise and limit the model's ability to capture long-range dependencies.
We propose a multi-bucket data composition method that moves beyond the fixed-length paradigm, offering a more flexible and efficient approach to pretraining.
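A toy sketch of length-bucketed data composition follows, assuming documents are simply grouped by token count into a handful of bucket sizes rather than concatenated and split to one fixed length; the boundaries and the truncation rule are illustrative, not the paper's actual algorithm.

```python
def bucket_documents(docs, boundaries=(512, 1024, 2048, 4096)):
    """Group tokenized documents into length buckets instead of concatenating
    and splitting everything to one fixed length (illustrative only)."""
    buckets = {b: [] for b in boundaries}
    for doc in docs:
        for b in boundaries:
            if len(doc) <= b:
                buckets[b].append(doc)
                break
        else:
            # documents longer than the largest bucket are truncated to fit it
            buckets[boundaries[-1]].append(doc[: boundaries[-1]])
    return buckets
```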
arXiv Detail & Related papers (2024-07-10T09:27:23Z)
- Instruction Position Matters in Sequence Generation with Large Language Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions after the input sentences.
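The positional shift can be illustrated with a trivial prompt-construction snippet (the example sentence and instruction wording are made up):

```python
source = "Der schnelle braune Fuchs springt über den faulen Hund."

# conventional format: the task instruction precedes the input
pre_prompt = f"Translate the following German sentence into English.\n{source}"

# shifted format: the instruction follows the input, sitting next to the
# position where the model begins generating the translation
post_prompt = f"{source}\nTranslate the German sentence above into English."
```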
arXiv Detail & Related papers (2023-08-23T12:36:57Z)
- Assessing Phrase Break of ESL Speech with Pre-trained Language Models and Large Language Models [7.782346535009883]
This work introduces approaches to assessing phrase breaks in ESL learners' speech using pre-trained language models (PLMs) and large language models (LLMs).
arXiv Detail & Related papers (2023-06-08T07:10:39Z)
- Prompt Tuning for Discriminative Pre-trained Language Models [96.04765512463415]
Recent works have shown promising results of prompt tuning in stimulating pre-trained language models (PLMs) for natural language processing (NLP) tasks.
It is still unknown whether and how discriminative PLMs, e.g., ELECTRA, can be effectively prompt-tuned.
We present DPT, the first prompt tuning framework for discriminative PLMs, which reformulates NLP tasks into a discriminative language modeling problem.
arXiv Detail & Related papers (2022-05-23T10:11:50Z)
- COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining [59.169836983883656]
COCO-LM is a new self-supervised learning framework that pretrains Language Models by COrrecting challenging errors and COntrasting text sequences.
COCO-LM employs an auxiliary language model to mask-and-predict tokens in original text sequences.
Our analyses reveal that COCO-LM's advantages come from its challenging training signals, more contextualized token representations, and regularized sequence representations.
arXiv Detail & Related papers (2021-02-16T22:24:29Z)
- CSS-LM: A Contrastive Framework for Semi-supervised Fine-tuning of Pre-trained Language Models [59.49705076369856]
We introduce a novel framework to improve the fine-tuning phase of pre-trained language models (PLMs).
We retrieve positive and negative instances from large-scale unlabeled corpora according to their domain-level and class-level semantic relatedness to a task.
We then perform contrastive semi-supervised learning on both the retrieved unlabeled and original labeled instances to help PLMs capture crucial task-related semantic features.
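A minimal sketch of the retrieval step follows, assuming unlabeled sentence embeddings are ranked by cosine similarity against a task- or class-level prototype embedding; the function name and the top-k/bottom-k selection rule are assumptions rather than the paper's exact procedure.

```python
import torch
import torch.nn.functional as F


def retrieve_contrastive_instances(prototype_emb, unlabeled_embs, k=16):
    """Rank unlabeled sentence embeddings (N, d) by cosine similarity to a
    task- or class-level prototype (d,); the closest serve as positives and
    the most distant as negatives for contrastive semi-supervised fine-tuning."""
    sims = F.cosine_similarity(unlabeled_embs, prototype_emb.unsqueeze(0), dim=-1)
    order = torch.argsort(sims, descending=True)
    return order[:k], order[-k:]   # indices of positives, negatives
```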
arXiv Detail & Related papers (2021-02-07T09:27:26Z)
- Enhancing Pre-trained Language Model with Lexical Simplification [41.34550924004487]
Lexical simplification (LS) is a recognized method for reducing lexical diversity in input text.
We propose a novel approach which can effectively improve the performance of PrLMs in text classification.
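As a rough illustration (not the paper's method), lexical simplification can be approximated by swapping low-frequency words for simpler synonyms before the sentence reaches the PrLM; the synonym and frequency dictionaries and the threshold below are hypothetical.

```python
def simplify(tokens, simpler_synonym, word_freq, threshold=1e-5):
    """Replace low-frequency words with a known simpler synonym so the input
    the PrLM sees is less lexically diverse (hypothetical dictionaries)."""
    return [
        simpler_synonym.get(tok, tok) if word_freq.get(tok, 0.0) < threshold else tok
        for tok in tokens
    ]
```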
arXiv Detail & Related papers (2020-12-30T07:49:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.