Protum: A New Method For Prompt Tuning Based on "[MASK]"
- URL: http://arxiv.org/abs/2201.12109v1
- Date: Fri, 28 Jan 2022 13:34:30 GMT
- Title: Protum: A New Method For Prompt Tuning Based on "[MASK]"
- Authors: Pan He and Yuxi Chen and Yan Wang and Yanru Zhang
- Abstract summary: We propose a new Prompt Tuning based on "[MASK]" (Protum) method in this paper.
Our Protum can achieve much better performance than fine-tuning after continuous pre-training with less time consumption.
- Score: 12.057434751507552
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, prompt tuning (Lester et al., 2021) has gradually become a new paradigm for NLP: it freezes the parameters of pre-trained language models (PLMs) and relies only on the representations of the words to obtain remarkable performance on downstream tasks. It keeps downstream tasks consistent with the Masked Language Model (MLM) objective (Devlin et al., 2018) used during pre-training, and avoids some issues that may arise during fine-tuning. Naturally, we consider that the "[MASK]" tokens carry more useful information than other tokens, because the model draws on the surrounding context to predict them. Current prompt tuning methods face a serious problem when the answer spans multiple tokens: the predicted answer tokens can be composed arbitrarily, so these methods have to map tokens to labels with the help of a verbalizer. In response to this issue, we propose a new Prompt Tuning based on "[MASK]" (Protum) method in this paper, which constructs a classification task from the information carried by the hidden states of the "[MASK]" token and then predicts the labels directly rather than the answer tokens. At the same time, we explore how different hidden layers under "[MASK]" affect our classification model across many different datasets. Finally, we find that our Protum achieves much better performance than fine-tuning after continuous pre-training, while consuming less time. Our model facilitates the practical application of large models in NLP.
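The abstract includes no code, but its core recipe (freeze the PLM, read the hidden state at the "[MASK]" position from a chosen layer, and train only a small head that predicts labels directly instead of answer tokens) can be illustrated with a short sketch. The snippet below is a minimal illustration using the Hugging Face transformers library; the prompt template, layer choice, classifier head, and toy labels are assumptions for demonstration, not the authors' exact setup.

```python
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

# Minimal sketch of the idea described in the abstract: a frozen masked
# language model supplies the hidden state at the "[MASK]" position, and a
# small trainable head maps it directly to task labels (no verbalizer, no
# answer-token decoding).

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
backbone = AutoModel.from_pretrained("bert-base-uncased")
backbone.eval()
for p in backbone.parameters():          # freeze the PLM
    p.requires_grad = False

num_labels = 2                           # e.g. binary sentiment (assumption)
layer_index = -1                         # which hidden layer to read; the paper studies several
classifier = nn.Linear(backbone.config.hidden_size, num_labels)  # only trainable part

def mask_hidden_state(text: str) -> torch.Tensor:
    """Return the hidden state of the [MASK] token for a prompted input."""
    # Hypothetical prompt template wrapping the raw input.
    prompt = f"{text} It was [MASK]."
    enc = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = backbone(**enc, output_hidden_states=True)
    hidden = out.hidden_states[layer_index]               # (1, seq_len, hidden)
    mask_pos = (enc["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)
    return hidden[mask_pos]                               # (1, hidden)

# One toy training step: predict the label directly from the [MASK] state.
optimizer = torch.optim.AdamW(classifier.parameters(), lr=1e-3)
features = mask_hidden_state("The movie was surprisingly good.")
logits = classifier(features)
loss = nn.functional.cross_entropy(logits, torch.tensor([1]))  # label 1 = positive (toy)
loss.backward()
optimizer.step()
```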
Related papers
- Semformer: Transformer Language Models with Semantic Planning [18.750863564495006]
Next-token prediction serves as the dominant component in current neural language models.
We introduce Semformer, a novel method of training a Transformer language model that explicitly models the semantic planning of the response.
arXiv Detail & Related papers (2024-09-17T12:54:34Z) - Empowering Character-level Text Infilling by Eliminating Sub-Tokens [34.37743927032878]
We introduce FIM-SE, which stands for Fill-In-the-Middle with both Starting and Ending character constraints.
arXiv Detail & Related papers (2024-05-27T12:21:48Z) - TokenUnify: Scalable Autoregressive Visual Pre-training with Mixture Token Prediction [61.295716741720284]
TokenUnify is a novel pretraining method that integrates random token prediction, next-token prediction, and next-all token prediction.
In conjunction with TokenUnify, we have assembled a large-scale electron microscopy (EM) image dataset with ultra-high resolution.
This dataset includes over 120 million annotated voxels, making it the largest neuron segmentation dataset to date.
arXiv Detail & Related papers (2024-05-27T05:45:51Z) - Object Recognition as Next Token Prediction [99.40793702627396]
We present an approach to pose object recognition as next token prediction.
The idea is to apply a language decoder that auto-regressively predicts the text tokens from image embeddings to form labels.
arXiv Detail & Related papers (2023-12-04T18:58:40Z) - LabelPrompt: Effective Prompt-based Learning for Relation Classification [31.291466190218912]
This paper presents a novel prompt-based learning method, namely LabelPrompt, for the relation classification task.
Motivated by the intuition of "GIVE MODEL CHOICES!", we first define additional tokens to represent relation labels, regarding these tokens as a verbaliser with semantic initialisation.
Then, to mitigate inconsistency between predicted relations and given entities, we implement an entity-aware module with contrastive learning.
arXiv Detail & Related papers (2023-02-16T04:06:25Z) - Nonparametric Masked Language Modeling [113.71921977520864]
Existing language models (LMs) predict tokens with a softmax over a finite vocabulary.
We introduce NPM, the first nonparametric masked language model that replaces this softmax with a nonparametric distribution over every phrase in a reference corpus.
NPM can be efficiently trained with a contrastive objective and an in-batch approximation to full corpus retrieval.
arXiv Detail & Related papers (2022-12-02T18:10:42Z) - RetroMAE v2: Duplex Masked Auto-Encoder For Pre-Training
Retrieval-Oriented Language Models [3.4523793651427113]
We propose the duplex masked auto-encoder, a.k.a. DupMAE, which targets improving the semantic representation capacity of the contextualized embeddings of both [CLS] and ordinary tokens.
DupMAE is simple but empirically competitive: with a small decoding cost, it substantially contributes to the model's representation capability and transferability.
arXiv Detail & Related papers (2022-11-16T08:57:55Z) - FCM: Forgetful Causal Masking Makes Causal Language Models Better
Zero-Shot Learners [139.6321017962092]
We propose a simple technique that significantly boosts the performance of large language models without adding computational cost.
Our key observation is that, by performing the next token prediction task with randomly selected past tokens masked out, we can improve the quality of the learned representations (a sketch of this masking appears after this list).
Experimental results show that our method also improves PaLM's zero and few-shot performance on a diverse suite of tasks.
arXiv Detail & Related papers (2022-10-24T17:46:57Z) - Token Dropping for Efficient BERT Pretraining [33.63507016806947]
We develop a simple but effective "token dropping" method to accelerate the pretraining of transformer models.
We leverage the already built-in masked language modeling (MLM) loss to identify unimportant tokens with practically no computational overhead.
This simple approach reduces the pretraining cost of BERT by 25% while achieving similar overall fine-tuning performance on standard downstream tasks.
arXiv Detail & Related papers (2022-03-24T17:50:46Z) - COCO-LM: Correcting and Contrasting Text Sequences for Language Model
Pretraining [59.169836983883656]
COCO-LM is a new self-supervised learning framework that pretrains Language Models by COrrecting challenging errors and COntrasting text sequences.
COCO-LM employs an auxiliary language model to mask-and-predict tokens in original text sequences.
Our analyses reveal that COCO-LM's advantages come from its challenging training signals, more contextualized token representations, and regularized sequence representations.
arXiv Detail & Related papers (2021-02-16T22:24:29Z) - ELECTRA: Pre-training Text Encoders as Discriminators Rather Than
Generators [108.3381301768299]
Masked language modeling (MLM) pre-training methods such as BERT corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens.
We propose a more sample-efficient pre-training task called replaced token detection (sketched below).
arXiv Detail & Related papers (2020-03-23T21:17:42Z)
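The FCM entry above describes next-token prediction with randomly selected past tokens masked out. Below is a minimal sketch of how such a "forgetful" causal attention mask could be built; the drop probability and the way the mask is applied are illustrative assumptions, not the paper's exact procedure.

```python
import torch

def forgetful_causal_mask(seq_len: int, drop_prob: float = 0.15) -> torch.Tensor:
    """Causal attention mask with randomly 'forgotten' past tokens.

    Illustrative sketch of the idea in the FCM summary: during next-token
    prediction, each past position is masked out with probability
    `drop_prob`, on top of the usual causal constraint. True = may attend.
    """
    causal = torch.tril(torch.ones(seq_len, seq_len)).bool()
    # Randomly drop earlier positions; keep the diagonal so every token
    # can still attend to itself.
    keep_past = torch.rand(seq_len, seq_len) >= drop_prob
    keep_past.fill_diagonal_(True)
    return causal & keep_past

# Example: a mask for an 8-token sequence, applied as an additive attention bias.
mask = forgetful_causal_mask(8)
attn_bias = torch.zeros(8, 8).masked_fill(~mask, float("-inf"))
print(mask.int())
```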
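The ELECTRA entry describes replaced token detection: a small generator fills in masked positions and a discriminator labels every position as original or replaced. The sketch below shows that training signal with tiny stand-in modules; the architectures, sampling details, and loss weight are simplified assumptions rather than the actual ELECTRA implementation.

```python
import torch
from torch import nn

# Illustrative sketch of replaced token detection (ELECTRA-style pre-training).
# The "generator" and "discriminator" are tiny stand-in networks; only the
# training signal is the point.

VOCAB, HIDDEN, MASK_ID = 1000, 64, 0

class TinyLM(nn.Module):                      # stand-in generator (an MLM)
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, HIDDEN)
        self.out = nn.Linear(HIDDEN, VOCAB)
    def forward(self, ids):
        return self.out(self.emb(ids))        # (batch, seq, vocab) logits

class TinyDiscriminator(nn.Module):           # stand-in per-token classifier
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, HIDDEN)
        self.out = nn.Linear(HIDDEN, 1)
    def forward(self, ids):
        return self.out(self.emb(ids)).squeeze(-1)   # (batch, seq) logits

generator, discriminator = TinyLM(), TinyDiscriminator()

tokens = torch.randint(1, VOCAB, (2, 16))                     # toy batch
mask = torch.rand(tokens.shape) < 0.15                        # 15% of positions masked
corrupted_in = tokens.masked_fill(mask, MASK_ID)

# 1) The generator proposes plausible tokens at the masked positions.
gen_logits = generator(corrupted_in)
gen_loss = nn.functional.cross_entropy(gen_logits[mask], tokens[mask])  # standard MLM loss
sampled = torch.distributions.Categorical(logits=gen_logits[mask]).sample()

# 2) The discriminator sees the generator's output and flags replaced tokens.
disc_input = tokens.clone()
disc_input[mask] = sampled
is_replaced = (disc_input != tokens).float()                  # labels: 1 = replaced
disc_logits = discriminator(disc_input)
disc_loss = nn.functional.binary_cross_entropy_with_logits(disc_logits, is_replaced)

loss = gen_loss + 50.0 * disc_loss   # the relative weight here is illustrative
loss.backward()
```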
This list is automatically generated from the titles and abstracts of the papers in this site.