NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original
Pre-training Task--Next Sentence Prediction
- URL: http://arxiv.org/abs/2109.03564v1
- Date: Wed, 8 Sep 2021 11:57:08 GMT
- Title: NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original
Pre-training Task--Next Sentence Prediction
- Authors: Yi Sun, Yu Zheng, Chao Hao, Hangping Qiu
- Abstract summary: Using prompts to perform various downstream tasks, also known as prompt-based learning or prompt-learning, has lately gained significant success in comparison to the pre-train and fine-tune paradigm.
In this paper, we attempt to accomplish several NLP tasks in a zero-shot scenario using an original BERT pre-training task abandoned by RoBERTa and other models: Next Sentence Prediction (NSP).
Unlike token-level techniques, our sentence-level prompt-based method NSP-BERT does not need to fix the length of the prompt or the position to be predicted, allowing it to handle tasks such as entity linking.
- Score: 14.912579358678212
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Using prompts to direct language models to perform various downstream
tasks, also known as prompt-based learning or prompt-learning, has lately achieved
significant success compared to the pre-train and fine-tune paradigm.
Nonetheless, virtually all prompt-based methods are token-level: they rely on
GPT's left-to-right language model or BERT's masked language model to perform
cloze-style tasks. In this paper, we attempt to accomplish several NLP tasks in
the zero-shot scenario using an original BERT pre-training task that RoBERTa and
other models abandoned: Next Sentence Prediction (NSP). Unlike token-level
techniques, our sentence-level prompt-based method NSP-BERT does not need to fix
the length of the prompt or the position to be predicted, allowing it to handle
tasks such as entity linking with ease. Based on the characteristics of
NSP-BERT, we offer several quick-to-build templates for various downstream
tasks. In particular, we suggest a two-stage prompt method for word sense
disambiguation tasks. Our strategies for mapping labels significantly enhance
the model's performance on sentence-pair tasks. On the FewCLUE benchmark,
NSP-BERT outperforms other zero-shot methods on most of these tasks and comes
close to the few-shot methods.
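
The core mechanism described in the abstract can be sketched concretely: for each candidate label, build a natural-language prompt sentence and ask BERT's NSP head how likely that sentence is to follow the input text, then pick the label whose prompt scores highest. The snippet below is a minimal illustration using the Hugging Face transformers API; it is not the authors' released implementation, and the checkpoint name, templates, and label set are illustrative assumptions.

```python
# Minimal sketch of sentence-level zero-shot classification with BERT's NSP head.
# Assumptions: a generic English checkpoint and hand-written prompt sentences;
# the paper's actual templates, label mappings, and checkpoints differ.
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")
model.eval()

def nsp_zero_shot(text, label_prompts):
    """Rank candidate labels by P(IsNext) of their prompt sentence given `text`."""
    scores = {}
    for label, prompt in label_prompts.items():
        inputs = tokenizer(text, prompt, return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = model(**inputs).logits  # shape (1, 2): index 0 = IsNext, 1 = NotNext
        scores[label] = torch.softmax(logits, dim=-1)[0, 0].item()
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical topic-classification example: one prompt sentence per label.
text = "The striker scored twice and the team clinched the league title."
label_prompts = {
    "sports": "This text is about sports.",
    "finance": "This text is about finance.",
    "politics": "This text is about politics.",
}
print(nsp_zero_shot(text, label_prompts))
```

Because the prompt is an ordinary sentence rather than a cloze slot, the same scoring loop extends to variable-length candidates, which is what lets the approach cover tasks like entity linking where the answer cannot be squeezed into a fixed number of [MASK] tokens.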
Related papers
- ToPro: Token-Level Prompt Decomposition for Cross-Lingual Sequence Labeling Tasks (2024-01-29) [12.700783525558721]
  The ToPro method decomposes an input sentence into single tokens and applies one prompt template to each token.
  Experiments on multilingual NER and POS tagging datasets demonstrate that ToPro-based fine-tuning outperforms vanilla fine-tuning and prompt tuning in zero-shot cross-lingual transfer.
- Automated Few-shot Classification with Instruction-Finetuned Language Models (2023-05-21) [76.69064714392165]
  AuT-Few outperforms state-of-the-art few-shot learning methods.
  AuT-Few is also the best ranking method across datasets on the RAFT few-shot benchmark.
- Instance-wise Prompt Tuning for Pretrained Language Models (2022-06-04) [72.74916121511662]
  Instance-wise Prompt Tuning (IPT) is the first prompt-learning paradigm that injects knowledge from the input data instances into the prompts.
  IPT significantly outperforms task-based prompt-learning methods and achieves performance comparable to conventional fine-tuning with only 0.5%-1.5% of tuned parameters.
- Towards Unified Prompt Tuning for Few-shot Text Classification (2022-05-11) [47.71344780587704]
  The Unified Prompt Tuning (UPT) framework leads to better few-shot text classification for BERT-style models.
  In UPT, a novel Prompt-Options-Verbalizer paradigm is proposed for joint prompt learning across different NLP tasks.
  A self-supervised task named Knowledge-enhanced Selective Masked Language Modeling is also designed to improve the PLM's generalization abilities.
- IDPG: An Instance-Dependent Prompt Generation Method (2022-04-09) [58.45110542003139]
  Prompt tuning is a new, efficient NLP transfer-learning paradigm that adds a task-specific prompt to each input instance during the model training stage.
  IDPG proposes a conditional prompt generation method to generate prompts for each input instance.
- PERFECT: Prompt-free and Efficient Few-shot Learning with Language Models (2022-04-03) [67.3725459417758]
  PERFECT is a simple and efficient method for few-shot fine-tuning of PLMs that does not rely on handcrafted prompts.
  Manually engineered task prompts can be replaced with task-specific adapters that enable sample-efficient fine-tuning.
  Experiments on a wide range of few-shot NLP tasks demonstrate that PERFECT, while simple and efficient, also outperforms existing state-of-the-art few-shot learning methods.
- InstructionNER: A Multi-Task Instruction-Based Generative Framework for Few-shot NER (2022-03-08) [31.32381919473188]
  InstructionNER is a multi-task instruction-based generative framework for low-resource named entity recognition.
  It reformulates NER as a generation problem, enriching source sentences with task-specific instructions and answer options and then inferring the entities and types in natural language.
  Experimental results show that the method consistently outperforms other baselines on five datasets in few-shot settings.
- AdaPrompt: Adaptive Model Training for Prompt-based NLP (2022-02-10) [77.12071707955889]
  AdaPrompt adaptively retrieves external data for continual pretraining of PLMs.
  Experimental results on five NLP benchmarks show that AdaPrompt improves over standard PLMs in few-shot settings.
  In zero-shot settings, AdaPrompt outperforms standard prompt-based methods by up to 26.35% relative error reduction.
- Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation (2022-01-27) [24.488427641442694]
  Grad2Task is a conditional neural-process-based approach for few-shot text classification.
  Its key idea is to represent each task using gradient information from a base model.
  The approach outperforms traditional fine-tuning, sequential transfer learning, and state-of-the-art meta-learning approaches.
This list is automatically generated from the titles and abstracts of the papers on this site.