NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original
Pre-training Task--Next Sentence Prediction
- URL: http://arxiv.org/abs/2109.03564v1
- Date: Wed, 8 Sep 2021 11:57:08 GMT
- Title: NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original
Pre-training Task--Next Sentence Prediction
- Authors: Yi Sun, Yu Zheng, Chao Hao, Hangping Qiu
- Abstract summary: Using prompts to perform various downstream tasks, also known as prompt-based learning or prompt-learning, has lately gained significant success in comparison to the pre-train and fine-tune paradigm.
In this paper, we attempt to accomplish several NLP tasks in a zero-shot scenario using an original BERT pre-training task abandoned by RoBERTa and other models: Next Sentence Prediction (NSP).
Unlike token-level techniques, our sentence-level prompt-based method NSP-BERT does not need to fix the length of the prompt or the position to be predicted, allowing it to handle tasks such as entity linking.
- Score: 14.912579358678212
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Using prompts to direct language models to perform various downstream
tasks, also known as prompt-based learning or prompt-learning, has lately achieved
significant success compared to the pre-train and fine-tune paradigm.
Nonetheless, virtually all prompt-based methods are token-level: they rely on
GPT's left-to-right language model or BERT's masked language model to perform
cloze-style tasks. In this paper, we attempt to accomplish several NLP tasks in
the zero-shot scenario using an original BERT pre-training task that RoBERTa and
other models abandoned: Next Sentence Prediction (NSP). Unlike token-level
techniques, our sentence-level prompt-based method NSP-BERT does not need to fix
the length of the prompt or the position to be predicted, allowing it to handle
tasks such as entity linking with ease. Based on the characteristics of
NSP-BERT, we offer several quick-to-build templates for various downstream
tasks. In particular, we suggest a two-stage prompt method for word sense
disambiguation tasks. Our strategies for mapping labels significantly enhance
the model's performance on sentence-pair tasks. On the FewCLUE benchmark,
NSP-BERT outperforms other zero-shot methods on most of these tasks and comes
close to the few-shot methods.
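
The core mechanism described in the abstract can be sketched concretely: for each candidate label, build a natural-language prompt sentence and ask BERT's NSP head how likely that sentence is to follow the input text, then pick the label whose prompt scores highest. The snippet below is a minimal illustration using the Hugging Face transformers API; it is not the authors' released implementation, and the checkpoint name, templates, and label set are illustrative assumptions.

```python
# Minimal sketch of sentence-level zero-shot classification with BERT's NSP head.
# Assumptions: a generic English checkpoint and hand-written prompt sentences;
# the paper's actual templates, label mappings, and checkpoints differ.
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")
model.eval()

def nsp_zero_shot(text, label_prompts):
    """Rank candidate labels by P(IsNext) of their prompt sentence given `text`."""
    scores = {}
    for label, prompt in label_prompts.items():
        inputs = tokenizer(text, prompt, return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = model(**inputs).logits  # shape (1, 2): index 0 = IsNext, 1 = NotNext
        scores[label] = torch.softmax(logits, dim=-1)[0, 0].item()
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical topic-classification example: one prompt sentence per label.
text = "The striker scored twice and the team clinched the league title."
label_prompts = {
    "sports": "This text is about sports.",
    "finance": "This text is about finance.",
    "politics": "This text is about politics.",
}
print(nsp_zero_shot(text, label_prompts))
```

Because the prompt is an ordinary sentence rather than a cloze slot, the same scoring loop extends to variable-length candidates, which is what lets the approach cover tasks like entity linking where the answer cannot be squeezed into a fixed number of [MASK] tokens.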
Related papers
- ToPro: Token-Level Prompt Decomposition for Cross-Lingual Sequence Labeling Tasks (2024-01-29) [12.700783525558721]
  The ToPro method decomposes an input sentence into single tokens and applies one prompt template to each token.
  Experiments on multilingual NER and POS tagging datasets demonstrate that ToPro-based fine-tuning outperforms vanilla fine-tuning and prompt tuning in zero-shot cross-lingual transfer.
- Automated Few-shot Classification with Instruction-Finetuned Language Models (2023-05-21) [76.69064714392165]
  AuT-Few outperforms state-of-the-art few-shot learning methods.
  AuT-Few is also the best ranking method across datasets on the RAFT few-shot benchmark.
- Instance-wise Prompt Tuning for Pretrained Language Models (2022-06-04) [72.74916121511662]
  Instance-wise Prompt Tuning (IPT) is the first prompt-learning paradigm that injects knowledge from the input data instances into the prompts.
  IPT significantly outperforms task-based prompt-learning methods and achieves performance comparable to conventional fine-tuning with only 0.5%-1.5% of tuned parameters.
- Towards Unified Prompt Tuning for Few-shot Text Classification (2022-05-11) [47.71344780587704]
  The Unified Prompt Tuning (UPT) framework leads to better few-shot text classification for BERT-style models.
  In UPT, a novel Prompt-Options-Verbalizer paradigm is proposed for joint prompt learning across different NLP tasks.
  A self-supervised task named Knowledge-enhanced Selective Masked Language Modeling is also designed to improve the PLM's generalization abilities.
- IDPG: An Instance-Dependent Prompt Generation Method (2022-04-09) [58.45110542003139]
  Prompt tuning is a new, efficient NLP transfer-learning paradigm that adds a task-specific prompt to each input instance during the model training stage.
  IDPG proposes a conditional prompt generation method to generate prompts for each input instance.
- PERFECT: Prompt-free and Efficient Few-shot Learning with Language Models (2022-04-03) [67.3725459417758]
  PERFECT is a simple and efficient method for few-shot fine-tuning of PLMs that does not rely on handcrafted prompts.
  Manually engineered task prompts can be replaced with task-specific adapters that enable sample-efficient fine-tuning.
  Experiments on a wide range of few-shot NLP tasks demonstrate that PERFECT, while simple and efficient, also outperforms existing state-of-the-art few-shot learning methods.
- InstructionNER: A Multi-Task Instruction-Based Generative Framework for Few-shot NER (2022-03-08) [31.32381919473188]
  InstructionNER is a multi-task instruction-based generative framework for low-resource named entity recognition.
  It reformulates NER as a generation problem, enriching source sentences with task-specific instructions and answer options and then inferring the entities and types in natural language.
  Experimental results show that the method consistently outperforms other baselines on five datasets in few-shot settings.
- AdaPrompt: Adaptive Model Training for Prompt-based NLP (2022-02-10) [77.12071707955889]
  AdaPrompt adaptively retrieves external data for continual pretraining of PLMs.
  Experimental results on five NLP benchmarks show that AdaPrompt improves over standard PLMs in few-shot settings.
  In zero-shot settings, AdaPrompt outperforms standard prompt-based methods by up to 26.35% relative error reduction.
- Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation (2022-01-27) [24.488427641442694]
  Grad2Task is a conditional neural-process-based approach for few-shot text classification.
  Its key idea is to represent each task using gradient information from a base model.
  The approach outperforms traditional fine-tuning, sequential transfer learning, and state-of-the-art meta-learning approaches.
This list is automatically generated from the titles and abstracts of the papers on this site.