Related papers: Pre-trained Token-replaced Detection Model as Few-shot Learner

Pre-trained Token-replaced Detection Model as Few-shot Learner

URL: http://arxiv.org/abs/2203.03235v2
Date: Tue, 21 Mar 2023 07:43:30 GMT
Title: Pre-trained Token-replaced Detection Model as Few-shot Learner
Authors: Zicheng Li, Shoushan Li, Guodong Zhou
Abstract summary: We propose a novel approach to few-shot learning with pre-trained token-replaced detection models like ELECTRA. A systematic evaluation on 16 datasets demonstrates that our approach outperforms few-shot learners with pre-trained masked language models.
Score: 31.40447168356879
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Pre-trained masked language models have demonstrated remarkable ability as few-shot learners. In this paper, as an alternative, we propose a novel approach to few-shot learning with pre-trained token-replaced detection models like ELECTRA. In this approach, we reformulate a classification or a regression task as a token-replaced detection problem. Specifically, we first define a template and label description words for each task and put them into the input to form a natural language prompt. Then, we employ the pre-trained token-replaced detection model to predict which label description word is the most original (i.e., least replaced) among all label description words in the prompt. A systematic evaluation on 16 datasets demonstrates that our approach outperforms few-shot learners with pre-trained masked language models in both one-sentence and two-sentence learning tasks.

Related papers

Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models [4.165536532090932]
The disconnect between tokenizer creation and model training in language models allows for specific inputs, such as the infamous SolidGoldMagikarp token, to induce unwanted model behaviour. We present a comprehensive analysis of Large Language Model tokenizers, specifically targeting this issue of detecting under-trained tokens. Through a combination of tokenizer analysis, model weight-based indicators, and prompting techniques, we develop novel and effective methods for automatically detecting these problematic tokens.
arXiv Detail & Related papers (2024-05-08T20:37:56Z)
Pre-trained Language Models Do Not Help Auto-regressive Text-to-Image Generation [82.5217996570387]
We adapt a pre-trained language model for auto-regressive text-to-image generation. We find that pre-trained language models offer limited help.
arXiv Detail & Related papers (2023-11-27T07:19:26Z)
Verifying the Robustness of Automatic Credibility Assessment [50.55687778699995]
We show that meaning-preserving changes in input text can mislead the models. We also introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks. Our experimental results show that modern large language models are often more vulnerable to attacks than previous, smaller solutions.
arXiv Detail & Related papers (2023-03-14T16:11:47Z)
Text2Model: Text-based Model Induction for Zero-shot Image Classification [38.704831945753284]
We address the challenge of building task-agnostic classifiers using only text descriptions. We generate zero-shot classifiers using a hypernetwork that receives class descriptions and outputs a multi-class model. We evaluate this approach in a series of zero-shot classification tasks, for image, point-cloud, and action recognition, using a range of text descriptions.
arXiv Detail & Related papers (2022-10-27T05:19:55Z)
Prompting ELECTRA: Few-Shot Learning with Discriminative Pre-Trained Models [43.7024573212373]
We adapt prompt-based few-shot learning to ELECTRA and show that it outperforms masked language models in a wide range of tasks. Our method can be easily adapted to tasks involving multi-token predictions without extra computation overhead.
arXiv Detail & Related papers (2022-05-30T16:32:30Z)
Eliciting Knowledge from Pretrained Language Models for Prototypical Prompt Verbalizer [12.596033546002321]
In this paper, we focus on eliciting knowledge from pretrained language models and propose a prototypical prompt verbalizer for prompt-tuning. For zero-shot settings, knowledge is elicited from pretrained language models by a manually designed template to form initial prototypical embeddings. For few-shot settings, models are tuned to learn meaningful and interpretable prototypical embeddings.
arXiv Detail & Related papers (2022-01-14T12:04:37Z)
Revisiting Self-Training for Few-Shot Learning of Language Model [61.173976954360334]
Unlabeled data carry rich task-relevant information, they are proven useful for few-shot learning of language model. In this work, we revisit the self-training technique for language model fine-tuning and present a state-of-the-art prompt-based few-shot learner, SFLM.
arXiv Detail & Related papers (2021-10-04T08:51:36Z)
Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing [78.8500633981247]
This paper surveys and organizes research works in a new paradigm in natural language processing, which we dub "prompt-based learning" Unlike traditional supervised learning, which trains a model to take in an input x and predict an output y as P(y|x), prompt-based learning is based on language models that model the probability of text directly.
arXiv Detail & Related papers (2021-07-28T18:09:46Z)
COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining [59.169836983883656]
COCO-LM is a new self-supervised learning framework that pretrains Language Models by COrrecting challenging errors and COntrasting text sequences. COCO-LM employs an auxiliary language model to mask-and-predict tokens in original text sequences. Our analyses reveal that COCO-LM's advantages come from its challenging training signals, more contextualized token representations, and regularized sequence representations.
arXiv Detail & Related papers (2021-02-16T22:24:29Z)
$k$-Neighbor Based Curriculum Sampling for Sequence Prediction [22.631763991832862]
Multi-step ahead prediction in language models is challenging due to discrepancy between training and test time processes. We propose textitNearest-Neighbor Replacement Sampling -- a curriculum learning-based method that gradually changes an initially deterministic teacher policy. We report our findings on two language modelling benchmarks and find that the proposed method further improves performance when used in conjunction with scheduled sampling.
arXiv Detail & Related papers (2021-01-22T20:07:29Z)
Adversarial Transfer Learning for Punctuation Restoration [58.2201356693101]
Adversarial multi-task learning is introduced to learn task invariant knowledge for punctuation prediction. Experiments are conducted on IWSLT2011 datasets.
arXiv Detail & Related papers (2020-04-01T06:19:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.