Prompting ELECTRA: Few-Shot Learning with Discriminative Pre-Trained Models
- URL: http://arxiv.org/abs/2205.15223v1
- Date: Mon, 30 May 2022 16:32:30 GMT
- Title: Prompting ELECTRA: Few-Shot Learning with Discriminative Pre-Trained Models
- Authors: Mengzhou Xia, Mikel Artetxe, Jingfei Du, Danqi Chen, Ves Stoyanov
- Abstract summary: We adapt prompt-based few-shot learning to ELECTRA and show that it outperforms masked language models in a wide range of tasks.
Our method can be easily adapted to tasks involving multi-token predictions without extra computation overhead.
- Score: 43.7024573212373
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained masked language models successfully perform few-shot learning by
formulating downstream tasks as text infilling. However, as a strong
alternative in full-shot settings, discriminative pre-trained models like
ELECTRA do not fit into the paradigm. In this work, we adapt prompt-based
few-shot learning to ELECTRA and show that it outperforms masked language
models in a wide range of tasks. ELECTRA is pre-trained to distinguish if a
token is generated or original. We naturally extend that to prompt-based
few-shot learning by training to score the originality of the target options
without introducing new parameters. Our method can be easily adapted to tasks
involving multi-token predictions without extra computation overhead. Analysis
shows that ELECTRA learns distributions that align better with downstream
tasks.
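The scoring rule described in the abstract is simple enough to sketch: fill a prompt template with each candidate verbalizer and let ELECTRA's replaced-token-detection (RTD) head judge how "original" the candidate looks in context. The snippet below is a minimal illustration of that idea, not the authors' released code; the checkpoint is the public Hugging Face discriminator, and the template, label options, and `originality_score` helper are assumptions made for the example.

```python
import torch
from transformers import ElectraForPreTraining, ElectraTokenizer

name = "google/electra-large-discriminator"
tokenizer = ElectraTokenizer.from_pretrained(name)
model = ElectraForPreTraining.from_pretrained(name).eval()

def originality_score(template: str, option: str) -> float:
    """Average 'originality' the RTD head assigns to `option` inside the
    filled-in template (higher = judged more likely to be original text)."""
    enc = tokenizer(template.format(option), return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits[0]  # positive logit = "replaced"
    # Locate the option's tokens in context (assumes they appear once and
    # tokenize the same as in isolation, which holds for word-initial spans).
    opt = tokenizer(option, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    for i in range(len(ids) - len(opt) + 1):
        if ids[i:i + len(opt)] == opt:
            return (-logits[i:i + len(opt)]).mean().item()
    raise ValueError("option not found in filled template")

# Classification reduces to picking the most "original" option.
template = "The movie was absolutely wonderful. It was {}."
print(max(["great", "terrible"], key=lambda o: originality_score(template, o)))
```

Because the score is just the mean per-token originality over the option span, multi-token options cost nothing beyond the single forward pass, which is the intuition behind the abstract's "no extra computation overhead" claim.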
Related papers
- Semformer: Transformer Language Models with Semantic Planning [18.750863564495006]
Next-token prediction serves as the dominant component in current neural language models.
We introduce Semformer, a novel method for training a Transformer language model that explicitly models semantic planning of the response.
arXiv Detail & Related papers (2024-09-17T12:54:34Z)
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
- ELECTRA is a Zero-Shot Learner, Too [14.315501760755609]
"Pre-train, prompt, and predict" has achieved remarkable achievements compared with the "pre-train, fine-tune" paradigm.
In this paper, we propose a novel replaced token detection (RTD)-based prompt learning method.
Experimental results show that the ELECTRA model based on RTD-prompt learning achieves surprisingly strong, state-of-the-art zero-shot performance.
arXiv Detail & Related papers (2022-07-17T11:20:58Z)
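Tying this zero-shot result to the sketch above: with no parameter updates at all, prediction is just an argmax of the same originality score over the label options. A short usage example, reusing the hypothetical `originality_score` helper defined earlier:

```python
# Zero-shot use of the originality_score sketch above: no training,
# just pick the option ELECTRA's RTD head deems most "original".
options = ["great", "terrible"]
template = "An instant classic. It was {}."
prediction = max(options, key=lambda o: originality_score(template, o))
```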
- Training ELECTRA Augmented with Multi-word Selection [53.77046731238381]
We present a new text encoder pre-training method that improves ELECTRA based on multi-task learning.
Specifically, we train the discriminator to simultaneously detect replaced tokens and select original tokens from candidate sets.
arXiv Detail & Related papers (2021-05-31T23:19:00Z)
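The two objectives summarized in that entry compose naturally into one loss. Below is a hedged sketch of such a multi-task discriminator objective; the head names, shapes, candidate-set construction, and weighting `alpha` are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

hidden, k = 256, 8                       # hidden size, candidates per position
rtd_head = nn.Linear(hidden, 1)          # replaced-token detection head
select_proj = nn.Linear(hidden, hidden)  # projection for candidate selection

def multi_task_loss(token_reprs, cand_embeds, replaced_labels, original_index,
                    alpha=1.0):
    # token_reprs: (seq, hidden) discriminator outputs
    # cand_embeds: (seq, k, hidden) embeddings of k candidates per position
    rtd_logits = rtd_head(token_reprs).squeeze(-1)  # (seq,)
    rtd_loss = F.binary_cross_entropy_with_logits(
        rtd_logits, replaced_labels.float())
    # Score each candidate by dot product with the projected representation.
    sel_logits = torch.einsum("sh,skh->sk",
                              select_proj(token_reprs), cand_embeds)  # (seq, k)
    sel_loss = F.cross_entropy(sel_logits, original_index)  # original's index
    return rtd_loss + alpha * sel_loss
```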
- Pre-Training Transformers as Energy-Based Cloze Models [95.04748595976811]
We introduce Electric, an energy-based cloze model for representation learning over text.
Electric does not use masking, nor does it output a full distribution over tokens that could occur in a context; instead, it assigns a scalar energy score to each input token.
We train Electric using an algorithm based on noise-contrastive estimation and elucidate how this learning objective is closely related to the recently proposed ELECTRA pre-training method.
arXiv Detail & Related papers (2020-12-15T19:17:33Z)
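Noise-contrastive estimation here reduces to a binary classification between real tokens and samples from a noise distribution q. A hedged sketch of that objective, using the standard NCE logit parameterization with one noise sample per real token (all names are assumptions for illustration):

```python
import torch
import torch.nn.functional as F

def nce_loss(energy_data, log_q_data, energy_noise, log_q_noise):
    """NCE with one noise sample per real token: classify real vs. noise
    with logit(x) = -E(x) - log q(x), so low energy means 'looks real'."""
    logit_data = -energy_data - log_q_data      # targets: 1 (real)
    logit_noise = -energy_noise - log_q_noise   # targets: 0 (noise)
    return (F.binary_cross_entropy_with_logits(
                logit_data, torch.ones_like(logit_data))
            + F.binary_cross_entropy_with_logits(
                logit_noise, torch.zeros_like(logit_noise)))
```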
- MC-BERT: Efficient Language Pre-Training via a Meta Controller [96.68140474547602]
Large-scale pre-training is computationally expensive.
ELECTRA, an early attempt to accelerate pre-training, trains a discriminative model that predicts whether each input token was replaced by a generator.
We propose a novel meta-learning framework, MC-BERT, to achieve better efficiency and effectiveness.
arXiv Detail & Related papers (2020-06-10T09:22:19Z)
- Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm that directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)
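The claimed connection to model-agnostic meta-learning is easiest to see in code. Below is a generic MAML-style meta-train step, a sketch of the standard formulation rather than the paper's method; it assumes PyTorch >= 2.0 for `torch.func.functional_call`, and `support`/`query` are hypothetical (inputs, labels) pairs.

```python
import torch

def maml_meta_step(model, loss_fn, support, query, inner_lr=1e-2):
    """One meta-train step: adapt on the support set with a single inner
    gradient step, then return the adapted model's loss on the query set."""
    params = dict(model.named_parameters())
    x_s, y_s = support
    # Inner loop: one SGD step, keeping the graph so the outer update can
    # differentiate through the adaptation.
    loss_s = loss_fn(torch.func.functional_call(model, params, (x_s,)), y_s)
    grads = torch.autograd.grad(loss_s, list(params.values()),
                                create_graph=True)
    adapted = {n: p - inner_lr * g for (n, p), g in zip(params.items(), grads)}
    x_q, y_q = query
    # Outer objective: backpropagating this loss and stepping an optimizer
    # over model.parameters() is the meta-update.
    return loss_fn(torch.func.functional_call(model, adapted, (x_q,)), y_q)
```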
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.