Training ELECTRA Augmented with Multi-word Selection
- URL: http://arxiv.org/abs/2106.00139v1
- Date: Mon, 31 May 2021 23:19:00 GMT
- Title: Training ELECTRA Augmented with Multi-word Selection
- Authors: Jiaming Shen, Jialu Liu, Tianqi Liu, Cong Yu, Jiawei Han
- Abstract summary: We present a new text encoder pre-training method that improves ELECTRA based on multi-task learning.
Specifically, we train the discriminator to simultaneously detect replaced tokens and select original tokens from candidate sets.
- Score: 53.77046731238381
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained text encoders such as BERT and its variants have recently
achieved state-of-the-art performance on many NLP tasks. While effective,
these pre-training methods typically demand massive computational
resources. To accelerate pre-training, ELECTRA trains a discriminator that
predicts whether each input token is replaced by a generator. However, this new
task, as a binary classification, is less semantically informative. In this
study, we present a new text encoder pre-training method that improves ELECTRA
based on multi-task learning. Specifically, we train the discriminator to
simultaneously detect replaced tokens and select original tokens from candidate
sets. We further develop two techniques to effectively combine all pre-training
tasks: (1) using attention-based networks for task-specific heads, and (2)
sharing bottom layers of the generator and the discriminator. Extensive
experiments on GLUE and SQuAD datasets demonstrate both the effectiveness and
the efficiency of our proposed method.
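The multi-task objective described in the abstract can be made concrete with a short sketch. The following is a minimal PyTorch illustration, not the authors' code: the tensor shapes, the dot-product candidate scorer (standing in for the paper's attention-based task-specific heads), and the loss weight are assumptions for illustration only.

```python
# Minimal sketch of a multi-task ELECTRA-style discriminator loss:
# (1) replaced-token detection (binary, as in ELECTRA) and
# (2) multi-word selection (pick the original token out of a small candidate set).
# Shapes, names, and the dot-product scorer are illustrative assumptions.
import torch
import torch.nn.functional as F


def multi_task_loss(
    hidden,           # (batch, seq_len, dim) discriminator representations per token
    rtd_logits,       # (batch, seq_len) logits from the replaced-token detection head
    rtd_labels,       # (batch, seq_len) 1 if the token was replaced, else 0
    cand_embeddings,  # (batch, seq_len, k, dim) embeddings of k candidate words per position
    cand_labels,      # (batch, seq_len) index of the original word within its candidate set
    selection_mask,   # (batch, seq_len) 1 at positions where the selection task applies
    lambda_sel=1.0,   # assumed weight on the selection task
):
    # Task 1: replaced-token detection.
    rtd_loss = F.binary_cross_entropy_with_logits(rtd_logits, rtd_labels.float())

    # Task 2: multi-word selection -- score every candidate against the token's
    # contextual representation and classify which candidate is the original.
    # (The paper uses attention-based task-specific heads; a dot product is used here.)
    scores = torch.einsum("bsd,bskd->bsk", hidden, cand_embeddings)  # (batch, seq_len, k)
    keep = selection_mask.bool()
    sel_loss = F.cross_entropy(scores[keep], cand_labels[keep])

    return rtd_loss + lambda_sel * sel_loss
```

The abstract's second technique, sharing the bottom layers of the generator and the discriminator, concerns how the two Transformers are parameterized rather than the loss itself, so it is not modeled in this sketch.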
Related papers
- Prompt Optimization via Adversarial In-Context Learning [51.18075178593142]
adv-ICL is implemented as a two-player game between a generator and a discriminator.
The generator tries to generate realistic enough output to fool the discriminator.
We show that adv-ICL results in significant improvements over state-of-the-art prompt optimization techniques.
arXiv Detail & Related papers (2023-12-05T09:44:45Z)
- Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding [52.723297744257536]
Pre-trained language models (LMs) have proven effective in scientific literature understanding tasks.
We propose a multi-task contrastive learning framework, SciMult, to facilitate common knowledge sharing across different literature understanding tasks.
arXiv Detail & Related papers (2023-05-23T16:47:22Z)
- A Novel Plagiarism Detection Approach Combining BERT-based Word Embedding, Attention-based LSTMs and an Improved Differential Evolution Algorithm [11.142354615369273]
We propose a novel plagiarism detection method based on attention-based long short-term memory (LSTM) networks and word embeddings from bidirectional encoder representations from transformers (BERT).
BERT can be incorporated into a downstream task and fine-tuned as a task-specific model, while the pre-trained BERT model captures a variety of linguistic characteristics.
arXiv Detail & Related papers (2023-05-03T18:26:47Z)
- Prompting ELECTRA: Few-Shot Learning with Discriminative Pre-Trained Models [43.7024573212373]
We adapt prompt-based few-shot learning to ELECTRA and show that it outperforms masked language models in a wide range of tasks.
Our method can be easily adapted to tasks involving multi-token predictions without extra computation overhead.
arXiv Detail & Related papers (2022-05-30T16:32:30Z)
- Knowledge Transfer by Discriminative Pre-training for Academic Performance Prediction [5.3431413737671525]
We propose DPA, a transfer learning framework with Discriminative Pre-training tasks for Academic performance prediction.
Compared to the previous state-of-the-art generative pre-training method, DPA is more sample-efficient, converging faster to a lower academic performance prediction error.
arXiv Detail & Related papers (2021-06-28T13:02:23Z)
- Learning to Sample Replacements for ELECTRA Pre-Training [40.17248997321726]
ELECTRA pre-trains a discriminator to detect replaced tokens, where the replacements are sampled from a generator trained with masked language modeling (a minimal sketch of this sampling step appears after this list).
Despite its compelling performance, ELECTRA suffers from two issues in how replacements are sampled.
We propose two methods to improve replacement sampling for ELECTRA pre-training.
arXiv Detail & Related papers (2021-06-25T15:51:55Z)
- Training Generative Adversarial Networks in One Stage [58.983325666852856]
We introduce a general training scheme that enables training GANs efficiently in only one stage.
We show that the proposed method is readily applicable to other adversarial-training scenarios, such as data-free knowledge distillation.
arXiv Detail & Related papers (2021-02-28T09:03:39Z)
- MC-BERT: Efficient Language Pre-Training via a Meta Controller [96.68140474547602]
Large-scale pre-training is computationally expensive.
ELECTRA, an early attempt to accelerate pre-training, trains a discriminative model that predicts whether each input token was replaced by a generator.
We propose a novel meta-learning framework, MC-BERT, to achieve better efficiency and effectiveness.
arXiv Detail & Related papers (2020-06-10T09:22:19Z)
- Data-Free Knowledge Amalgamation via Group-Stack Dual-GAN [80.17705319689139]
We propose a data-free knowledge amalgamation strategy to craft a well-behaved multi-task student network from multiple single-task or multi-task teachers.
Without any training data, the proposed method achieves surprisingly competitive results, even compared with some fully supervised methods.
arXiv Detail & Related papers (2020-03-20T03:20:52Z)
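As noted in the "Learning to Sample Replacements for ELECTRA Pre-Training" entry above, the setup shared by ELECTRA and several of the works listed here corrupts an input by sampling replacement tokens from a masked-language-model generator and then asks the discriminator to flag the replaced positions. The snippet below is a minimal, hedged sketch of that sampling step using Hugging Face Transformers; the generator checkpoint, the 15% masking rate, and the toy sentence are assumptions for illustration, not the configuration used in any of these papers.

```python
# Minimal sketch of ELECTRA-style replacement sampling: mask a subset of tokens,
# sample replacements from an MLM generator, and record which positions ended up
# replaced (these become the discriminator's binary targets).
# The checkpoint name and 15% masking rate are illustrative assumptions.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/electra-small-generator")
generator = AutoModelForMaskedLM.from_pretrained("google/electra-small-generator")

text = "pre-trained text encoders achieve strong results on many nlp tasks"
enc = tokenizer(text, return_tensors="pt")
input_ids = enc["input_ids"]

# Pick roughly 15% of non-special positions to corrupt (at least one, for this toy example).
special = torch.tensor(
    tokenizer.get_special_tokens_mask(input_ids[0].tolist(), already_has_special_tokens=True)
).bool()
mask_positions = (torch.rand(input_ids.shape[1]) < 0.15) & ~special
mask_positions[1] = True  # guarantee at least one corrupted position

masked_ids = input_ids.clone()
masked_ids[0, mask_positions] = tokenizer.mask_token_id

# Sample one replacement per masked position from the generator's MLM distribution.
with torch.no_grad():
    logits = generator(input_ids=masked_ids, attention_mask=enc["attention_mask"]).logits
probs = torch.softmax(logits[0, mask_positions], dim=-1)       # (n_masked, vocab)
sampled = torch.multinomial(probs, num_samples=1).squeeze(-1)  # (n_masked,)

corrupted_ids = input_ids.clone()
corrupted_ids[0, mask_positions] = sampled

# Binary targets for the discriminator: 1 where the token actually changed.
is_replaced = (corrupted_ids != input_ids).long()
print(tokenizer.decode(corrupted_ids[0]), is_replaced.tolist())
```

The main paper's contribution sits on top of this setup: in addition to the binary "replaced or not" targets, the discriminator is also asked to select the original token from a candidate set at the corrupted positions.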