Training ELECTRA Augmented with Multi-word Selection
- URL: http://arxiv.org/abs/2106.00139v1
- Date: Mon, 31 May 2021 23:19:00 GMT
- Title: Training ELECTRA Augmented with Multi-word Selection
- Authors: Jiaming Shen, Jialu Liu, Tianqi Liu, Cong Yu, Jiawei Han
- Abstract summary: We present a new text encoder pre-training method that improves ELECTRA based on multi-task learning.
Specifically, we train the discriminator to simultaneously detect replaced tokens and select original tokens from candidate sets.
- Score: 53.77046731238381
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained text encoders such as BERT and its variants have recently
achieved state-of-the-art performance on many NLP tasks. While effective,
these pre-training methods typically demand massive computational
resources. To accelerate pre-training, ELECTRA trains a discriminator that
predicts whether each input token is replaced by a generator. However, this new
task, as a binary classification, is less semantically informative. In this
study, we present a new text encoder pre-training method that improves ELECTRA
based on multi-task learning. Specifically, we train the discriminator to
simultaneously detect replaced tokens and select original tokens from candidate
sets. We further develop two techniques to effectively combine all pre-training
tasks: (1) using attention-based networks for task-specific heads, and (2)
sharing bottom layers of the generator and the discriminator. Extensive
experiments on GLUE and SQuAD datasets demonstrate both the effectiveness and
the efficiency of our proposed method.
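The multi-task objective described in the abstract can be made concrete with a short sketch. The following is a minimal PyTorch illustration, not the authors' code: the tensor shapes, the dot-product candidate scorer (standing in for the paper's attention-based task-specific heads), and the loss weight are assumptions for illustration only.

```python
# Minimal sketch of a multi-task ELECTRA-style discriminator loss:
# (1) replaced-token detection (binary, as in ELECTRA) and
# (2) multi-word selection (pick the original token out of a small candidate set).
# Shapes, names, and the dot-product scorer are illustrative assumptions.
import torch
import torch.nn.functional as F


def multi_task_loss(
    hidden,           # (batch, seq_len, dim) discriminator representations per token
    rtd_logits,       # (batch, seq_len) logits from the replaced-token detection head
    rtd_labels,       # (batch, seq_len) 1 if the token was replaced, else 0
    cand_embeddings,  # (batch, seq_len, k, dim) embeddings of k candidate words per position
    cand_labels,      # (batch, seq_len) index of the original word within its candidate set
    selection_mask,   # (batch, seq_len) 1 at positions where the selection task applies
    lambda_sel=1.0,   # assumed weight on the selection task
):
    # Task 1: replaced-token detection.
    rtd_loss = F.binary_cross_entropy_with_logits(rtd_logits, rtd_labels.float())

    # Task 2: multi-word selection -- score every candidate against the token's
    # contextual representation and classify which candidate is the original.
    # (The paper uses attention-based task-specific heads; a dot product is used here.)
    scores = torch.einsum("bsd,bskd->bsk", hidden, cand_embeddings)  # (batch, seq_len, k)
    keep = selection_mask.bool()
    sel_loss = F.cross_entropy(scores[keep], cand_labels[keep])

    return rtd_loss + lambda_sel * sel_loss
```

The abstract's second technique, sharing the bottom layers of the generator and the discriminator, concerns how the two Transformers are parameterized rather than the loss itself, so it is not modeled in this sketch.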
Related papers
- Prompt Optimization via Adversarial In-Context Learning [51.18075178593142]
adv-ICL is implemented as a two-player game between a generator and a discriminator.
The generator tries to generate realistic enough output to fool the discriminator.
We show that adv-ICL results in significant improvements over state-of-the-art prompt optimization techniques.
arXiv Detail & Related papers (2023-12-05T09:44:45Z)
- Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding [52.723297744257536]
Pre-trained language models (LMs) have proven effective in scientific literature understanding tasks.
We propose a multi-task contrastive learning framework, SciMult, to facilitate common knowledge sharing across different literature understanding tasks.
arXiv Detail & Related papers (2023-05-23T16:47:22Z)
- A Novel Plagiarism Detection Approach Combining BERT-based Word Embedding, Attention-based LSTMs and an Improved Differential Evolution Algorithm [11.142354615369273]
We propose a novel plagiarism detection method based on attention-based long short-term memory (LSTM) networks and word embeddings from bidirectional encoder representations from transformers (BERT).
BERT can be incorporated into a downstream task and fine-tuned as a task-specific model, while the pre-trained BERT model captures a variety of linguistic characteristics.
arXiv Detail & Related papers (2023-05-03T18:26:47Z)
- Prompting ELECTRA: Few-Shot Learning with Discriminative Pre-Trained Models [43.7024573212373]
We adapt prompt-based few-shot learning to ELECTRA and show that it outperforms masked language models in a wide range of tasks.
Our method can be easily adapted to tasks involving multi-token predictions without extra computation overhead.
arXiv Detail & Related papers (2022-05-30T16:32:30Z)
- Knowledge Transfer by Discriminative Pre-training for Academic Performance Prediction [5.3431413737671525]
We propose DPA, a transfer learning framework with Discriminative Pre-training tasks for Academic performance prediction.
Compared to the previous state-of-the-art generative pre-training method, DPA is more sample-efficient, converging faster to a lower academic performance prediction error.
arXiv Detail & Related papers (2021-06-28T13:02:23Z)
- Learning to Sample Replacements for ELECTRA Pre-Training [40.17248997321726]
ELECTRA pre-trains a discriminator to detect replaced tokens, where the replacements are sampled from a generator trained with masked language modeling (a minimal sketch of this sampling step appears after this list).
Despite its compelling performance, ELECTRA suffers from two issues in how replacements are sampled.
We propose two methods to improve replacement sampling for ELECTRA pre-training.
arXiv Detail & Related papers (2021-06-25T15:51:55Z)
- Training Generative Adversarial Networks in One Stage [58.983325666852856]
We introduce a general training scheme that enables training GANs efficiently in only one stage.
We show that the proposed method is readily applicable to other adversarial-training scenarios, such as data-free knowledge distillation.
arXiv Detail & Related papers (2021-02-28T09:03:39Z)
- MC-BERT: Efficient Language Pre-Training via a Meta Controller [96.68140474547602]
Large-scale pre-training is computationally expensive.
ELECTRA, an early attempt to accelerate pre-training, trains a discriminative model that predicts whether each input token was replaced by a generator.
We propose a novel meta-learning framework, MC-BERT, to achieve better efficiency and effectiveness.
arXiv Detail & Related papers (2020-06-10T09:22:19Z)
- Data-Free Knowledge Amalgamation via Group-Stack Dual-GAN [80.17705319689139]
We propose a data-free knowledge amalgamation strategy to craft a well-behaved multi-task student network from multiple single-task or multi-task teachers.
Without any training data, the proposed method achieves surprisingly competitive results, even compared with some fully supervised methods.
arXiv Detail & Related papers (2020-03-20T03:20:52Z)
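As noted in the "Learning to Sample Replacements for ELECTRA Pre-Training" entry above, the setup shared by ELECTRA and several of the works listed here corrupts an input by sampling replacement tokens from a masked-language-model generator and then asks the discriminator to flag the replaced positions. The snippet below is a minimal, hedged sketch of that sampling step using Hugging Face Transformers; the generator checkpoint, the 15% masking rate, and the toy sentence are assumptions for illustration, not the configuration used in any of these papers.

```python
# Minimal sketch of ELECTRA-style replacement sampling: mask a subset of tokens,
# sample replacements from an MLM generator, and record which positions ended up
# replaced (these become the discriminator's binary targets).
# The checkpoint name and 15% masking rate are illustrative assumptions.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/electra-small-generator")
generator = AutoModelForMaskedLM.from_pretrained("google/electra-small-generator")

text = "pre-trained text encoders achieve strong results on many nlp tasks"
enc = tokenizer(text, return_tensors="pt")
input_ids = enc["input_ids"]

# Pick roughly 15% of non-special positions to corrupt (at least one, for this toy example).
special = torch.tensor(
    tokenizer.get_special_tokens_mask(input_ids[0].tolist(), already_has_special_tokens=True)
).bool()
mask_positions = (torch.rand(input_ids.shape[1]) < 0.15) & ~special
mask_positions[1] = True  # guarantee at least one corrupted position

masked_ids = input_ids.clone()
masked_ids[0, mask_positions] = tokenizer.mask_token_id

# Sample one replacement per masked position from the generator's MLM distribution.
with torch.no_grad():
    logits = generator(input_ids=masked_ids, attention_mask=enc["attention_mask"]).logits
probs = torch.softmax(logits[0, mask_positions], dim=-1)       # (n_masked, vocab)
sampled = torch.multinomial(probs, num_samples=1).squeeze(-1)  # (n_masked,)

corrupted_ids = input_ids.clone()
corrupted_ids[0, mask_positions] = sampled

# Binary targets for the discriminator: 1 where the token actually changed.
is_replaced = (corrupted_ids != input_ids).long()
print(tokenizer.decode(corrupted_ids[0]), is_replaced.tolist())
```

The main paper's contribution sits on top of this setup: in addition to the binary "replaced or not" targets, the discriminator is also asked to select the original token from a candidate set at the corrupted positions.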