GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator
- URL: http://arxiv.org/abs/2212.10218v2
- Date: Tue, 9 May 2023 09:08:33 GMT
- Title: GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator
- Authors: Jian Yang, Shuming Ma, Li Dong, Shaohan Huang, Haoyang Huang, Yuwei
Yin, Dongdong Zhang, Liqun Yang, Furu Wei and Zhoujun Li
- Abstract summary: We propose a GAN-style model for encoder-decoder pre-training by introducing an auxiliary discriminator.
GanLM is trained with two pre-training objectives: replaced token detection and replaced token denoising.
Experiments on language generation benchmarks show that GanLM, with its strong language understanding capability, outperforms various strong pre-trained language models.
- Score: 114.8954615026781
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained models have achieved remarkable success in natural language
processing (NLP). However, existing pre-training methods underutilize the
benefits of language understanding for generation. Inspired by the idea of
Generative Adversarial Networks (GANs), we propose a GAN-style model for
encoder-decoder pre-training by introducing an auxiliary discriminator,
unifying the ability of language understanding and generation in a single
model. Our model, named GanLM, is trained with two pre-training objectives:
replaced token detection and replaced token denoising. Specifically, given
masked source sentences, the generator outputs the target distribution, and
the discriminator predicts whether the tokens sampled from that distribution
are incorrect. Misclassified tokens then replace their counterparts in the
target sentence to construct a noisy previous context, which is used to
generate the gold sentence. Together, both tasks improve language
understanding and generation by selectively using the denoising data.
Extensive experiments on language generation benchmarks show that GanLM,
with its strong language understanding capability, outperforms various strong
pre-trained language models (PLMs) and achieves state-of-the-art performance.
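As a rough illustration of the two objectives, the PyTorch sketch below mimics replaced token detection and replaced token denoising with toy modules. All names and sizes (TinyGenerator, TinyDiscriminator, VOCAB, HIDDEN) are hypothetical, and the true replacement labels stand in for the discriminator's misclassifications; this is a minimal sketch under those assumptions, not the paper's implementation.

```python
# Minimal sketch of GanLM-style objectives: replaced token detection and
# replaced token denoising. Module names, sizes, and the toy encoder-decoder
# are hypothetical; this is not the paper's model.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HIDDEN, BOS = 1000, 64, 0

class TinyGenerator(nn.Module):
    """Stand-in encoder-decoder: (masked source, previous target) ->
    per-position distribution over the target vocabulary."""
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(VOCAB, HIDDEN)
        self.tgt_emb = nn.Embedding(VOCAB, HIDDEN)
        self.out = nn.Linear(HIDDEN, VOCAB)

    def forward(self, src, tgt_prev):
        ctx = self.src_emb(src).mean(dim=1, keepdim=True)  # crude "encoder" summary
        return self.out(self.tgt_emb(tgt_prev) + ctx)       # (batch, tgt_len, VOCAB)

class TinyDiscriminator(nn.Module):
    """Per-token binary classifier: was this target token replaced (incorrect)?"""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, HIDDEN)
        self.cls = nn.Linear(HIDDEN, 1)

    def forward(self, tokens):
        return self.cls(self.emb(tokens)).squeeze(-1)        # (batch, tgt_len) logits

gen, disc = TinyGenerator(), TinyDiscriminator()
src = torch.randint(1, VOCAB, (2, 8))                        # masked source sentences
gold = torch.randint(1, VOCAB, (2, 6))                        # gold target sentences
bos = torch.full((2, 1), BOS, dtype=torch.long)
tgt_prev = torch.cat([bos, gold[:, :-1]], dim=1)              # right-shifted decoder input

# 1) Replaced token detection: sample tokens from the generator's target
#    distribution; the discriminator learns to flag the incorrect ones.
gen_logits = gen(src, tgt_prev)
sampled = torch.distributions.Categorical(logits=gen_logits).sample()
replaced = (sampled != gold).float()
detection_loss = F.binary_cross_entropy_with_logits(disc(sampled), replaced)

# 2) Replaced token denoising: build a noisy previous context and regenerate
#    the gold sentence from it. (The paper derives the replacement set from the
#    discriminator's misclassifications; the true labels stand in here.)
noisy_target = torch.where(replaced.bool(), sampled, gold)
noisy_prev = torch.cat([bos, noisy_target[:, :-1]], dim=1)
denoise_logits = gen(src, noisy_prev)
denoising_loss = F.cross_entropy(denoise_logits.reshape(-1, VOCAB), gold.reshape(-1))

(detection_loss + denoising_loss).backward()
```

In the paper, the noisy previous context is built from the tokens the discriminator misclassifies, so the denoising signal changes as the discriminator improves; the stand-in above only reproduces the overall data flow.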
Related papers
- Generate to Understand for Representation [3.5325087487696463]
GUR is a pretraining framework that combines language modeling and contrastive learning objectives in a single training step.
GUR achieves impressive results without any labeled training data, outperforming all other pretrained baselines as a retriever on the recall benchmark in a zero-shot setting.
arXiv Detail & Related papers (2023-06-14T06:00:18Z)
- Extrapolating Multilingual Understanding Models as Multilingual Generators [82.1355802012414]
This paper explores methods to endow multilingual understanding models with generation abilities, yielding a unified model.
We propose a Semantic-Guided Alignment-then-Denoising (SGA) approach to adapt an encoder into a multilingual generator with a small number of new parameters.
arXiv Detail & Related papers (2023-05-22T15:33:21Z)
- Summarize and Generate to Back-translate: Unsupervised Translation of Programming Languages [86.08359401867577]
Back-translation is widely known for its effectiveness in neural machine translation when little to no parallel data is available.
We propose performing back-translation via code summarization and generation.
We show that our proposed approach performs competitively with state-of-the-art methods.
arXiv Detail & Related papers (2022-05-23T08:20:41Z)
- COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining [59.169836983883656]
COCO-LM is a new self-supervised learning framework that pretrains Language Models by COrrecting challenging errors and COntrasting text sequences.
COCO-LM employs an auxiliary language model to mask-and-predict tokens in original text sequences.
Our analyses reveal that COCO-LM's advantages come from its challenging training signals, more contextualized token representations, and regularized sequence representations.
arXiv Detail & Related papers (2021-02-16T22:24:29Z)
- SLM: Learning a Discourse Language Representation with Sentence Unshuffling [53.42814722621715]
We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation.
We show that this objective improves the performance of the original BERT by large margins.
arXiv Detail & Related papers (2020-10-30T13:33:41Z)
- Discriminatively-Tuned Generative Classifiers for Robust Natural Language Inference [59.62779187457773]
We propose GenNLI, a generative classifier for natural language inference (NLI).
We compare it to five baselines, including discriminative models and large-scale pretrained language representation models like BERT.
Experiments show that GenNLI outperforms both discriminative and pretrained baselines across several challenging NLI experimental settings.
arXiv Detail & Related papers (2020-10-08T04:44:00Z)
- Learning Spoken Language Representations with Neural Lattice Language Modeling [39.50831917042577]
We propose a framework that trains neural lattice language models to provide contextualized representations for spoken language understanding tasks.
The proposed two-stage pre-training approach reduces the demand for speech data and improves efficiency.
arXiv Detail & Related papers (2020-07-06T10:38:03Z)