COCO-LM: Correcting and Contrasting Text Sequences for Language Model
Pretraining
- URL: http://arxiv.org/abs/2102.08473v1
- Date: Tue, 16 Feb 2021 22:24:29 GMT
- Title: COCO-LM: Correcting and Contrasting Text Sequences for Language Model
Pretraining
- Authors: Yu Meng, Chenyan Xiong, Payal Bajaj, Saurabh Tiwary, Paul Bennett,
Jiawei Han, Xia Song
- Abstract summary: COCO-LM is a new self-supervised learning framework that pretrains Language Models by COrrecting challenging errors and COntrasting text sequences.
COCO-LM employs an auxiliary language model to mask-and-predict tokens in original text sequences.
Our analyses reveal that COCO-LM's advantages come from its challenging training signals, more contextualized token representations, and regularized sequence representations.
- Score: 59.169836983883656
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present COCO-LM, a new self-supervised learning framework that pretrains
Language Models by COrrecting challenging errors and COntrasting text
sequences. COCO-LM employs an auxiliary language model to mask-and-predict
tokens in original text sequences. This creates more challenging pretraining
inputs, where noise tokens are sampled according to their likelihood under the
auxiliary language model. COCO-LM then pretrains with two tasks. The first,
corrective language modeling, learns to correct the auxiliary model's
corruptions by recovering the original tokens. The second, sequence
contrastive learning, ensures that the language model generates sequence
representations that are invariant to noise and transformations. In our
experiments on the GLUE and SQuAD benchmarks, COCO-LM outperforms recent
pretraining approaches in various pretraining settings and few-shot
evaluations, with higher pretraining efficiency. Our analyses reveal that
COCO-LM's advantages come from its challenging training signals, more
contextualized token representations, and regularized sequence representations.
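The two pretraining tasks can be illustrated with a minimal PyTorch-style sketch. The modules `aux_mlm`, `main_encoder`, and `lm_head`, along with all hyperparameters, are hypothetical stand-ins used only to make the idea concrete; this is a sketch of the tasks as described in the abstract, not the authors' implementation.

```python
# Minimal sketch of the two COCO-LM pretraining tasks (hypothetical modules).
import torch
import torch.nn.functional as F

def corrupt_with_auxiliary_mlm(aux_mlm, input_ids, mask_prob=0.15, mask_id=103):
    """Mask random positions, then refill them with tokens sampled from the
    auxiliary MLM, so that the injected noise is plausible under that model."""
    positions = torch.rand(input_ids.shape, device=input_ids.device) < mask_prob
    masked = input_ids.masked_fill(positions, mask_id)
    with torch.no_grad():
        logits = aux_mlm(masked)                                  # (B, L, vocab)
        sampled = torch.distributions.Categorical(logits=logits).sample()
    return torch.where(positions, sampled, input_ids)

def corrective_lm_loss(main_encoder, lm_head, corrupted_ids, original_ids):
    """Task 1 (corrective language modeling): recover the original token at
    every position of the corrupted input."""
    hidden = main_encoder(corrupted_ids)                          # (B, L, dim)
    logits = lm_head(hidden)                                      # (B, L, vocab)
    return F.cross_entropy(logits.transpose(1, 2), original_ids)

def sequence_contrastive_loss(main_encoder, original_ids, corrupted_ids, tau=0.07):
    """Task 2 (sequence contrastive learning): pull the [CLS] representations of
    a sequence and its noisy view together, push apart the other sequences in
    the batch (a standard InfoNCE loss)."""
    z1 = F.normalize(main_encoder(original_ids)[:, 0], dim=-1)    # clean view
    z2 = F.normalize(main_encoder(corrupted_ids)[:, 0], dim=-1)   # noisy view
    sim = z1 @ z2.t() / tau                                       # (B, B)
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(sim, targets)
```

A full pretraining step would typically sum the two losses (along with the auxiliary model's own masked-language-modeling loss) and optimize them jointly; the sketch also omits any additional input transformations beyond corruption that the "transformations" in the abstract refer to.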
Related papers
- Instruction Position Matters in Sequence Generation with Large Language Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions to after the input sentences.
arXiv Detail & Related papers (2023-08-23T12:36:57Z)
- Assessing Phrase Break of ESL Speech with Pre-trained Language Models and Large Language Models [7.782346535009883]
This work introduces approaches to assessing phrase breaks in ESL learners' speech using pre-trained language models (PLMs) and large language models (LLMs).
arXiv Detail & Related papers (2023-06-08T07:10:39Z)
- Scalable Learning of Latent Language Structure With Logical Offline Cycle Consistency [71.42261918225773]
Conceptually, LOCCO can be viewed as a form of self-learning in which the semantic parser being trained is used to generate annotations for unlabeled text.
As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
arXiv Detail & Related papers (2023-05-31T16:47:20Z)
- GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator [114.8954615026781]
We propose a GAN-style model for encoder-decoder pre-training by introducing an auxiliary discriminator.
GanLM is trained with two pre-training objectives: replaced token detection and replaced token denoising (a minimal sketch of the detection objective appears after this list).
Experiments on language generation benchmarks show that GanLM, with its powerful language understanding capability, outperforms various strong pre-trained language models.
arXiv Detail & Related papers (2022-12-20T12:51:11Z)
- Instance Regularization for Discriminative Language Model Pre-training [108.41891836796366]
This work proposes to estimate the complexity of restoring the original sentences from corrupted ones in language model pre-training.
Experimental results on natural language understanding and reading comprehension benchmarks show that our approach improves pre-training efficiency, effectiveness, and robustness.
arXiv Detail & Related papers (2022-10-11T14:16:37Z)
- Bi-Granularity Contrastive Learning for Post-Training in Few-Shot Scene [10.822477939237459]
We propose contrastive masked language modeling (CMLM) for post-training to integrate both token-level and sequence-level contrastive learning.
CMLM surpasses several recent post-training methods in few-shot settings without the need for data augmentation.
arXiv Detail & Related papers (2021-06-04T08:17:48Z)
- SLM: Learning a Discourse Language Representation with Sentence Unshuffling [53.42814722621715]
We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation.
We show that this feature of our model improves the performance of the original BERT by large margins.
arXiv Detail & Related papers (2020-10-30T13:33:41Z)
- CAPT: Contrastive Pre-Training for Learning Denoised Sequence Representations [42.86803751871867]
We present ContrAstive Pre-Training (CAPT) to learn noise invariant sequence representations.
CAPT encourages the consistency between representations of the original sequence and its corrupted version via unsupervised instance-wise training signals.
arXiv Detail & Related papers (2020-10-13T13:08:34Z)
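As referenced in the GanLM entry above, several of the listed methods build on a replaced-token-detection style signal. The sketch below shows one plausible form of that objective: a per-token binary head on top of an encoder predicts whether each position was replaced by an auxiliary generator. The `encoder` and `detection_head` modules are hypothetical stand-ins, not GanLM's actual encoder-decoder architecture with its auxiliary discriminator.

```python
# Toy replaced-token-detection objective (hypothetical modules), referenced
# from the GanLM entry above; not the GanLM reference implementation.
import torch
import torch.nn.functional as F

def replaced_token_detection_loss(encoder, detection_head, corrupted_ids, original_ids):
    """Per-token binary cross-entropy: the target is 1 wherever the corrupted
    token differs from the original token."""
    hidden = encoder(corrupted_ids)                      # (B, L, dim)
    logits = detection_head(hidden).squeeze(-1)          # (B, L), one logit per token
    is_replaced = (corrupted_ids != original_ids).float()
    return F.binary_cross_entropy_with_logits(logits, is_replaced)
```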
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.