Instance Regularization for Discriminative Language Model Pre-training
- URL: http://arxiv.org/abs/2210.05471v1
- Date: Tue, 11 Oct 2022 14:16:37 GMT
- Title: Instance Regularization for Discriminative Language Model Pre-training
- Authors: Zhuosheng Zhang, Hai Zhao, Ming Zhou
- Abstract summary: This work proposes to estimate the complexity of restoring the original sentences from corrupted ones in language model pre-training.
Experimental results on natural language understanding and reading comprehension benchmarks show that our approach improves pre-training efficiency, effectiveness, and robustness.
- Score: 108.41891836796366
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Discriminative pre-trained language models (PrLMs) can be generalized as
denoising auto-encoders that work with two procedures, ennoising and denoising.
First, an ennoising process corrupts texts with arbitrary noising functions to
construct training instances. Then, a denoising language model is trained to
restore the corrupted tokens. Existing studies have made progress by optimizing
independent strategies of either ennoising or denoising. They treat training
instances equally throughout the training process, with little attention to the
individual contribution of those instances. To model explicit signals of
instance contribution, this work proposes to estimate the complexity of
restoring the original sentences from corrupted ones in language model
pre-training. The estimations involve the corruption degree in the ennoising
data construction process and the prediction confidence in the denoising
counterpart. Experimental results on natural language understanding and reading
comprehension benchmarks show that our approach improves pre-training
efficiency, effectiveness, and robustness. Code is publicly available at
https://github.com/cooelf/InstanceReg
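
As a rough illustration of the idea in the abstract, the sketch below weights each training instance by an estimated restoration complexity that combines the corruption degree from the ennoising step with the prediction confidence from the denoising model. The function names and the exact weighting formula are assumptions for illustration, not the paper's formulation.

```python
# Hypothetical sketch of instance-level weighting for a denoising LM objective.
# The weighting function is illustrative, not the paper's exact recipe.
import torch
import torch.nn.functional as F


def instance_weighted_mlm_loss(logits, labels, ignore_index=-100):
    """logits: (B, T, V); labels: (B, T) with ignore_index at uncorrupted positions."""
    B, T, V = logits.shape
    mask = labels.ne(ignore_index)                       # corrupted positions
    per_tok = F.cross_entropy(
        logits.reshape(-1, V), labels.reshape(-1),
        ignore_index=ignore_index, reduction="none",
    ).view(B, T)

    # (a) corruption degree: fraction of corrupted tokens per instance.
    corruption = mask.float().mean(dim=1)

    # (b) prediction confidence: mean probability assigned to the gold token
    #     at corrupted positions (detached so the weights receive no gradients).
    with torch.no_grad():
        probs = logits.softmax(dim=-1)
        gold = probs.gather(-1, labels.clamp_min(0).unsqueeze(-1)).squeeze(-1)
        confidence = (gold * mask).sum(dim=1) / mask.sum(dim=1).clamp_min(1)

    # Harder instances (more corruption, lower confidence) get larger weights.
    weight = 1.0 + corruption * (1.0 - confidence)
    per_inst = (per_tok * mask).sum(dim=1) / mask.sum(dim=1).clamp_min(1)
    return (weight * per_inst).sum() / weight.sum()
```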
Related papers
- GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator [114.8954615026781]
We propose a GAN-style model for encoder-decoder pre-training by introducing an auxiliary discriminator.
GanLM is trained with two pre-training objectives: replaced token detection and replaced token denoising.
Experiments on language generation benchmarks show that GanLM, with its strong language understanding capability, outperforms various strong pre-trained language models.
arXiv Detail & Related papers (2022-12-20T12:51:11Z)
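
A minimal sketch of the two pre-training objectives named above, replaced token detection and replaced token denoising. The toy encoder and heads below are placeholders, not GanLM's actual encoder-decoder architecture or discriminator design.

```python
# Sketch of replaced token detection (per-token real/replaced classification)
# and replaced token denoising (recovering the original tokens).
import torch.nn as nn
import torch.nn.functional as F


class ToyReplacedTokenObjectives(nn.Module):
    def __init__(self, vocab_size=30522, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True), num_layers=2
        )
        self.detector = nn.Linear(hidden, 1)           # replaced token detection head
        self.denoiser = nn.Linear(hidden, vocab_size)  # replaced token denoising head

    def forward(self, corrupted_ids, original_ids, replaced_mask):
        h = self.encoder(self.embed(corrupted_ids))    # (B, T, hidden)
        # Detection: is each token replaced (1) or original (0)?
        det_loss = F.binary_cross_entropy_with_logits(
            self.detector(h).squeeze(-1), replaced_mask.float()
        )
        # Denoising: recover the original ids at replaced positions only.
        logits = self.denoiser(h)[replaced_mask]
        den_loss = F.cross_entropy(logits, original_ids[replaced_mask])
        return det_loss + den_loss
```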
- Robustification of Multilingual Language Models to Real-world Noise with Robust Contrastive Pretraining [14.087882550564169]
Prior works that assess the robustness of neural models on noisy data and suggest improvements are limited to the English language.
To benchmark the performance of pretrained multilingual models, we construct noisy datasets covering five languages and four NLP tasks.
We propose Robust Contrastive Pretraining (RCP) to boost the zero-shot cross-lingual robustness of multilingual pretrained models.
arXiv Detail & Related papers (2022-10-10T15:40:43Z)
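
A hedged sketch of a contrastive robustness objective in the spirit described above: pull the representation of a sentence and of its noised version together, and push apart the other sentences in the batch (InfoNCE). The encoder and the noising function are placeholders, not RCP's actual recipe.

```python
# Contrastive alignment of clean and noisy views of the same sentence.
import torch
import torch.nn.functional as F


def clean_noisy_contrastive_loss(clean_repr, noisy_repr, temperature=0.05):
    """clean_repr, noisy_repr: (B, H) sentence embeddings of paired views."""
    clean = F.normalize(clean_repr, dim=-1)
    noisy = F.normalize(noisy_repr, dim=-1)
    logits = clean @ noisy.t() / temperature            # (B, B) cosine similarities
    targets = torch.arange(clean.size(0), device=clean.device)
    # Row i should match column i (its own noised view).
    return F.cross_entropy(logits, targets)
```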
- Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework named Layerwise Noise Stability Regularization (LNSR).
Specifically, we propose to inject standard Gaussian noise and regularize the hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
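
An illustrative sketch of the noise stability idea summarized above: add standard Gaussian noise to a hidden representation and penalize how much the layers above it change their outputs. The layer choice, noise scale, and model wiring are assumptions, not the exact LNSR procedure.

```python
# Penalize instability of upper layers under Gaussian perturbation of a hidden state.
import torch


def noise_stability_penalty(upper_layers, hidden, sigma=1.0):
    """upper_layers: the sub-network above the perturbed layer; hidden: (B, T, H)."""
    clean_out = upper_layers(hidden)
    noisy_out = upper_layers(hidden + sigma * torch.randn_like(hidden))
    # L2 distance between clean and noise-perturbed outputs, averaged over the batch.
    return (clean_out - noisy_out).pow(2).mean()


# Usage (hypothetical): total_loss = task_loss + lambda_reg * noise_stability_penalty(upper_layers, h_k)
```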
- How Robust is Neural Machine Translation to Language Imbalance in Multilingual Tokenizer Training? [86.48323488619629]
We analyze how translation performance changes as the data ratios among languages vary in the tokenizer training corpus.
We find that, while relatively better performance is often observed when languages are sampled more equally, downstream performance is more robust to language imbalance than commonly expected.
arXiv Detail & Related papers (2022-04-29T17:50:36Z)
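
A minimal sketch of the experimental knob studied above: assembling a tokenizer training corpus with controlled per-language data ratios. The file paths and sampling scheme are illustrative; the paper's exact setup may differ.

```python
# Build a mixed-language corpus with chosen per-language ratios for tokenizer training.
import random


def sample_tokenizer_corpus(lang_files, ratios, total_lines=1_000_000, seed=0):
    """lang_files: {lang: path to text file}; ratios: {lang: fraction, summing to 1.0}."""
    rng = random.Random(seed)
    corpus = []
    for lang, path in lang_files.items():
        with open(path, encoding="utf-8") as f:
            lines = f.read().splitlines()
        k = int(total_lines * ratios[lang])
        corpus.extend(rng.choices(lines, k=k))   # sample with replacement
    rng.shuffle(corpus)
    return corpus  # feed this to a subword tokenizer trainer (e.g., SentencePiece)
```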
- COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining [59.169836983883656]
COCO-LM is a new self-supervised learning framework that pretrains Language Models by COrrecting challenging errors and COntrasting text sequences.
COCO-LM employs an auxiliary language model to mask-and-predict tokens in original text sequences.
Our analyses reveal that COCO-LM's advantages come from its challenging training signals, more contextualized token representations, and regularized sequence representations.
arXiv Detail & Related papers (2021-02-16T22:24:29Z)
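
A hedged sketch of the "correcting" half described above: an auxiliary masked LM proposes replacements at masked positions, and the main model is trained to recover the original token at every position of the corrupted sequence. The "contrasting" half pairs sequence-level representations with a contrastive loss along the lines of the RCP sketch earlier. Interfaces and sampling choices are placeholders, not COCO-LM's exact pipeline.

```python
# Corrupt with an auxiliary masked LM, then train the main model to correct all tokens.
import torch
import torch.nn.functional as F


def corrupt_with_auxiliary(aux_logits, input_ids, mask_positions):
    """Replace masked positions with samples from the auxiliary LM's distribution."""
    sampled = torch.distributions.Categorical(logits=aux_logits).sample()  # (B, T)
    return torch.where(mask_positions, sampled, input_ids)


def corrective_lm_loss(main_logits, original_ids):
    """main_logits: (B, T, V) over the corrupted input; original_ids: (B, T) to recover."""
    V = main_logits.size(-1)
    return F.cross_entropy(main_logits.reshape(-1, V), original_ids.reshape(-1))
```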
- CAPT: Contrastive Pre-Training for Learning Denoised Sequence Representations [42.86803751871867]
We present ContrAstive Pre-Training (CAPT) to learn noise invariant sequence representations.
CAPT encourages the consistency between representations of the original sequence and its corrupted version via unsupervised instance-wise training signals.
arXiv Detail & Related papers (2020-10-13T13:08:34Z)
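
A sketch of the consistency objective summarized above, written as a symmetric instance-wise contrastive loss between each sequence and its corrupted version. The encoder and corruption choices are placeholders rather than CAPT's exact design.

```python
# Symmetric InfoNCE between original and corrupted sequence representations.
import torch
import torch.nn.functional as F


def symmetric_consistency_loss(orig_repr, corrupted_repr, temperature=0.1):
    """orig_repr, corrupted_repr: (B, H) representations of paired sequences."""
    a = F.normalize(orig_repr, dim=-1)
    b = F.normalize(corrupted_repr, dim=-1)
    logits = a @ b.t() / temperature
    targets = torch.arange(a.size(0), device=a.device)
    # Each sequence should identify its own corrupted version, and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```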
- Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning [74.25168207651376]
Fine-tuning pre-trained language models to downstream cross-lingual tasks has shown promising results.
We leverage continual learning to preserve the cross-lingual ability of the pre-trained model when we fine-tune it to downstream tasks.
Our methods achieve better performance than other fine-tuning baselines on the zero-shot cross-lingual part-of-speech tagging and named entity recognition tasks.
arXiv Detail & Related papers (2020-04-29T14:07:18Z)
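
A loudly hypothetical sketch: one generic, continual-learning-flavored way to preserve pre-trained cross-lingual ability during fine-tuning is to penalize parameter drift from the pre-trained snapshot while optimizing the downstream task. This is a simple stand-in regularizer (an L2-toward-pretrained penalty), not necessarily the method used in the paper above.

```python
# Generic drift penalty toward the pre-trained snapshot; a stand-in, not the paper's method.
import torch


def drift_penalty(model, pretrained_state):
    """Sum of squared parameter drift from the frozen pre-trained snapshot."""
    penalty = torch.zeros((), device=next(model.parameters()).device)
    for name, p in model.named_parameters():
        penalty = penalty + (p - pretrained_state[name].to(p.device)).pow(2).sum()
    return penalty


# Usage (hypothetical):
#   snapshot = {k: v.detach().clone() for k, v in model.state_dict().items()}
#   loss = task_loss + lambda_reg * drift_penalty(model, snapshot)
```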
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.