CAPT: Contrastive Pre-Training for Learning Denoised Sequence
Representations
- URL: http://arxiv.org/abs/2010.06351v4
- Date: Fri, 30 Oct 2020 03:47:06 GMT
- Title: CAPT: Contrastive Pre-Training for Learning Denoised Sequence
Representations
- Authors: Fuli Luo, Pengcheng Yang, Shicheng Li, Xuancheng Ren, Xu Sun
- Abstract summary: We present ContrAstive Pre-Training (CAPT) to learn noise invariant sequence representations.
CAPT encourages the consistency between representations of the original sequence and its corrupted version via unsupervised instance-wise training signals.
- Score: 42.86803751871867
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained self-supervised models such as BERT have achieved striking
success in learning sequence representations, especially for natural language
processing. These models typically corrupt the given sequences with certain
types of noise, such as masking, shuffling, or substitution, and then try to
recover the original input. However, such pre-training approaches are prone to
learning representations that are covariant with the noise, leading to the
discrepancy between the pre-training and fine-tuning stage. To remedy this, we
present ContrAstive Pre-Training (CAPT) to learn noise invariant sequence
representations. The proposed CAPT encourages the consistency between
representations of the original sequence and its corrupted version via
unsupervised instance-wise training signals. In this way, it not only
alleviates the pretrain-finetune discrepancy induced by the noise of
pre-training, but also aids the pre-trained model in better capturing global
semantics of the input via more effective sentence-level supervision. Different
from most prior work that focuses on a particular modality, comprehensive
empirical evidence on 11 natural language understanding and cross-modal tasks
illustrates that CAPT is applicable for both language and vision-language
tasks, and obtains surprisingly consistent improvement, including 0.6%
absolute gain on GLUE benchmarks and 0.8% absolute gain on NLVR2.
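As a rough sketch of the instance-wise contrastive objective described in the abstract (not the authors' released code), the snippet below pulls each original sequence's representation toward the representation of its corrupted version and pushes it away from the other sequences in the batch; the encoder, pooling, temperature, and corruption function are assumed placeholders.

    import torch
    import torch.nn.functional as F

    def contrastive_consistency_loss(orig_repr, corrupted_repr, temperature=0.1):
        """InfoNCE-style loss between original and corrupted sequence representations.

        orig_repr, corrupted_repr: [batch, hidden] pooled sequence embeddings
        (e.g. the [CLS] vector) of each clean input and its noised counterpart.
        """
        orig = F.normalize(orig_repr, dim=-1)
        corr = F.normalize(corrupted_repr, dim=-1)
        # Cosine similarity of every original sequence to every corrupted sequence.
        logits = orig @ corr.t() / temperature          # [batch, batch]
        # The matching corrupted version (the diagonal) is the positive pair;
        # all other corrupted sequences in the batch act as negatives.
        targets = torch.arange(orig.size(0), device=orig.device)
        return F.cross_entropy(logits, targets)

    # Hypothetical usage with any sequence encoder and corruption function:
    # h_orig = encoder(input_ids).pooler_output
    # h_corr = encoder(corrupt(input_ids)).pooler_output
    # loss = denoising_loss + contrastive_consistency_loss(h_orig, h_corr)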
Related papers
- Improved Visual Fine-tuning with Natural Language Supervision [36.250244364023665]
Fine-tuning a visual pre-trained model can leverage the semantic information from large-scale pre-training data.
The problem of catastrophic forgetting in the pre-trained backbone has been extensively studied for fine-tuning.
We introduce a reference distribution obtained from a fixed text classifier, which can help regularize the learned vision classifier.
arXiv Detail & Related papers (2023-04-04T03:08:02Z)
- Instance Regularization for Discriminative Language Model Pre-training [108.41891836796366]
This work proposes to estimate the complexity of restoring the original sentences from corrupted ones in language model pre-training.
Experimental results on natural language understanding and reading comprehension benchmarks show that our approach improves pre-training efficiency, effectiveness, and robustness.
arXiv Detail & Related papers (2022-10-11T14:16:37Z)
- Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework named Layerwise Noise Stability Regularization (LNSR).
Specifically, we propose to inject standard Gaussian noise and regularize the hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
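As a rough illustration only (the function name, the choice of layer, and the plain L2 penalty below are assumptions rather than the paper's exact objective), the layerwise noise-stability idea amounts to adding Gaussian noise to a layer's input and penalizing the change in its output:

    import torch
    import torch.nn.functional as F

    def noise_stability_penalty(hidden_states, layer, sigma=0.01):
        """Penalize how much one layer's output changes under Gaussian input noise.

        hidden_states: [batch, seq_len, dim] input to the layer;
        layer: any nn.Module mapping that tensor to a tensor of the same shape.
        """
        clean_out = layer(hidden_states)
        noisy_out = layer(hidden_states + sigma * torch.randn_like(hidden_states))
        # L2 distance between the clean and noise-perturbed outputs.
        return F.mse_loss(noisy_out, clean_out)

    # Hypothetical fine-tuning objective: task loss plus the stability term,
    # summed over the layers being regularized.
    # loss = task_loss + reg_weight * noise_stability_penalty(h_l, block_l)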
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
- Dense Contrastive Visual-Linguistic Pretraining [53.61233531733243]
Several multimodal representation learning approaches have been proposed that jointly represent image and text.
These approaches achieve superior performance by capturing high-level semantic information from large-scale multimodal pretraining.
We propose unbiased Dense Contrastive Visual-Linguistic Pretraining to replace the region regression and classification with cross-modality region contrastive learning.
arXiv Detail & Related papers (2021-09-24T07:20:13Z)
- Bi-Granularity Contrastive Learning for Post-Training in Few-Shot Scene [10.822477939237459]
We propose contrastive masked language modeling (CMLM) for post-training to integrate both token-level and sequence-level contrastive learning.
CMLM surpasses several recent post-training methods in few-shot settings without the need for data augmentation.
arXiv Detail & Related papers (2021-06-04T08:17:48Z)
- Consistency Training with Virtual Adversarial Discrete Perturbation [17.311821099484987]
We propose an effective consistency training framework that enforces a training model's predictions given original and perturbed inputs to be similar.
This virtual adversarial discrete noise, obtained by replacing a small portion of tokens, efficiently pushes a training model's decision boundary.
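A simplified sketch of this consistency objective, with random token replacement standing in for the adversarially chosen discrete perturbation (the HuggingFace-style model interface, the replacement rate, and the one-sided KL below are assumptions):

    import torch
    import torch.nn.functional as F

    def consistency_loss(model, input_ids, attention_mask, vocab_size, replace_rate=0.1):
        """KL consistency between predictions on original and token-perturbed inputs.

        For illustration the replaced positions are chosen at random; the paper
        instead selects them adversarially (virtual adversarial discrete noise).
        """
        # Perturb a small fraction of the non-padding tokens.
        replace = torch.rand(input_ids.shape, device=input_ids.device) < replace_rate
        replace &= attention_mask.bool()
        random_tokens = torch.randint_like(input_ids, vocab_size)
        perturbed_ids = torch.where(replace, random_tokens, input_ids)

        logits_orig = model(input_ids=input_ids, attention_mask=attention_mask).logits
        logits_pert = model(input_ids=perturbed_ids, attention_mask=attention_mask).logits

        # Push the perturbed-input distribution toward the original-input distribution.
        log_p = F.log_softmax(logits_pert, dim=-1)
        q = F.softmax(logits_orig.detach(), dim=-1)
        return F.kl_div(log_p, q, reduction="batchmean")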
arXiv Detail & Related papers (2021-04-15T07:49:43Z)
- COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining [59.169836983883656]
COCO-LM is a new self-supervised learning framework that pretrains Language Models by COrrecting challenging errors and COntrasting text sequences.
COCO-LM employs an auxiliary language model to mask-and-predict tokens in original text sequences.
Our analyses reveal that COCO-LM's advantages come from its challenging training signals, more contextualized token representations, and regularized sequence representations.
arXiv Detail & Related papers (2021-02-16T22:24:29Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
- Syntactic Data Augmentation Increases Robustness to Inference Heuristics [27.513414694720716]
Pretrained neural models such as BERT show high accuracy on standard datasets, but a surprising lack of sensitivity to word order on controlled challenge sets.
We explore several methods to augment standard training sets with syntactically informative examples, generated by applying syntactic transformations to sentences from the MNLI corpus.
The best-performing augmentation method, subject/object inversion, improved BERT's accuracy on controlled examples that diagnose sensitivity to word order from 0.28 to 0.73, without affecting performance on the MNLI test set.
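As a toy illustration of the subject/object inversion transformation (assuming sentences already decomposed into subject/verb/object triples; the paper applies syntactic transformations to parsed MNLI sentences):

    def invert_subject_object(subject, verb, obj):
        """Swap subject and object to build a word-order-sensitive training example.

        ("the doctor", "visited", "the lawyer") -> "The lawyer visited the doctor."
        Pairing such inverted sentences with the originals forces the model to
        attend to word order rather than to lexical overlap alone.
        """
        return f"{obj} {verb} {subject}".capitalize() + "."

    augmented = invert_subject_object("the doctor", "visited", "the lawyer")
    # -> "The lawyer visited the doctor."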
arXiv Detail & Related papers (2020-04-24T21:35:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.