Adversarial Training with Contrastive Learning in NLP
- URL: http://arxiv.org/abs/2109.09075v1
- Date: Sun, 19 Sep 2021 07:23:45 GMT
- Title: Adversarial Training with Contrastive Learning in NLP
- Authors: Daniela N. Rim, DongNyeong Heo, Heeyoul Choi
- Abstract summary: We propose adversarial training with contrastive learning (ATCL) to adversarially train a language processing task.
The core idea is to make linear perturbations in the embedding space of the input via fast gradient methods (FGM) and train the model to keep the original and perturbed representations close via contrastive learning.
The results show not only an improvement in the quantitative (perplexity and BLEU) scores when compared to the baselines, but ATCL also achieves good qualitative results at the semantic level for both tasks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: For years, adversarial training has been extensively studied in natural
language processing (NLP) settings. The main goal is to make models robust so
that similar inputs lead to semantically similar outcomes, which is not a
trivial problem since there is no objective measure of semantic similarity in
language. Previous works use an external pre-trained NLP model to tackle this
challenge, introducing an extra training stage with huge memory consumption
during training. However, the recently popular approach of contrastive learning
in language processing hints at a convenient way of obtaining such similarity
restrictions. The main advantage of the contrastive learning approach is that
it aims for similar data points to be mapped close to each other and further
from different ones in the representation space. In this work, we propose
adversarial training with contrastive learning (ATCL) to adversarially train a
language processing task using the benefits of contrastive learning. The core
idea is to make linear perturbations in the embedding space of the input via
fast gradient methods (FGM) and train the model to keep the original and
perturbed representations close via contrastive learning. In NLP experiments,
we applied ATCL to language modeling and neural machine translation tasks. The
results show not only an improvement in the quantitative (perplexity and BLEU)
scores when compared to the baselines, but ATCL also achieves good qualitative
results at the semantic level for both tasks without using a pre-trained model.
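As a rough illustration of the idea described in the abstract, the sketch below combines a fast-gradient-method (FGM) perturbation of the input embeddings with an InfoNCE-style contrastive loss that pulls the clean and perturbed representations together. This is a minimal PyTorch sketch under assumed interfaces (the `embed`, `encoder`, and `task_loss_fn` callables and the `epsilon`, `temperature`, and `lam` hyperparameters are placeholders), not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def fgm_perturb(embeddings, loss, epsilon=1.0):
    # FGM: take the gradient of the task loss w.r.t. the input embeddings,
    # normalize it, and add a small linear step in that direction.
    grad, = torch.autograd.grad(loss, embeddings, retain_graph=True)
    norm = grad.norm(p=2, dim=-1, keepdim=True).clamp_min(1e-12)
    return embeddings + epsilon * grad / norm


def contrastive_loss(h_orig, h_adv, temperature=0.1):
    # InfoNCE: each clean representation should be most similar to its own
    # perturbed counterpart and dissimilar to the other examples in the batch.
    z1 = F.normalize(h_orig, dim=-1)              # (batch, dim)
    z2 = F.normalize(h_adv, dim=-1)               # (batch, dim)
    logits = z1 @ z2.t() / temperature            # (batch, batch) similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)


def atcl_step(embed, encoder, task_loss_fn, batch, epsilon=1.0, lam=1.0):
    # One hypothetical training step: task loss on the clean embeddings,
    # plus a contrastive term that keeps clean and FGM-perturbed
    # representations close.
    emb = embed(batch["input_ids"])               # (batch, seq, dim)
    h_orig = encoder(emb)                         # (batch, dim) pooled representation
    loss_task = task_loss_fn(h_orig, batch["labels"])

    emb_adv = fgm_perturb(emb, loss_task, epsilon)  # adversarial view of the input
    h_adv = encoder(emb_adv)

    loss_ctr = contrastive_loss(h_orig, h_adv)
    return loss_task + lam * loss_ctr
```

In a training loop one would backpropagate the returned total loss and update the parameters as usual; `lam` weighs the contrastive term against the task loss.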
Related papers
- Pixel Sentence Representation Learning [67.4775296225521]
In this work, we conceptualize the learning of sentence-level textual semantics as a visual representation learning process.
We employ visually-grounded text perturbation methods like typos and word order shuffling, resonating with human cognitive patterns, and enabling perturbation to be perceived as continuous.
Our approach is further bolstered by large-scale unsupervised topical alignment training and natural language inference supervision.
arXiv Detail & Related papers (2024-02-13T02:46:45Z)
- DenoSent: A Denoising Objective for Self-Supervised Sentence Representation Learning [59.4644086610381]
We propose a novel denoising objective that approaches the problem from another perspective, i.e., the intra-sentence perspective.
By introducing both discrete and continuous noise, we generate noisy sentences and then train our model to restore them to their original form.
Our empirical evaluations demonstrate that this approach delivers competitive results on both semantic textual similarity (STS) and a wide range of transfer tasks.
arXiv Detail & Related papers (2024-01-24T17:48:45Z)
- Bridging the Gap between Language Models and Cross-Lingual Sequence Labeling [101.74165219364264]
Large-scale cross-lingual pre-trained language models (xPLMs) have shown effectiveness in cross-lingual sequence labeling tasks.
Despite this success, we make the empirical observation that there is a training objective gap between the pre-training and fine-tuning stages.
In this paper, we first design a pre-training task tailored for cross-lingual sequence labeling (xSL), named Cross-lingual Language Informative Span Masking (CLISM), to eliminate the objective gap.
Second, we present ContrAstive-Consistency Regularization (CACR), which utilizes contrastive learning to encourage the consistency between representations of input parallel
arXiv Detail & Related papers (2022-04-11T15:55:20Z)
- How Should Pre-Trained Language Models Be Fine-Tuned Towards Adversarial Robustness? [121.57551065856164]
We propose Robust Informative Fine-Tuning (RIFT) as a novel adversarial fine-tuning method from an information-theoretical perspective.
RIFT encourages an objective model to retain the features learned from the pre-trained model throughout the entire fine-tuning process.
Experimental results show that RIFT consistently outperforms state-of-the-art methods on two popular NLP tasks.
arXiv Detail & Related papers (2021-12-22T05:04:41Z)
- Simple Contrastive Representation Adversarial Learning for NLP Tasks [17.12062566060011]
Two novel frameworks, supervised contrastive adversarial learning (SCAL) and unsupervised SCAL (USCAL), are proposed.
We apply it to Transformer-based models for natural language understanding, sentence semantic textual similarity, and adversarial learning tasks.
Experimental results on GLUE benchmark tasks show that our fine-tuned supervised method outperforms BERT$_{base}$ by over 1.75%.
arXiv Detail & Related papers (2021-11-26T03:16:09Z)
- A Primer on Contrastive Pretraining in Language Processing: Methods, Lessons Learned and Perspectives [22.933794444266596]
We describe recent self-supervised and supervised contrastive NLP pretraining methods.
We introduce key contrastive learning concepts with lessons learned from prior research and structure works by applications.
We point to open challenges and future directions for contrastive NLP to encourage bringing contrastive NLP pretraining closer to recent successes in image representation pretraining.
arXiv Detail & Related papers (2021-02-25T16:35:07Z)
- TAVAT: Token-Aware Virtual Adversarial Training for Language Understanding [55.16953347580948]
Gradient-based adversarial training is widely used in improving the robustness of neural networks.
However, it cannot be easily adapted to natural language processing tasks since the embedding space is discrete.
We propose a Token-Aware Virtual Adversarial Training (TAVAT) method to craft fine-grained perturbations.
arXiv Detail & Related papers (2020-04-30T02:03:24Z)
- Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning [74.25168207651376]
Fine-tuning pre-trained language models to downstream cross-lingual tasks has shown promising results.
We leverage continual learning to preserve the cross-lingual ability of the pre-trained model when we fine-tune it to downstream tasks.
Our methods achieve better performance than other fine-tuning baselines on the zero-shot cross-lingual part-of-speech tagging and named entity recognition tasks.
arXiv Detail & Related papers (2020-04-29T14:07:18Z)