Adversarial Training with Contrastive Learning in NLP
- URL: http://arxiv.org/abs/2109.09075v1
- Date: Sun, 19 Sep 2021 07:23:45 GMT
- Title: Adversarial Training with Contrastive Learning in NLP
- Authors: Daniela N. Rim, DongNyeong Heo, Heeyoul Choi
- Abstract summary: We propose adversarial training with contrastive learning (ATCL) to adversarially train a language processing task.
The core idea is to make linear perturbations in the embedding space of the input via fast gradient methods (FGM) and train the model to keep the original and perturbed representations close via contrastive learning.
The results show not only an improvement in the quantitative (perplexity and BLEU) scores when compared to the baselines, but ATCL also achieves good qualitative results at the semantic level for both tasks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: For years, adversarial training has been extensively studied in natural
language processing (NLP) settings. The main goal is to make models robust so
that similar inputs lead to semantically similar outcomes, which is not a
trivial problem since there is no objective measure of semantic similarity in
language. Previous works use an external pre-trained NLP model to tackle this
challenge, introducing an extra training stage with huge memory consumption
during training. However, the recently popular approach of contrastive learning
in language processing hints at a convenient way of obtaining such similarity
restrictions. The main advantage of the contrastive learning approach is that
it aims for similar data points to be mapped close to each other and further
from different ones in the representation space. In this work, we propose
adversarial training with contrastive learning (ATCL) to adversarially train a
language processing task using the benefits of contrastive learning. The core
idea is to make linear perturbations in the embedding space of the input via
fast gradient methods (FGM) and train the model to keep the original and
perturbed representations close via contrastive learning. In NLP experiments,
we applied ATCL to language modeling and neural machine translation tasks. The
results show not only an improvement in the quantitative (perplexity and BLEU)
scores when compared to the baselines, but ATCL also achieves good qualitative
results at the semantic level for both tasks without using a pre-trained model.
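As a rough illustration of the idea described in the abstract, the sketch below combines a fast-gradient-method (FGM) perturbation of the input embeddings with an InfoNCE-style contrastive loss that pulls the clean and perturbed representations together. This is a minimal PyTorch sketch under assumed interfaces (the `embed`, `encoder`, and `task_loss_fn` callables and the `epsilon`, `temperature`, and `lam` hyperparameters are placeholders), not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def fgm_perturb(embeddings, loss, epsilon=1.0):
    # FGM: take the gradient of the task loss w.r.t. the input embeddings,
    # normalize it, and add a small linear step in that direction.
    grad, = torch.autograd.grad(loss, embeddings, retain_graph=True)
    norm = grad.norm(p=2, dim=-1, keepdim=True).clamp_min(1e-12)
    return embeddings + epsilon * grad / norm


def contrastive_loss(h_orig, h_adv, temperature=0.1):
    # InfoNCE: each clean representation should be most similar to its own
    # perturbed counterpart and dissimilar to the other examples in the batch.
    z1 = F.normalize(h_orig, dim=-1)              # (batch, dim)
    z2 = F.normalize(h_adv, dim=-1)               # (batch, dim)
    logits = z1 @ z2.t() / temperature            # (batch, batch) similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)


def atcl_step(embed, encoder, task_loss_fn, batch, epsilon=1.0, lam=1.0):
    # One hypothetical training step: task loss on the clean embeddings,
    # plus a contrastive term that keeps clean and FGM-perturbed
    # representations close.
    emb = embed(batch["input_ids"])               # (batch, seq, dim)
    h_orig = encoder(emb)                         # (batch, dim) pooled representation
    loss_task = task_loss_fn(h_orig, batch["labels"])

    emb_adv = fgm_perturb(emb, loss_task, epsilon)  # adversarial view of the input
    h_adv = encoder(emb_adv)

    loss_ctr = contrastive_loss(h_orig, h_adv)
    return loss_task + lam * loss_ctr
```

In a training loop one would backpropagate the returned total loss and update the parameters as usual; `lam` weighs the contrastive term against the task loss.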
Related papers
- Pixel Sentence Representation Learning [67.4775296225521]
In this work, we conceptualize the learning of sentence-level textual semantics as a visual representation learning process.
We employ visually-grounded text perturbation methods like typos and word order shuffling, resonating with human cognitive patterns, and enabling perturbation to be perceived as continuous.
Our approach is further bolstered by large-scale unsupervised topical alignment training and natural language inference supervision.
arXiv Detail & Related papers (2024-02-13T02:46:45Z)
- DenoSent: A Denoising Objective for Self-Supervised Sentence Representation Learning [59.4644086610381]
We propose a novel denoising objective that approaches the problem from another perspective, i.e., the intra-sentence perspective.
By introducing both discrete and continuous noise, we generate noisy sentences and then train our model to restore them to their original form.
Our empirical evaluations demonstrate that this approach delivers competitive results on both semantic textual similarity (STS) and a wide range of transfer tasks.
arXiv Detail & Related papers (2024-01-24T17:48:45Z)
- Bridging the Gap between Language Models and Cross-Lingual Sequence Labeling [101.74165219364264]
Large-scale cross-lingual pre-trained language models (xPLMs) have shown effectiveness in cross-lingual sequence labeling tasks.
Despite this success, we make the empirical observation that there is a training objective gap between the pre-training and fine-tuning stages.
In this paper, we first design a pre-training task tailored for cross-lingual sequence labeling (xSL), named Cross-lingual Language Informative Span Masking (CLISM), to eliminate the objective gap.
Second, we present ContrAstive-Consistency Regularization (CACR), which utilizes contrastive learning to encourage the consistency between representations of input parallel
arXiv Detail & Related papers (2022-04-11T15:55:20Z)
- How Should Pre-Trained Language Models Be Fine-Tuned Towards Adversarial Robustness? [121.57551065856164]
We propose Robust Informative Fine-Tuning (RIFT) as a novel adversarial fine-tuning method from an information-theoretical perspective.
RIFT encourages an objective model to retain the features learned from the pre-trained model throughout the entire fine-tuning process.
Experimental results show that RIFT consistently outperforms state-of-the-art methods on two popular NLP tasks.
arXiv Detail & Related papers (2021-12-22T05:04:41Z)
- Simple Contrastive Representation Adversarial Learning for NLP Tasks [17.12062566060011]
Two novel frameworks, supervised contrastive adversarial learning (SCAL) and unsupervised SCAL (USCAL), are proposed.
We apply it to Transformer-based models for natural language understanding, sentence semantic textual similarity, and adversarial learning tasks.
Experimental results on GLUE benchmark tasks show that our fine-tuned supervised method outperforms BERT$_{base}$ by over 1.75%.
arXiv Detail & Related papers (2021-11-26T03:16:09Z)
- A Primer on Contrastive Pretraining in Language Processing: Methods, Lessons Learned and Perspectives [22.933794444266596]
We describe recent self-supervised and supervised contrastive NLP pretraining methods.
We introduce key contrastive learning concepts with lessons learned from prior research and structure works by applications.
We point to open challenges and future directions for contrastive NLP to encourage bringing contrastive NLP pretraining closer to recent successes in image representation pretraining.
arXiv Detail & Related papers (2021-02-25T16:35:07Z)
- TAVAT: Token-Aware Virtual Adversarial Training for Language Understanding [55.16953347580948]
Gradient-based adversarial training is widely used in improving the robustness of neural networks.
However, it cannot be easily adapted to natural language processing tasks since the embedding space is discrete.
We propose a Token-Aware Virtual Adversarial Training (TAVAT) method to craft fine-grained perturbations.
arXiv Detail & Related papers (2020-04-30T02:03:24Z)
- Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning [74.25168207651376]
Fine-tuning pre-trained language models to downstream cross-lingual tasks has shown promising results.
We leverage continual learning to preserve the cross-lingual ability of the pre-trained model when we fine-tune it to downstream tasks.
Our methods achieve better performance than other fine-tuning baselines on the zero-shot cross-lingual part-of-speech tagging and named entity recognition tasks.
arXiv Detail & Related papers (2020-04-29T14:07:18Z)