Self-Supervised Contrastive Learning with Adversarial Perturbations for
Robust Pretrained Language Models
- URL: http://arxiv.org/abs/2107.07610v1
- Date: Thu, 15 Jul 2021 21:03:34 GMT
- Title: Self-Supervised Contrastive Learning with Adversarial Perturbations for
Robust Pretrained Language Models
- Authors: Zhao Meng, Yihan Dong, Mrinmaya Sachan, Roger Wattenhofer
- Abstract summary: This paper improves the robustness of the pretrained language model BERT against word substitution-based adversarial attacks.
We also create an adversarial attack for word-level adversarial training on BERT.
- Score: 18.726529370845256
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper improves the robustness of the pretrained language model BERT
against word substitution-based adversarial attacks by leveraging
self-supervised contrastive learning with adversarial perturbations. One
advantage of our method compared to previous works is that it is capable of
improving model robustness without using any labels. Additionally, we
create an adversarial attack for word-level adversarial training on BERT. The
attack is efficient, allowing adversarial training for BERT on adversarial
examples generated on the fly during training. Experimental results on four
datasets show that our method improves the robustness of BERT against four
different word substitution-based adversarial attacks. Furthermore, to
understand why our method can improve model robustness against adversarial
attacks, we study vector representations of clean examples and their
corresponding adversarial examples before and after applying our method. As our
method improves model robustness with unlabeled raw data, it opens up the
possibility of using large text datasets to train robust language models.
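To make the training objective concrete, here is a minimal sketch, not the paper's released code: a contrastive loss in which each clean example and its on-the-fly adversarial counterpart form a positive pair, with the BERT encoder and the word substitution attack assumed to exist elsewhere.

import torch
import torch.nn.functional as F

def info_nce_loss(clean_emb, adv_emb, temperature=0.1):
    # clean_emb, adv_emb: (batch, dim) sentence representations of clean
    # examples and their adversarial counterparts (hypothetical inputs;
    # the encoder and the attack are not shown here).
    z1 = F.normalize(clean_emb, dim=-1)
    z2 = F.normalize(adv_emb, dim=-1)
    # Pairwise cosine similarities; diagonal entries are the positive pairs.
    logits = z1 @ z2.t() / temperature
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

Minimizing this loss pulls each clean representation toward its adversarial counterpart and away from the other examples in the batch, which needs no labels and matches the abstract's claim that robustness improves from unlabeled raw data alone.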
Related papers
- Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has conventionally been considered a challenging property to encode in neural networks.
We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z)
- SCAT: Robust Self-supervised Contrastive Learning via Adversarial Training for Text Classification [15.932462099791307]
We propose a novel learning framework called SCAT (Self-supervised Contrastive Learning via Adversarial Training).
SCAT modifies random augmentations of the data in a fully label-free manner to generate adversarial examples.
Our results show that SCAT can not only train robust language models from scratch, but it can also significantly improve the robustness of existing pre-trained language models.
arXiv Detail & Related papers (2023-07-04T05:41:31Z)
- In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples (a minimal PyTorch illustration follows this entry).
arXiv Detail & Related papers (2022-12-20T14:06:50Z)
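As a minimal illustration of the technique, label smoothing is a one-line change in PyTorch (the paper's exact smoothing strategies and coefficients may differ):

import torch.nn as nn

# With label_smoothing=0.1, a tenth of the target probability mass is
# spread uniformly across all classes instead of going entirely to the
# gold label, which penalizes over-confident predictions.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
# Used exactly like the unsmoothed loss: loss = criterion(logits, labels)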
- Learning to Ignore Adversarial Attacks [14.24585085013907]
We introduce the use of rationale models that can explicitly learn to ignore attack tokens.
We find that the rationale models can successfully ignore over 90% of attack tokens.
arXiv Detail & Related papers (2022-05-23T18:01:30Z)
- Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.
All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
arXiv Detail & Related papers (2021-11-04T12:59:55Z)
- Improving Gradient-based Adversarial Training for Text Classification by Contrastive Learning and Auto-Encoder [18.375585982984845]
We focus on enhancing the model's ability to defend against gradient-based adversarial attacks during training.
We propose two novel adversarial training approaches: CARL and RAR (a generic sketch of gradient-based embedding perturbation follows this entry).
Experiments show that the two proposed approaches outperform strong baselines on various text classification datasets.
arXiv Detail & Related papers (2021-09-14T09:08:58Z)
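As a generic illustration of gradient-based adversarial perturbation for text models (not the CARL or RAR algorithms themselves), one FGSM-style step on word embeddings of a HuggingFace-style classifier could look as follows; inputs_embeds is the standard transformers argument for passing embeddings directly:

import torch

def fgsm_embedding_perturbation(model, embeddings, labels, loss_fn, epsilon=1e-2):
    # One FGSM-style step: nudge each embedding in the direction that
    # increases the classification loss.
    embeddings = embeddings.detach().requires_grad_(True)
    loss = loss_fn(model(inputs_embeds=embeddings).logits, labels)
    grad, = torch.autograd.grad(loss, embeddings)
    return (embeddings + epsilon * grad.sign()).detach()

Training on such perturbed embeddings alongside the clean ones is a common gradient-based adversarial training recipe in NLP.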
- Robust Pre-Training by Adversarial Contrastive Learning [120.33706897927391]
Recent work has shown that, when integrated with adversarial training, self-supervised pre-training can lead to state-of-the-art robustness.
We improve robustness-aware self-supervised pre-training by learning representations consistent under both data augmentations and adversarial perturbations.
arXiv Detail & Related papers (2020-10-26T04:44:43Z)
- Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z)
- BERT-ATTACK: Adversarial Attack Against BERT Using BERT [77.82947768158132]
Adversarial attacks on discrete data (such as text) are more challenging than on continuous data (such as images).
We propose BERT-Attack, a high-quality and effective method to generate adversarial samples.
Our method outperforms state-of-the-art attack strategies in both attack success rate and perturbation percentage (a hedged sketch of a greedy word-substitution attack follows this entry).
arXiv Detail & Related papers (2020-04-21T13:30:02Z)
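To show the general shape of word substitution-based attacks such as BERT-Attack, here is a hedged sketch of a greedy substitution loop; predict (the victim model's probability for the gold label) and candidates (substitution proposals, e.g. from a masked language model) are hypothetical helpers, not BERT-Attack's actual implementation:

def greedy_substitution_attack(words, predict, candidates, max_swaps=5):
    # Greedily apply the single substitution that most reduces the victim
    # model's confidence in the gold label, up to max_swaps times.
    words = list(words)
    for _ in range(max_swaps):
        base = predict(words)
        best = None  # (confidence drop, position, replacement)
        for i, word in enumerate(words):
            for cand in candidates(word):
                trial = words[:i] + [cand] + words[i + 1:]
                drop = base - predict(trial)
                if best is None or drop > best[0]:
                    best = (drop, i, cand)
        if best is None or best[0] <= 0:
            break  # no single substitution lowers confidence further
        words[best[1]] = best[2]
    return words

BERT-Attack's distinctive step is generating the candidates with BERT's masked language model, so that substitutions stay fluent in context.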