SCAT: Robust Self-supervised Contrastive Learning via Adversarial
Training for Text Classification
- URL: http://arxiv.org/abs/2307.01488v1
- Date: Tue, 4 Jul 2023 05:41:31 GMT
- Title: SCAT: Robust Self-supervised Contrastive Learning via Adversarial
Training for Text Classification
- Authors: Junjie Wu, Dit-Yan Yeung
- Abstract summary: We propose a novel learning framework called SCAT (Self-supervised Contrastive Learning via Adversarial Training)
SCAT modifies random augmentations of the data in a fully label-free manner to generate adversarial examples.
Our results show that SCAT can not only train robust language models from scratch, but it can also significantly improve the robustness of existing pre-trained language models.
- Score: 15.932462099791307
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite their promising performance across various natural language
processing (NLP) tasks, current NLP systems are vulnerable to textual
adversarial attacks. To defend against these attacks, most existing methods
apply adversarial training by incorporating adversarial examples. However,
these methods have to rely on ground-truth labels to generate adversarial
examples, rendering it impractical for large-scale model pre-training which is
commonly used nowadays for NLP and many other tasks. In this paper, we propose
a novel learning framework called SCAT (Self-supervised Contrastive Learning
via Adversarial Training), which can learn robust representations without
requiring labeled data. Specifically, SCAT modifies random augmentations of the
data in a fully label-free manner to generate adversarial examples. Adversarial
training is achieved by minimizing the contrastive loss between the
augmentations and their adversarial counterparts. We evaluate SCAT on two text
classification datasets using two state-of-the-art attack schemes proposed
recently. Our results show that SCAT can not only train robust language models
from scratch, but it can also significantly improve the robustness of existing
pre-trained language models. Moreover, to demonstrate its flexibility, we show
that SCAT can also be combined with supervised adversarial training to further
enhance model robustness.
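The core idea described in the abstract — generating label-free adversarial examples from random augmentations and minimizing a contrastive loss between augmentations and their adversarial counterparts — can be sketched as follows. This is a minimal NumPy illustration under assumed simplifications (a simplified InfoNCE loss, finite-difference gradients, and an FGSM-style step directly on embeddings), not the authors' implementation:

```python
import numpy as np

def normalize(z):
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def info_nce(z1, z2, tau=0.5):
    """Simplified InfoNCE: each z1[i] should match z2[i] against the batch."""
    z1, z2 = normalize(z1), normalize(z2)
    logits = z1 @ z2.T / tau                      # (N, N) cosine-similarity logits
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))           # positives sit on the diagonal

def adversarial_counterpart(z1, z2, eps=0.1, h=1e-4):
    """Label-free attack: perturb z2 to *increase* the contrastive loss.
    The gradient is estimated with finite differences for illustration only."""
    grad = np.zeros_like(z2)
    for idx in np.ndindex(z2.shape):
        bump = np.zeros_like(z2)
        bump[idx] = h
        grad[idx] = (info_nce(z1, z2 + bump) - info_nce(z1, z2 - bump)) / (2 * h)
    return z2 + eps * np.sign(grad)               # FGSM-style step, no labels used

rng = np.random.default_rng(0)
z1 = rng.normal(size=(4, 8))                      # embeddings of one augmentation
z2 = z1 + 0.01 * rng.normal(size=(4, 8))          # embeddings of another augmentation
z2_adv = adversarial_counterpart(z1, z2)
print(info_nce(z1, z2) < info_nce(z1, z2_adv))    # the attack raises the loss
```

Training would then minimize `info_nce(z1, z2_adv)`, pulling the clean augmentation and its adversarial counterpart back together — which is the sense in which adversarial training is achieved without ground-truth labels.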
Related papers
- Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has conventionally been regarded as a challenging property to encode into neural networks.
We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z)
- Fast Adversarial Training against Textual Adversarial Attacks [11.023035222098008]
We propose a Fast Adversarial Training (FAT) method to improve the model robustness in the synonym-unaware scenario.
FAT uses single-step and multi-step gradient ascent to craft adversarial examples in the embedding space.
Experiments demonstrate that FAT significantly boosts the robustness of BERT models in the synonym-unaware scenario.
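The single-step and multi-step gradient ascent in embedding space that FAT describes can be illustrated on a toy linear classifier; the analytic-gradient logistic loss and function names here are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def loss_and_grad(e, w, b, y):
    """Binary cross-entropy of a linear classifier on embedding e,
    with its gradient w.r.t. e (analytic, so no autograd is needed)."""
    p = 1 / (1 + np.exp(-(w @ e + b)))
    loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    grad = (p - y) * w
    return loss, grad

def embed_attack(e, w, b, y, eps=0.5, steps=1):
    """Gradient ascent in embedding space; steps=1 mimics single-step
    crafting, steps>1 the multi-step variant."""
    alpha = eps / steps
    e_adv = e.copy()
    for _ in range(steps):
        _, g = loss_and_grad(e_adv, w, b, y)
        e_adv = e_adv + alpha * np.sign(g)         # FGSM-style update
        e_adv = e + np.clip(e_adv - e, -eps, eps)  # stay inside the eps-ball
    return e_adv

rng = np.random.default_rng(1)
w, b = rng.normal(size=16), 0.0
e, y = rng.normal(size=16), 1
l0, _ = loss_and_grad(e, w, b, y)
l1, _ = loss_and_grad(embed_attack(e, w, b, y, steps=1), w, b, y)
l3, _ = loss_and_grad(embed_attack(e, w, b, y, steps=3), w, b, y)
print(l0 < l1)  # True: the crafted perturbation raises the loss
```

Crafting perturbations on embeddings rather than on discrete tokens is what makes the scheme synonym-unaware: no substitution candidates are ever enumerated.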
arXiv Detail & Related papers (2024-01-23T03:03:57Z)
- CAT: Collaborative Adversarial Training [80.55910008355505]
We propose a collaborative adversarial training framework to improve the robustness of neural networks.
Specifically, we use different adversarial training methods to train robust models and let the models exchange knowledge with one another during training.
CAT achieves state-of-the-art adversarial robustness on CIFAR-10 under the Auto-Attack benchmark without using any additional data.
arXiv Detail & Related papers (2023-03-27T05:37:43Z)
- Contrastive Learning for Fair Representations [50.95604482330149]
Trained classification models can unintentionally lead to biased representations and predictions.
Existing debiasing methods for classification models, such as adversarial training, are often expensive to train and difficult to optimise.
We propose a method for mitigating bias by incorporating contrastive learning, in which instances sharing the same class label are encouraged to have similar representations.
arXiv Detail & Related papers (2021-09-22T10:47:51Z)
- Self-Supervised Contrastive Learning with Adversarial Perturbations for Robust Pretrained Language Models [18.726529370845256]
This paper improves the robustness of the pretrained language model BERT against word substitution-based adversarial attacks.
We also create an adversarial attack for word-level adversarial training on BERT.
arXiv Detail & Related papers (2021-07-15T21:03:34Z)
- Robust Pre-Training by Adversarial Contrastive Learning [120.33706897927391]
Recent work has shown that, when integrated with adversarial training, self-supervised pre-training can lead to state-of-the-art robustness.
We improve robustness-aware self-supervised pre-training by learning representations consistent under both data augmentations and adversarial perturbations.
arXiv Detail & Related papers (2020-10-26T04:44:43Z)
- CAT-Gen: Improving Robustness in NLP Models via Controlled Adversarial Text Generation [20.27052525082402]
We present a Controlled Adversarial Text Generation (CAT-Gen) model that generates adversarial texts through controllable attributes.
Experiments on real-world NLP datasets demonstrate that our method can generate more diverse and fluent adversarial texts.
arXiv Detail & Related papers (2020-10-05T21:07:45Z)
- Defense against Adversarial Attacks in NLP via Dirichlet Neighborhood Ensemble [163.3333439344695]
Dirichlet Neighborhood Ensemble (DNE) is a randomized smoothing method for training a model that is robust to substitution-based attacks.
DNE forms virtual sentences by sampling an embedding vector for each word in an input sentence from the convex hull spanned by the word and its synonyms, and it augments the training data with these virtual sentences.
We demonstrate through extensive experimentation that our method consistently outperforms recently proposed defense methods by a significant margin across different network architectures and multiple data sets.
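The convex-hull sampling that DNE describes can be sketched as follows; the toy embeddings, synonym table, and function names are illustrative assumptions rather than the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy word embeddings; in DNE these would come from the model's embedding
# table, and the synonym sets from a synonym resource.
emb = {w: rng.normal(size=6) for w in ["good", "great", "fine", "movie"]}
synonyms = {"good": ["great", "fine"]}

def dne_virtual_embedding(word, emb, synonyms, alpha=1.0, rng=rng):
    """Sample a virtual embedding from the convex hull spanned by a word
    and its synonyms, using Dirichlet-distributed mixture weights."""
    group = [word] + synonyms.get(word, [])
    vecs = np.stack([emb[w] for w in group])            # (k, d)
    weights = rng.dirichlet(alpha * np.ones(len(group)))
    return weights @ vecs                               # convex combination

def dne_virtual_sentence(words, emb, synonyms):
    """Replace each word's embedding with a sampled point in its hull,
    forming one 'virtual sentence' to augment the training data."""
    return np.stack([dne_virtual_embedding(w, emb, synonyms) for w in words])

sent = dne_virtual_sentence(["good", "movie"], emb, synonyms)
print(sent.shape)  # (2, 6)
```

Because the Dirichlet weights are non-negative and sum to one, every sampled point lies inside the hull of the word and its synonyms, which is what makes the smoothing cover synonym substitutions.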
arXiv Detail & Related papers (2020-06-20T18:01:16Z)
- Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.