Fast Adversarial Training against Textual Adversarial Attacks
- URL: http://arxiv.org/abs/2401.12461v1
- Date: Tue, 23 Jan 2024 03:03:57 GMT
- Title: Fast Adversarial Training against Textual Adversarial Attacks
- Authors: Yichen Yang, Xin Liu, Kun He
- Abstract summary: We propose a Fast Adversarial Training (FAT) method to improve model robustness in the synonym-unaware scenario.
FAT uses single-step gradient ascent to craft adversarial examples in the embedding space, exploiting the observation that single-step and multi-step perturbations are similar.
Experiments demonstrate that FAT significantly boosts the robustness of BERT models in the synonym-unaware scenario.
- Score: 11.023035222098008
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many adversarial defense methods have been proposed to enhance the
adversarial robustness of natural language processing models. However, most of
them introduce additional pre-set linguistic knowledge and assume that the
synonym candidates used by attackers are accessible, which is an idealized
assumption. We delve into adversarial training in the embedding space and
propose a Fast Adversarial Training (FAT) method to improve the model
robustness in the synonym-unaware scenario from the perspective of single-step
perturbation generation and perturbation initialization. Based on the
observation that the adversarial perturbations crafted by single-step and
multi-step gradient ascent are similar, FAT uses single-step gradient ascent to
craft adversarial examples in the embedding space to expedite the training
process. Based on the observation that the perturbations generated on the
identical training sample in successive epochs are similar, FAT fully utilizes
historical information when initializing the perturbation. Extensive
experiments demonstrate that FAT significantly boosts the robustness of BERT
models in the synonym-unaware scenario, and outperforms the defense baselines
under various attacks with character-level and word-level modifications.
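The two observations above map directly onto a training loop. Below is a minimal PyTorch sketch, not the authors' released code: it assumes a HuggingFace-style classifier that exposes `get_input_embeddings()` and accepts `inputs_embeds`, inputs padded to a fixed length, and a zero-initialized `delta_cache` tensor of shape (num_samples, max_len, dim) that carries each sample's perturbation across epochs; `alpha` and `epsilon` are hypothetical hyperparameters.

```python
import torch
import torch.nn.functional as F

def fat_step(model, input_ids, attention_mask, labels, delta_cache, sample_idx,
             optimizer, alpha=1e-2, epsilon=5e-2):
    embed_layer = model.get_input_embeddings()

    # Attack phase: single-step gradient ascent on the perturbation only,
    # so the clean embeddings are computed without tracking gradients.
    with torch.no_grad():
        embeds = embed_layer(input_ids)                     # (B, L, D)
    # Historical initialization: start from the perturbation computed for
    # these same samples in the previous epoch (zeros on the first epoch).
    delta = delta_cache[sample_idx].clone().requires_grad_(True)
    adv_logits = model(inputs_embeds=embeds + delta,
                       attention_mask=attention_mask).logits
    grad = torch.autograd.grad(F.cross_entropy(adv_logits, labels), delta)[0]
    delta = (delta + alpha * grad.sign()).clamp(-epsilon, epsilon).detach()
    delta_cache[sample_idx] = delta                         # reuse next epoch

    # Update phase: a fresh forward pass so gradients reach every weight.
    optimizer.zero_grad()
    logits = model(inputs_embeds=embed_layer(input_ids) + delta,
                   attention_mask=attention_mask).logits
    loss = F.cross_entropy(logits, labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The cache is what turns the second observation into a warm start: the single ascent step refines last epoch's perturbation rather than starting from zero, which is the intuition for why one step can approach multi-step quality here.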
Related papers
- SCAT: Robust Self-supervised Contrastive Learning via Adversarial
Training for Text Classification [15.932462099791307]
We propose a novel learning framework called SCAT (Self-supervised Contrastive Learning via Adversarial Training).
SCAT modifies random augmentations of the data in a fully label-free manner to generate adversarial examples.
Our results show that SCAT can not only train robust language models from scratch, but it can also significantly improve the robustness of existing pre-trained language models.
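As a rough illustration of the label-free ingredient, here is a standard NT-Xent-style contrastive loss in PyTorch that pairs two views of the same batch; in a SCAT-like setup one view would be an adversarially modified augmentation. The exact loss form, temperature `tau`, and encoder are assumptions, not SCAT's published formulation.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z_clean, z_adv, tau=0.5):
    # z_clean / z_adv: encoder outputs for two views of the same batch,
    # e.g. a random augmentation and its adversarially modified variant.
    z1 = F.normalize(z_clean, dim=-1)
    z2 = F.normalize(z_adv, dim=-1)
    logits = z1 @ z2.t() / tau                       # (B, B) similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)          # positives on the diagonal
```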
arXiv Detail & Related papers (2023-07-04T05:41:31Z) - Improving Fast Adversarial Training with Prior-Guided Knowledge [80.52575209189365]
We investigate the relationship between adversarial example quality and catastrophic overfitting by comparing the training processes of standard adversarial training and fast adversarial training.
We find that catastrophic overfitting occurs when the attack success rate of the crafted adversarial examples deteriorates.
arXiv Detail & Related papers (2023-04-01T02:18:12Z) - PIAT: Parameter Interpolation based Adversarial Training for Image
Classification [19.276850361815953]
We propose a novel framework, termed Parameter Interpolation based Adversarial Training (PIAT), that makes full use of the historical information during training.
Our framework is general and could further boost the robust accuracy when combined with other adversarial training methods.
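The summary does not state the interpolation rule itself; one common way to exploit historical weights is an exponential moving average over past parameters, sketched below under that assumption (`beta` and the separate `running_model` are hypothetical).

```python
import torch

@torch.no_grad()
def interpolate_params(running_model, model, beta=0.9):
    # running_model accumulates an exponential moving average of the weights
    # seen during training; it is evaluated in place of the raw model.
    for p_run, p in zip(running_model.parameters(), model.parameters()):
        p_run.mul_(beta).add_(p, alpha=1.0 - beta)  # beta*old + (1-beta)*new
```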
arXiv Detail & Related papers (2023-03-24T12:22:34Z) - In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves the adversarial robustness of pre-trained models like BERT against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
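Label smoothing is simple enough to show concretely. A minimal sketch using PyTorch's built-in support follows; the smoothing value 0.1 is an illustrative choice, not the paper's tuned setting.

```python
import torch.nn.functional as F

def smoothed_loss(logits, labels, smoothing=0.1):
    # Each one-hot target is mixed with the uniform distribution, so the
    # model is penalized for over-confident predictions.
    return F.cross_entropy(logits, labels, label_smoothing=smoothing)
```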
arXiv Detail & Related papers (2022-12-20T14:06:50Z) - Prior-Guided Adversarial Initialization for Fast Adversarial Training [84.56377396106447]
We investigate the difference between the training processes of adversarial examples (AEs) in fast adversarial training (FAT) and standard adversarial training (SAT).
We observe that the attack success rate of the AEs crafted by FAT gradually deteriorates in the late training stage, resulting in overfitting.
Based on the observation, we propose a prior-guided FGSM initialization method to avoid overfitting.
The proposed method can prevent catastrophic overfitting and outperform state-of-the-art FAT methods.
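A minimal sketch of the idea, assuming an image classifier and that the prior is, for example, the perturbation crafted for the same samples in an earlier epoch (`epsilon` follows the common 8/255 convention, not necessarily the paper's setting):

```python
import torch
import torch.nn.functional as F

def fgsm_with_prior(model, x, y, prior, epsilon=8 / 255):
    # Start the single-step attack from a prior perturbation instead of a
    # random or zero initialization.
    delta = prior.detach().clone().requires_grad_(True)
    loss = F.cross_entropy(model(x + delta), y)
    grad = torch.autograd.grad(loss, delta)[0]
    # One FGSM step, then clip back into the epsilon-ball.
    return (delta + epsilon * grad.sign()).clamp(-epsilon, epsilon).detach()
```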
arXiv Detail & Related papers (2022-07-18T18:13:10Z) - Robust Textual Embedding against Word-level Adversarial Attacks [15.235449552083043]
We propose a novel robust training method, termed Fast Triplet Metric Learning (FTML).
We show that FTML can significantly promote the model robustness against various advanced adversarial attacks.
Our work shows the great potential of improving the textual robustness through robust word embedding.
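As a toy illustration of triplet metric learning over word embeddings (not FTML's actual objective or data), the snippet below pulls a word toward an assumed synonym and pushes it away from an unrelated word; the token ids and margin are made up.

```python
import torch

# Vocabulary size and dimension mirror BERT-base; ids are placeholders.
emb = torch.nn.Embedding(30522, 768)
triplet = torch.nn.TripletMarginLoss(margin=1.0)

anchor = emb(torch.tensor([2204]))      # a word
positive = emb(torch.tensor([2307]))    # a synonym to pull close
negative = emb(torch.tensor([2919]))    # an unrelated word to push away
loss = triplet(anchor, positive, negative)
loss.backward()                         # gradients reshape the embedding table
```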
arXiv Detail & Related papers (2022-02-28T14:25:00Z) - How Should Pre-Trained Language Models Be Fine-Tuned Towards Adversarial
Robustness? [121.57551065856164]
We propose Robust Informative Fine-Tuning (RIFT) as a novel adversarial fine-tuning method from an information-theoretical perspective.
RIFT encourages an objective model to retain the features learned from the pre-trained model throughout the entire fine-tuning process.
Experimental results show that RIFT consistently outperforms state-of-the-art methods on two popular NLP tasks.
arXiv Detail & Related papers (2021-12-22T05:04:41Z) - Towards Robustness Against Natural Language Word Substitutions [87.56898475512703]
Robustness against word substitutions has a well-defined and widely accepted form, using semantically similar words as substitutions.
Previous defense methods capture word substitutions in vector space by using either an $\ell_2$-ball or a hyper-rectangle.
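To make the two geometric regions concrete, here is a small sketch of projecting an embedding-space perturbation into each; the function names and the row-wise treatment are illustrative assumptions.

```python
import torch

def project_l2(delta, radius):
    # Rescale delta so each row lies inside an l2-ball of the given radius.
    norm = delta.norm(p=2, dim=-1, keepdim=True).clamp(min=1e-12)
    return delta * (radius / norm).clamp(max=1.0)

def project_box(delta, low, high):
    # Clip delta coordinate-wise into the hyper-rectangle [low, high].
    return torch.maximum(torch.minimum(delta, high), low)
```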
arXiv Detail & Related papers (2021-07-28T17:55:08Z) - Self-Supervised Contrastive Learning with Adversarial Perturbations for
Robust Pretrained Language Models [18.726529370845256]
This paper improves the robustness of the pretrained language model BERT against word substitution-based adversarial attacks.
We also create an adversarial attack for word-level adversarial training on BERT.
arXiv Detail & Related papers (2021-07-15T21:03:34Z) - Self-Progressing Robust Training [146.8337017922058]
Current robust training methods such as adversarial training explicitly use an "attack" to generate adversarial examples.
We propose a new framework called SPROUT, self-progressing robust training.
Our results shed new light on scalable, effective and attack-independent robust training methods.
arXiv Detail & Related papers (2020-12-22T00:45:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.