Fast Adversarial Training against Textual Adversarial Attacks
- URL: http://arxiv.org/abs/2401.12461v1
- Date: Tue, 23 Jan 2024 03:03:57 GMT
- Title: Fast Adversarial Training against Textual Adversarial Attacks
- Authors: Yichen Yang, Xin Liu, Kun He
- Abstract summary: We propose a Fast Adversarial Training (FAT) method to improve model robustness in the synonym-unaware scenario.
FAT uses single-step gradient ascent to craft adversarial examples in the embedding space, exploiting the observation that single-step and multi-step perturbations are similar.
Experiments demonstrate that FAT significantly boosts the robustness of BERT models in the synonym-unaware scenario.
- Score: 11.023035222098008
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many adversarial defense methods have been proposed to enhance the
adversarial robustness of natural language processing models. However, most of
them introduce additional pre-set linguistic knowledge and assume that the
synonym candidates used by attackers are accessible, which is an idealized
assumption. We delve into adversarial training in the embedding space and
propose a Fast Adversarial Training (FAT) method to improve the model
robustness in the synonym-unaware scenario from the perspective of single-step
perturbation generation and perturbation initialization. Based on the
observation that the adversarial perturbations crafted by single-step and
multi-step gradient ascent are similar, FAT uses single-step gradient ascent to
craft adversarial examples in the embedding space to expedite the training
process. Based on the observation that the perturbations generated on the
identical training sample in successive epochs are similar, FAT fully utilizes
historical information when initializing the perturbation. Extensive
experiments demonstrate that FAT significantly boosts the robustness of BERT
models in the synonym-unaware scenario, and outperforms the defense baselines
under various attacks with character-level and word-level modifications.
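The two observations above map directly onto a training loop. Below is a minimal PyTorch sketch, not the authors' released code: it assumes a HuggingFace-style classifier that exposes `get_input_embeddings()` and accepts `inputs_embeds`, inputs padded to a fixed length, and a zero-initialized `delta_cache` tensor of shape (num_samples, max_len, dim) that carries each sample's perturbation across epochs; `alpha` and `epsilon` are hypothetical hyperparameters.

```python
import torch
import torch.nn.functional as F

def fat_step(model, input_ids, attention_mask, labels, delta_cache, sample_idx,
             optimizer, alpha=1e-2, epsilon=5e-2):
    embed_layer = model.get_input_embeddings()

    # Attack phase: single-step gradient ascent on the perturbation only,
    # so the clean embeddings are computed without tracking gradients.
    with torch.no_grad():
        embeds = embed_layer(input_ids)                     # (B, L, D)
    # Historical initialization: start from the perturbation computed for
    # these same samples in the previous epoch (zeros on the first epoch).
    delta = delta_cache[sample_idx].clone().requires_grad_(True)
    adv_logits = model(inputs_embeds=embeds + delta,
                       attention_mask=attention_mask).logits
    grad = torch.autograd.grad(F.cross_entropy(adv_logits, labels), delta)[0]
    delta = (delta + alpha * grad.sign()).clamp(-epsilon, epsilon).detach()
    delta_cache[sample_idx] = delta                         # reuse next epoch

    # Update phase: a fresh forward pass so gradients reach every weight.
    optimizer.zero_grad()
    logits = model(inputs_embeds=embed_layer(input_ids) + delta,
                   attention_mask=attention_mask).logits
    loss = F.cross_entropy(logits, labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The cache is what turns the second observation into a warm start: the single ascent step refines last epoch's perturbation rather than starting from zero, which is the intuition for why one step can approach multi-step quality here.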
Related papers
- SCAT: Robust Self-supervised Contrastive Learning via Adversarial
Training for Text Classification [15.932462099791307]
We propose a novel learning framework called SCAT (Self-supervised Contrastive Learning via Adversarial Training).
SCAT modifies random augmentations of the data in a fully label-free manner to generate adversarial examples.
Our results show that SCAT can not only train robust language models from scratch, but it can also significantly improve the robustness of existing pre-trained language models.
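As a rough illustration of the label-free ingredient, here is a standard NT-Xent-style contrastive loss in PyTorch that pairs two views of the same batch; in a SCAT-like setup one view would be an adversarially modified augmentation. The exact loss form, temperature `tau`, and encoder are assumptions, not SCAT's published formulation.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z_clean, z_adv, tau=0.5):
    # z_clean / z_adv: encoder outputs for two views of the same batch,
    # e.g. a random augmentation and its adversarially modified variant.
    z1 = F.normalize(z_clean, dim=-1)
    z2 = F.normalize(z_adv, dim=-1)
    logits = z1 @ z2.t() / tau                       # (B, B) similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)          # positives on the diagonal
```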
arXiv Detail & Related papers (2023-07-04T05:41:31Z) - Improving Fast Adversarial Training with Prior-Guided Knowledge [80.52575209189365]
We investigate the relationship between adversarial example quality and catastrophic overfitting by comparing the training processes of standard adversarial training and fast adversarial training.
We find that catastrophic overfitting occurs when the attack success rate of the crafted adversarial examples deteriorates.
arXiv Detail & Related papers (2023-04-01T02:18:12Z) - PIAT: Parameter Interpolation based Adversarial Training for Image
Classification [19.276850361815953]
We propose a novel framework, termed Parameter Interpolation based Adversarial Training (PIAT), that makes full use of the historical information during training.
Our framework is general and could further boost the robust accuracy when combined with other adversarial training methods.
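The summary does not state the interpolation rule itself; one common way to exploit historical weights is an exponential moving average over past parameters, sketched below under that assumption (`beta` and the separate `running_model` are hypothetical).

```python
import torch

@torch.no_grad()
def interpolate_params(running_model, model, beta=0.9):
    # running_model accumulates an exponential moving average of the weights
    # seen during training; it is evaluated in place of the raw model.
    for p_run, p in zip(running_model.parameters(), model.parameters()):
        p_run.mul_(beta).add_(p, alpha=1.0 - beta)  # beta*old + (1-beta)*new
```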
arXiv Detail & Related papers (2023-03-24T12:22:34Z) - In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves the adversarial robustness of pre-trained models like BERT against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
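Label smoothing is simple enough to show concretely. A minimal sketch using PyTorch's built-in support follows; the smoothing value 0.1 is an illustrative choice, not the paper's tuned setting.

```python
import torch.nn.functional as F

def smoothed_loss(logits, labels, smoothing=0.1):
    # Each one-hot target is mixed with the uniform distribution, so the
    # model is penalized for over-confident predictions.
    return F.cross_entropy(logits, labels, label_smoothing=smoothing)
```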
arXiv Detail & Related papers (2022-12-20T14:06:50Z) - Prior-Guided Adversarial Initialization for Fast Adversarial Training [84.56377396106447]
We investigate the difference between the training processes of adversarial examples (AEs) in fast adversarial training (FAT) and standard adversarial training (SAT).
We observe that the attack success rate of the AEs crafted by FAT gradually deteriorates in the late training stage, resulting in overfitting.
Based on the observation, we propose a prior-guided FGSM initialization method to avoid overfitting.
The proposed method can prevent catastrophic overfitting and outperform state-of-the-art FAT methods.
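A minimal sketch of the idea, assuming an image classifier and that the prior is, for example, the perturbation crafted for the same samples in an earlier epoch (`epsilon` follows the common 8/255 convention, not necessarily the paper's setting):

```python
import torch
import torch.nn.functional as F

def fgsm_with_prior(model, x, y, prior, epsilon=8 / 255):
    # Start the single-step attack from a prior perturbation instead of a
    # random or zero initialization.
    delta = prior.detach().clone().requires_grad_(True)
    loss = F.cross_entropy(model(x + delta), y)
    grad = torch.autograd.grad(loss, delta)[0]
    # One FGSM step, then clip back into the epsilon-ball.
    return (delta + epsilon * grad.sign()).clamp(-epsilon, epsilon).detach()
```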
arXiv Detail & Related papers (2022-07-18T18:13:10Z) - Robust Textual Embedding against Word-level Adversarial Attacks [15.235449552083043]
We propose a novel robust training method, termed Fast Triplet Metric Learning (FTML).
We show that FTML can significantly promote the model robustness against various advanced adversarial attacks.
Our work shows the great potential of improving the textual robustness through robust word embedding.
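As a toy illustration of triplet metric learning over word embeddings (not FTML's actual objective or data), the snippet below pulls a word toward an assumed synonym and pushes it away from an unrelated word; the token ids and margin are made up.

```python
import torch

# Vocabulary size and dimension mirror BERT-base; ids are placeholders.
emb = torch.nn.Embedding(30522, 768)
triplet = torch.nn.TripletMarginLoss(margin=1.0)

anchor = emb(torch.tensor([2204]))      # a word
positive = emb(torch.tensor([2307]))    # a synonym to pull close
negative = emb(torch.tensor([2919]))    # an unrelated word to push away
loss = triplet(anchor, positive, negative)
loss.backward()                         # gradients reshape the embedding table
```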
arXiv Detail & Related papers (2022-02-28T14:25:00Z) - How Should Pre-Trained Language Models Be Fine-Tuned Towards Adversarial
Robustness? [121.57551065856164]
We propose Robust Informative Fine-Tuning (RIFT) as a novel adversarial fine-tuning method from an information-theoretical perspective.
RIFT encourages an objective model to retain the features learned from the pre-trained model throughout the entire fine-tuning process.
Experimental results show that RIFT consistently outperforms state-of-the-art methods on two popular NLP tasks.
arXiv Detail & Related papers (2021-12-22T05:04:41Z) - Towards Robustness Against Natural Language Word Substitutions [87.56898475512703]
Robustness against word substitutions has a well-defined and widely accepted form, using semantically similar words as substitutions.
Previous defense methods capture word substitutions in vector space by using either an $\ell_2$-ball or a hyper-rectangle.
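To make the two geometric regions concrete, here is a small sketch of projecting an embedding-space perturbation into each; the function names and the row-wise treatment are illustrative assumptions.

```python
import torch

def project_l2(delta, radius):
    # Rescale delta so each row lies inside an l2-ball of the given radius.
    norm = delta.norm(p=2, dim=-1, keepdim=True).clamp(min=1e-12)
    return delta * (radius / norm).clamp(max=1.0)

def project_box(delta, low, high):
    # Clip delta coordinate-wise into the hyper-rectangle [low, high].
    return torch.maximum(torch.minimum(delta, high), low)
```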
arXiv Detail & Related papers (2021-07-28T17:55:08Z) - Self-Supervised Contrastive Learning with Adversarial Perturbations for
Robust Pretrained Language Models [18.726529370845256]
This paper improves the robustness of the pretrained language model BERT against word substitution-based adversarial attacks.
We also create an adversarial attack for word-level adversarial training on BERT.
arXiv Detail & Related papers (2021-07-15T21:03:34Z) - Self-Progressing Robust Training [146.8337017922058]
Current robust training methods such as adversarial training explicitly use an "attack" to generate adversarial examples.
We propose a new framework called SPROUT, self-progressing robust training.
Our results shed new light on scalable, effective and attack-independent robust training methods.
arXiv Detail & Related papers (2020-12-22T00:45:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.