Better Robustness by More Coverage: Adversarial Training with Mixup
Augmentation for Robust Fine-tuning
- URL: http://arxiv.org/abs/2012.15699v1
- Date: Thu, 31 Dec 2020 16:28:07 GMT
- Title: Better Robustness by More Coverage: Adversarial Training with Mixup
Augmentation for Robust Fine-tuning
- Authors: Chenglei Si, Zhengyan Zhang, Fanchao Qi, Zhiyuan Liu, Yasheng Wang,
Qun Liu, Maosong Sun
- Abstract summary: Adversarial data augmentation (ADA) has been widely adopted; it attempts to cover more of the adversarial attack search space by adding adversarial examples during training.
We propose a simple and effective method, Adversarial Data Augmentation with Mixup (MixADA), that covers a much larger proportion of the attack search space.
In text classification experiments with BERT and RoBERTa, MixADA achieves significant robustness gains under two strong adversarial attacks and alleviates the performance degradation ADA causes on the original data.
- Score: 69.65361463168142
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pre-trained language models (PLMs) fail miserably on adversarial attacks. To
improve the robustness, adversarial data augmentation (ADA) has been widely
adopted; it attempts to cover more of the adversarial attack search space by
adding adversarial examples during training. However, the number of
adversarial examples added by ADA is extremely insufficient due to the
enormously large search space. In this work, we propose a simple and effective
method to cover a much larger proportion of the attack search space, called
Adversarial Data Augmentation with Mixup (MixADA). Specifically, MixADA
linearly interpolates the representations of pairs of training examples to form
new virtual samples, which are more abundant and diverse than the discrete
adversarial examples used in conventional ADA. Moreover, to evaluate the
robustness of different models fairly, we adopt a challenging setup, which
dynamically generates new adversarial examples for each model. In text
classification experiments with BERT and RoBERTa, MixADA achieves significant
robustness gains under two strong adversarial attacks and alleviates the
performance degradation of ADA on the original data. Our source code will be
released to support further exploration.
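As a rough illustration of the interpolation step (a minimal sketch, not the authors' released code; the layer at which representations are mixed and the Beta prior are assumptions), mixup on hidden representations can be written as:

    # Minimal PyTorch sketch of mixup on hidden representations.
    # Assumes `hidden` is a batch of encoder representations and `labels`
    # the corresponding gold labels; the choice of mixing layer is ours.
    import torch

    def mixup_hidden(hidden, labels, alpha=0.4):
        """Interpolate a batch of hidden states with a shuffled copy of itself."""
        lam = torch.distributions.Beta(alpha, alpha).sample().item()
        perm = torch.randperm(hidden.size(0))
        mixed = lam * hidden + (1.0 - lam) * hidden[perm]
        return mixed, labels, labels[perm], lam

    def mixup_loss(criterion, logits, y_a, y_b, lam):
        # The loss is interpolated with the same coefficient as the inputs.
        return lam * criterion(logits, y_a) + (1.0 - lam) * criterion(logits, y_b)

Since training under ADA already includes adversarial examples, the mixed pairs can involve them as well, which is what lets the virtual samples cover more of the attack search space rather than only the clean-data manifold.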
Related papers
- MOREL: Enhancing Adversarial Robustness through Multi-Objective Representation Learning [1.534667887016089]
Deep neural networks (DNNs) are vulnerable to slight adversarial perturbations.
We show that strong feature representation learning during training can significantly enhance the original model's robustness.
We propose MOREL, a multi-objective feature representation learning approach, encouraging classification models to produce similar features for inputs within the same class, despite perturbations.
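A plausible reading of this objective, sketched below (our construction, not the authors' code; the weighting and the cosine distance are assumptions):

    # Hypothetical multi-objective loss: cross-entropy plus a term pulling
    # clean and perturbed features of the same input together.
    import torch.nn.functional as F

    def morel_style_loss(model, x_clean, x_adv, y, beta=1.0):
        # Assumes the model returns (features, logits).
        feats_c, logits_c = model(x_clean)
        feats_a, logits_a = model(x_adv)
        ce = F.cross_entropy(logits_c, y) + F.cross_entropy(logits_a, y)
        align = 1.0 - F.cosine_similarity(feats_c, feats_a, dim=-1).mean()
        return ce + beta * align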
arXiv Detail & Related papers (2024-10-02T16:05:03Z)
- Meta Invariance Defense Towards Generalizable Robustness to Unknown Adversarial Attacks [62.036798488144306]
Current defenses mainly focus on known attacks, while adversarial robustness to unknown attacks is seriously overlooked.
We propose an attack-agnostic defense method named Meta Invariance Defense (MID).
We show that MID simultaneously achieves robustness to the imperceptible adversarial perturbations in high-level image classification and attack-suppression in low-level robust image regeneration.
arXiv Detail & Related papers (2024-04-04T10:10:38Z)
- Adapters Mixup: Mixing Parameter-Efficient Adapters to Enhance the Adversarial Robustness of Fine-tuned Pre-trained Text Classifiers [9.250758784663411]
AdpMixup combines fine-tuning through adapters and adversarial augmentation via mixup to dynamically leverage existing knowledge for robust inference.
Experiments show AdpMixup achieves the best trade-off between training efficiency and robustness under both pre-known and unknown attacks.
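One way to picture the adapter mixing (an illustrative sketch only; AdpMixup's actual mixing policy is described in the paper): interpolate the parameters of a clean-fine-tuned adapter and an adversarially fine-tuned one.

    # Parameter-space mixup between two adapters' state dicts (illustrative).
    def mix_adapter_states(state_clean, state_adv, lam=0.5):
        """Blend adapters fine-tuned on clean and adversarial data."""
        return {name: lam * state_clean[name] + (1.0 - lam) * state_adv[name]
                for name in state_clean}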
arXiv Detail & Related papers (2024-01-18T16:27:18Z)
- Fast Propagation is Better: Accelerating Single-Step Adversarial Training via Sampling Subnetworks [69.54774045493227]
A drawback of adversarial training is the computational overhead introduced by the generation of adversarial examples.
We propose to exploit the interior building blocks of the model to improve efficiency.
Compared with previous methods, our method not only reduces the training cost but also achieves better model robustness.
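Purely as a sketch of the idea (the sampling scheme below, a stochastic-depth forward pass exposed via a hypothetical `keep_prob` argument, is our assumption, not the paper's construction): generate the single-step attack through a randomly sampled subnetwork so the adversarial forward/backward pass is cheaper.

    # Single-step attack generated through a sampled subnetwork (illustrative).
    import torch

    def fgsm_via_subnetwork(model, loss_fn, x, y, eps=8 / 255, keep_prob=0.7):
        # Assumes `model(x, keep_prob=...)` runs a stochastic-depth forward
        # pass that keeps each skippable block with probability `keep_prob`.
        x = x.clone().detach().requires_grad_(True)
        loss = loss_fn(model(x, keep_prob=keep_prob), y)
        grad, = torch.autograd.grad(loss, x)
        return (x + eps * grad.sign()).clamp(0, 1).detach()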
arXiv Detail & Related papers (2023-10-24T01:36:20Z)
- Advancing Adversarial Robustness Through Adversarial Logit Update [10.041289551532804]
Adversarial training and adversarial purification are among the most widely recognized defense strategies.
We propose a new principle, Adversarial Logit Update (ALU), to infer the labels of adversarial samples.
Our solution achieves superior performance compared to state-of-the-art methods against a wide range of adversarial attacks.
arXiv Detail & Related papers (2023-08-29T07:13:31Z)
- PIAT: Parameter Interpolation based Adversarial Training for Image Classification [19.276850361815953]
We propose a novel framework, termed Parameter Interpolation based Adversarial Training (PIAT), that makes full use of the historical information during training.
Our framework is general and could further boost the robust accuracy when combined with other adversarial training methods.
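One plausible instantiation of using historical information (our reading; PIAT's exact interpolation schedule is given in the paper) is to blend a running historical copy of the weights into the live model:

    # Blend historical parameters into the current model in place (sketch).
    import torch

    @torch.no_grad()
    def interpolate_params(model, historical, gamma=0.9):
        for p, h in zip(model.parameters(), historical.parameters()):
            # p <- gamma * p + (1 - gamma) * h
            p.mul_(gamma).add_(h, alpha=1.0 - gamma)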
arXiv Detail & Related papers (2023-03-24T12:22:34Z)
- Latent Boundary-guided Adversarial Training [61.43040235982727]
Adversarial training has proved to be the most effective strategy: it injects adversarial examples into model training.
We propose a novel adversarial training framework called LAtent bounDary-guided aDvErsarial tRaining.
arXiv Detail & Related papers (2022-06-08T07:40:55Z)
- On the Impact of Hard Adversarial Instances on Overfitting in Adversarial Training [72.95029777394186]
Adversarial training is a popular method to robustify models against adversarial attacks.
We investigate this phenomenon from the perspective of training instances.
We show that the decay in generalization performance of adversarial training is a result of the model's attempt to fit hard adversarial instances.
arXiv Detail & Related papers (2021-12-14T12:19:24Z)
- Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose adaptive feature alignment (AFA) to generate features of arbitrary attacking strengths.
Our method is trained to align such features automatically.
arXiv Detail & Related papers (2021-05-31T17:01:05Z)
- Generalizing Adversarial Examples by AdaBelief Optimizer [6.243028964381449]
We propose an AdaBelief iterative Fast Gradient Sign Method to generalize adversarial examples.
Compared with state-of-the-art attack methods, our proposed method can generate adversarial examples effectively in the white-box setting.
The transfer rate is 7%-21% higher than that of the latest attack methods.
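Reconstructed from the summary alone (not the authors' code; step sizes and moment hyperparameters are assumptions), an AdaBelief-flavored iterative FGSM takes the sign of a belief-adjusted momentum update rather than of the raw gradient:

    # Sketch of an AdaBelief-style iterative FGSM (our reconstruction).
    import torch

    def adabelief_ifgsm(model, loss_fn, x, y, eps=8 / 255, alpha=2 / 255,
                        steps=10, beta1=0.9, beta2=0.999, delta=1e-8):
        x_adv = x.clone().detach()
        m = torch.zeros_like(x)
        s = torch.zeros_like(x)
        for t in range(1, steps + 1):
            x_adv.requires_grad_(True)
            loss = loss_fn(model(x_adv), y)
            grad, = torch.autograd.grad(loss, x_adv)
            m = beta1 * m + (1 - beta1) * grad
            s = beta2 * s + (1 - beta2) * (grad - m) ** 2  # "belief" term
            m_hat = m / (1 - beta1 ** t)
            s_hat = s / (1 - beta2 ** t)
            step = alpha * torch.sign(m_hat / (s_hat.sqrt() + delta))
            x_adv = (x_adv + step).detach()
            # Project back onto the eps-ball and the valid pixel range.
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
        return x_adv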
arXiv Detail & Related papers (2021-01-25T07:39:16Z)