Adapters Mixup: Mixing Parameter-Efficient Adapters to Enhance the Adversarial Robustness of Fine-tuned Pre-trained Text Classifiers
- URL: http://arxiv.org/abs/2401.10111v2
- Date: Mon, 17 Jun 2024 12:54:18 GMT
- Title: Adapters Mixup: Mixing Parameter-Efficient Adapters to Enhance the Adversarial Robustness of Fine-tuned Pre-trained Text Classifiers
- Authors: Tuc Nguyen, Thai Le
- Abstract summary: AdpMixup combines fine-tuning through adapters and adversarial augmentation via mixup to dynamically leverage existing knowledge for robust inference.
Experiments show AdpMixup achieves the best trade-off between training efficiency and robustness under both pre-known and unknown attacks.
- Score: 9.250758784663411
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing works show that augmenting the training data of pre-trained language models (PLMs) for classification tasks fine-tuned via parameter-efficient fine-tuning methods (PEFT) using both clean and adversarial examples can enhance their robustness under adversarial attacks. However, this adversarial training paradigm often leads to performance degradation on clean inputs and requires frequent re-training on the entire data to account for new, unknown attacks. To overcome these challenges while still harnessing the benefits of adversarial training and the efficiency of PEFT, this work proposes a novel approach, called AdpMixup, that combines two paradigms: (1) fine-tuning through adapters and (2) adversarial augmentation via mixup to dynamically leverage existing knowledge from a set of pre-known attacks for robust inference. Intuitively, AdpMixup fine-tunes PLMs with multiple adapters with both clean and pre-known adversarial examples and intelligently mixes them up in different ratios during prediction. Our experiments show AdpMixup achieves the best trade-off between training efficiency and robustness under both pre-known and unknown attacks, compared to existing baselines on five downstream tasks across six varied black-box attacks and two PLMs. All source code will be available.
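To make the mixing step concrete, below is a minimal sketch of the adapter-mixing idea, assuming LoRA-style adapters whose weight tensors can be linearly interpolated; the function and tensor names are illustrative, not the authors' released code.

```python
# Minimal sketch of the adapter-mixing idea behind AdpMixup, assuming
# LoRA-style adapters whose weights can be averaged. All names here are
# illustrative; the paper's actual implementation may differ.
import torch

def mix_adapters(adapter_states, ratios):
    """Interpolate several adapters' state dicts with the given ratios."""
    assert len(adapter_states) == len(ratios)
    assert abs(sum(ratios) - 1.0) < 1e-6
    mixed = {}
    for name in adapter_states[0]:
        mixed[name] = sum(r * sd[name] for r, sd in zip(ratios, adapter_states))
    return mixed

# Toy example: one "clean" adapter and one adapter fine-tuned on a
# pre-known attack, mixed 70/30 before robust inference.
clean = {"lora_A": torch.randn(8, 768), "lora_B": torch.randn(768, 8)}
adv = {"lora_A": torch.randn(8, 768), "lora_B": torch.randn(768, 8)}
mixed = mix_adapters([clean, adv], ratios=[0.7, 0.3])
print(mixed["lora_A"].shape)  # torch.Size([8, 768])
```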
Related papers
- Efficient Adversarial Training in LLMs with Continuous Attacks [99.5882845458567]
Large language models (LLMs) are vulnerable to adversarial attacks that can bypass their safety guardrails.
We propose a fast adversarial training algorithm (C-AdvUL) composed of two losses.
C-AdvIPO is an adversarial variant of IPO that does not require utility data for adversarially robust alignment.
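As a rough illustration of what a continuous attack looks like, the sketch below perturbs input embeddings with a few signed-gradient steps rather than searching over discrete token substitutions; it conveys the general idea only, not the exact C-AdvUL or C-AdvIPO procedures.

```python
# Hedged sketch of a "continuous attack": perturb the input embeddings
# directly with a few PGD-style steps instead of editing discrete tokens.
import torch
import torch.nn.functional as F

def continuous_attack(embed, model, labels, eps=0.05, lr=0.01, steps=3):
    delta = torch.zeros_like(embed, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(embed + delta), labels)
        loss.backward()
        with torch.no_grad():
            delta += lr * delta.grad.sign()  # gradient ascent step
            delta.clamp_(-eps, eps)          # stay inside the eps-ball
        delta.grad.zero_()
    return (embed + delta).detach()

# Toy stand-in for a model that maps embeddings to logits.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(4 * 16, 2))
emb = torch.randn(8, 4, 16)  # (batch, seq_len, hidden)
adv_emb = continuous_attack(emb, model, torch.zeros(8, dtype=torch.long))
```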
arXiv Detail & Related papers (2024-05-24T14:20:09Z)
- PIAT: Parameter Interpolation based Adversarial Training for Image Classification [19.276850361815953]
We propose a novel framework, termed Parameter Interpolation based Adversarial Training (PIAT), that makes full use of historical parameter information during training.
Our framework is general and could further boost the robust accuracy when combined with other adversarial training methods.
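A hedged sketch of the parameter-interpolation step, assuming the blend happens at epoch boundaries with a coefficient `alpha` (an assumed hyperparameter):

```python
# Sketch of parameter interpolation across epochs, in the spirit of PIAT:
# after each epoch, blend the current weights with the previous epoch's
# snapshot. Coefficient `alpha` is an assumed hyperparameter.
import copy
import torch

@torch.no_grad()
def interpolate_params(model, prev_state, alpha=0.5):
    for name, p in model.named_parameters():
        p.mul_(alpha).add_((1.0 - alpha) * prev_state[name])

model = torch.nn.Linear(10, 2)
prev_state = copy.deepcopy(model.state_dict())
# ... one epoch of (adversarial) training would update `model` here ...
interpolate_params(model, prev_state, alpha=0.7)
```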
arXiv Detail & Related papers (2023-03-24T12:22:34Z)
- Masking and Mixing Adversarial Training [9.690454593095495]
Adversarial training is a popular and straightforward technique to defend against the threat of adversarial examples.
CNNs typically must sacrifice accuracy on standard samples to improve robustness against adversarial examples.
We propose Masking and Mixing Adversarial Training (M2AT) to mitigate the trade-off between accuracy and robustness.
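The following is one plausible rendering of the masking-and-mixing idea: mixup between a clean batch and its adversarial counterpart, with a random binary mask applied to the perturbation. The hyperparameters are assumptions, not the paper's values.

```python
# Illustrative sketch of "masking and mixing": mixup a clean batch with its
# adversarial counterpart, keeping only a randomly masked subset of the
# perturbation. Names and hyperparameters are assumptions.
import torch

def mask_and_mix(x_clean, x_adv, lam=0.7, mask_prob=0.5):
    mask = (torch.rand_like(x_clean) < mask_prob).float()
    perturb = (x_adv - x_clean) * mask  # keep only part of the noise
    return lam * x_clean + (1 - lam) * (x_clean + perturb)

x = torch.randn(4, 3, 32, 32)
x_adv = x + 0.03 * torch.randn_like(x).sign()
mixed = mask_and_mix(x, x_adv)
```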
arXiv Detail & Related papers (2023-02-16T04:05:53Z)
- Distributed Adversarial Training to Robustify Deep Neural Networks at Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification.
To defend against such attacks, an effective approach, known as adversarial training (AT), has been shown to improve robustness against such perturbations.
We propose a large-batch adversarial training framework implemented over multiple machines.
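A skeleton of what per-worker attack generation with synchronized gradients could look like, using PyTorch DDP on CPU for illustration (launched with `torchrun`); this is a generic data-parallel sketch, not the paper's system.

```python
# Skeleton of data-parallel adversarial training: each worker crafts
# adversarial examples for its shard, and DDP averages gradients.
# Launch with `torchrun --nproc_per_node=N this_script.py` (illustrative).
import torch
import torch.distributed as dist
import torch.nn.functional as F
from torch.nn.parallel import DistributedDataParallel as DDP

def fgsm(model, x, y, eps=0.03):
    x = x.clone().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).detach()

def main():
    dist.init_process_group("gloo")
    model = DDP(torch.nn.Linear(784, 10))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))
    x_adv = fgsm(model, x, y)                    # per-worker attack generation
    opt.zero_grad()
    F.cross_entropy(model(x_adv), y).backward()  # grads all-reduced by DDP
    opt.step()

if __name__ == "__main__":
    main()
```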
arXiv Detail & Related papers (2022-06-13T15:39:43Z)
- Enhancing Adversarial Training with Feature Separability [52.39305978984573]
We introduce a new concept, the adversarial training graph (ATG), with which the proposed adversarial training with feature separability (ATFS) boosts intra-class feature similarity and increases inter-class feature variance.
Through comprehensive experiments, we demonstrate that the proposed ATFS framework significantly improves both clean and robust performance.
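One generic way to render such a separability objective is to pull features toward their class mean and push class means apart; the sketch below is an assumed formulation, not the exact ATFS loss.

```python
# Hedged sketch of a feature-separability objective: minimize distance to the
# class mean (intra-class similarity) and maximize gaps between class means
# (inter-class variance). Generic rendering, not the paper's exact loss.
import torch

def separability_loss(features, labels):
    loss_intra, means = 0.0, []
    for c in labels.unique():
        feats_c = features[labels == c]
        mu = feats_c.mean(dim=0)
        means.append(mu)
        loss_intra = loss_intra + ((feats_c - mu) ** 2).sum(dim=1).mean()
    means = torch.stack(means)          # (num_classes, dim)
    dists = torch.cdist(means, means)   # pairwise class-mean gaps
    loss_inter = -dists.sum() / max(len(means) * (len(means) - 1), 1)
    return loss_intra + loss_inter

feats = torch.randn(32, 128, requires_grad=True)
labels = torch.randint(0, 4, (32,))
separability_loss(feats, labels).backward()
```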
arXiv Detail & Related papers (2022-05-02T04:04:23Z)
- Guided Interpolation for Adversarial Training [73.91493448651306]
As training progresses, the training data becomes less and less attackable, undermining the robustness enhancement.
We propose the guided interpolation framework (GIF), which employs the previous epoch's meta information to guide the generation of the data's adversarial variants.
Compared with the vanilla mixup, the GIF can provide a higher ratio of attackable data, which is beneficial to the robustness enhancement.
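A toy sketch of guided interpolation follows: samples the model already resists (per the previous epoch's record) are blended toward still-attackable ones before attack generation. The names and the fixed `lam` are illustrative assumptions.

```python
# Toy sketch of guided interpolation: blend "unattackable" samples toward
# randomly chosen attackable partners to raise the ratio of attackable data.
import torch

def guided_interpolate(x, attackable_mask, lam=0.6):
    """Blend each unattackable sample with a random attackable one."""
    idx_att = attackable_mask.nonzero(as_tuple=True)[0]
    if idx_att.numel() == 0:
        return x
    x = x.clone()
    idx_unatt = (~attackable_mask).nonzero(as_tuple=True)[0]
    partners = idx_att[torch.randint(0, idx_att.numel(), (idx_unatt.numel(),))]
    x[idx_unatt] = lam * x[idx_unatt] + (1 - lam) * x[partners]
    return x

x = torch.randn(8, 3, 32, 32)
attackable = torch.tensor([1, 0, 1, 1, 0, 0, 1, 0], dtype=torch.bool)
x_guided = guided_interpolate(x, attackable)
```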
arXiv Detail & Related papers (2021-02-15T03:55:08Z)
- Better Robustness by More Coverage: Adversarial Training with Mixup Augmentation for Robust Fine-tuning [69.65361463168142]
Adversarial data augmentation (ADA) has been widely adopted; it attempts to cover more of the adversarial attack search space by adding adversarial examples during training.
We propose a simple and effective method, Adversarial Data Augmentation with Mixup (MixADA), to cover a much larger proportion of the attack search space.
In text classification experiments with BERT and RoBERTa, MixADA achieves significant robustness gains under two strong adversarial attacks and alleviates the performance degradation of ADA on the original data.
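A minimal sketch of the MixADA idea, assuming mixup is applied to sentence-level representations of clean and adversarial examples with a Beta-sampled coefficient (an assumption for illustration):

```python
# Minimal sketch of the MixADA idea: mixup over the sentence representations
# of clean and adversarial examples. The Beta prior and all names are
# assumptions for illustration, not the paper's released code.
import torch

def mixada_batch(h_clean, h_adv, y_clean, y_adv, alpha=0.4):
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    h_mix = lam * h_clean + (1 - lam) * h_adv
    return h_mix, lam, y_clean, y_adv  # train with a lam-weighted loss on both labels

h_c = torch.randn(16, 768)  # e.g. BERT [CLS] embeddings of clean texts
h_a = torch.randn(16, 768)  # embeddings of adversarial texts
y = torch.randint(0, 2, (16,))
h_mix, lam, *_ = mixada_batch(h_c, h_a, y, y)
# loss = lam * CE(f(h_mix), y_clean) + (1 - lam) * CE(f(h_mix), y_adv)
```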
arXiv Detail & Related papers (2020-12-31T16:28:07Z)
- Robust Pre-Training by Adversarial Contrastive Learning [120.33706897927391]
Recent work has shown that, when integrated with adversarial training, self-supervised pre-training can lead to state-of-the-art robustness.
We improve robustness-aware self-supervised pre-training by learning representations consistent under both data augmentations and adversarial perturbations.
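A generic sketch of that consistency objective: treat an augmented view and an adversarial view of the same input as a positive pair under an NT-Xent loss. This illustrates the principle, not the paper's exact implementation.

```python
# Sketch of robustness-aware contrastive pre-training: a standard
# augmentation and an adversarial perturbation of the same image form a
# positive pair under an NT-Xent loss. Generic illustration only.
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.5):
    z = F.normalize(torch.cat([z1, z2]), dim=1)
    sim = z @ z.t() / tau
    sim.fill_diagonal_(float("-inf"))  # exclude self-similarity
    n = z1.size(0)
    # each row's positive is the same sample's other view
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

z_aug = torch.randn(8, 64)  # encoder output for an augmented view
z_adv = torch.randn(8, 64)  # encoder output for an adversarial view
loss = nt_xent(z_aug, z_adv)
```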
arXiv Detail & Related papers (2020-10-26T04:44:43Z)
- Towards Rapid and Robust Adversarial Training with One-Step Attacks [0.0]
Adversarial training is the most successful method for increasing the robustness of neural networks against adversarial attacks.
We present two ideas that enable adversarial training with the computationally less expensive Fast Gradient Sign Method (FGSM).
We show that noise injection in conjunction with FGSM-based adversarial training achieves comparable results to adversarial training with PGD while being considerably faster.
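A hedged sketch of the noise-plus-FGSM recipe, with illustrative epsilon values:

```python
# Sketch of fast adversarial training: add random noise to the input before
# the single FGSM step, which helps avoid catastrophic overfitting.
# The epsilon values are illustrative.
import torch
import torch.nn.functional as F

def fgsm_with_noise(model, x, y, eps=8 / 255, noise=8 / 255):
    x_noisy = x + torch.empty_like(x).uniform_(-noise, noise)
    x_noisy = x_noisy.clamp(0, 1).requires_grad_(True)
    F.cross_entropy(model(x_noisy), y).backward()
    x_adv = x_noisy + eps * x_noisy.grad.sign()
    return x_adv.clamp(0, 1).detach()

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x, y = torch.rand(4, 3, 32, 32), torch.randint(0, 10, (4,))
x_adv = fgsm_with_noise(model, x, y)  # train on x_adv as usual
```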
arXiv Detail & Related papers (2020-02-24T07:28:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.