Related papers: Impact of Adversarial Training on Robustness and Generalizability of Language Models

URL: http://arxiv.org/abs/2211.05523v3
Date: Sun, 10 Dec 2023 08:57:08 GMT
Title: Impact of Adversarial Training on Robustness and Generalizability of Language Models
Authors: Enes Altinisik, Hassan Sajjad, Husrev Taha Sencar, Safa Messaoud, Sanjay Chawla
Abstract summary: This work provides an in-depth comparison of different approaches for adversarial training in language models. Our findings suggest that better robustness can be achieved by pre-training data augmentation or by training with input-space perturbation, whereas training with embedding-space perturbation significantly improves generalization. A linguistic correlation analysis of neurons of the learned models reveals that the improved generalization is due to 'more specialized' neurons.
Score: 33.790145748360686
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Adversarial training is widely acknowledged as the most effective defense against adversarial attacks. However, it is also well established that achieving both robustness and generalization in adversarially trained models involves a trade-off. The goal of this work is to provide an in-depth comparison of different approaches for adversarial training in language models. Specifically, we study the effect of pre-training data augmentation, as well as training-time input-space perturbations vs. embedding-space perturbations, on the robustness and generalization of transformer-based language models. Our findings suggest that better robustness can be achieved by pre-training data augmentation or by training with input-space perturbation. However, training with embedding-space perturbation significantly improves generalization. A linguistic correlation analysis of neurons of the learned models reveals that the improved generalization is due to 'more specialized' neurons. To the best of our knowledge, this is the first work to carry out a deep qualitative analysis of different methods of generating adversarial examples in adversarial training of language models.
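The abstract contrasts input-space perturbations (changing the text itself) with embedding-space perturbations (adding small continuous perturbations to token embeddings). The sketch below illustrates only the embedding-space variant, under our own assumptions: a toy mean-pooled logistic-regression classifier over a trainable embedding table, a single FGM-style step of L2 size `EPS` along the normalized loss gradient, and synthetic data. The names (`embedding_perturbation`, `EPS`, the "token 0" task) are hypothetical; this is not the authors' setup, which trains transformer-based language models.

```python
import numpy as np

VOCAB, DIM, EPS, LR = 20, 8, 0.5, 0.5
rng = np.random.default_rng(0)

def sigmoid(z):
    # Numerically stable logistic function: 1 / (1 + exp(-z)).
    return np.exp(-np.logaddexp(0.0, -z))

def forward(X, w, b):
    """Mean-pool the token embeddings X (T x d); return P(y = 1) and the pooled vector."""
    h = X.mean(axis=0)
    return sigmoid(h @ w + b), h

def grads(X, y, w, b):
    """Binary cross-entropy gradients with respect to X, w and b."""
    p, h = forward(X, w, b)
    dz = p - y                                    # dL/dlogit for sigmoid + BCE
    dX = np.tile(dz * w / len(X), (len(X), 1))    # mean pooling splits it evenly over tokens
    return dX, dz * h, dz

def embedding_perturbation(X, y, w, b, eps=EPS):
    """FGM-style step: move X by eps along the L2-normalized loss gradient."""
    dX, _, _ = grads(X, y, w, b)
    norm = np.linalg.norm(dX)
    return X if norm == 0 else X + eps * dX / norm

def make_example():
    """Hypothetical toy task: the label is 1 iff token 0 appears in the sequence."""
    seq = rng.integers(1, VOCAB, size=5)
    y = float(rng.random() < 0.5)
    if y:
        seq[rng.integers(0, 5)] = 0
    return seq, y

data = [make_example() for _ in range(200)]
E = rng.normal(scale=0.1, size=(VOCAB, DIM))      # trainable embedding table
w, b = np.zeros(DIM), 0.0

for epoch in range(30):
    for seq, y in data:
        X = E[seq]
        X_adv = embedding_perturbation(X, y, w, b)  # delta is treated as a constant
        dX_c, dw_c, db_c = grads(X, y, w, b)
        dX_a, dw_a, db_a = grads(X_adv, y, w, b)
        w -= LR * (dw_c + dw_a)                     # clean loss + adversarial loss
        b -= LR * (db_c + db_a)
        np.add.at(E, seq, -LR * (dX_c + dX_a))      # gradient reaches E through X + delta

def accuracy(eps):
    """Accuracy on the toy data, optionally under the same embedding-space attack."""
    hits = 0
    for seq, y in data:
        X = E[seq]
        if eps > 0:
            X = embedding_perturbation(X, y, w, b, eps)
        p, _ = forward(X, w, b)
        hits += int((p > 0.5) == (y == 1.0))
    return hits / len(data)

print("clean:", accuracy(0.0), "perturbed:", accuracy(EPS))
```

Comparing the two printed accuracies gives a rough sense of how much the toy model degrades under the attack it was trained against. An input-space variant would instead substitute tokens in `seq` before the embedding lookup, so the perturbation stays on the discrete text.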

Related papers

Combining Adversaries with Anti-adversaries in Training [9.43429549718968]
Adversarial training is an effective technique to improve the robustness of deep neural networks. We study the influence of adversarial training on deep learning models in terms of fairness, robustness, and generalization.
arXiv (2023-04-25T03:34:35Z)
Self-Ensemble Adversarial Training for Improved Robustness [14.244311026737666]
Among all sorts of defense methods, adversarial training is the strongest strategy against various adversarial attacks. Recent works mainly focus on developing new loss functions or regularizers, attempting to find the unique optimal point in the weight space. We devise a simple but powerful Self-Ensemble Adversarial Training (SEAT) method that yields a robust classifier by averaging the weights of history models.
arXiv (2022-03-18T01:12:18Z)
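SEAT is summarized as averaging the weights of history models. A minimal sketch of that general idea, assuming an exponential moving average over parameter snapshots with a fixed decay (the SEAT paper's actual averaging rule and schedule may differ), with a synthetic noisy update standing in for real adversarial training steps; `update_weight_average` and the toy trajectory are hypothetical:

```python
import numpy as np

def update_weight_average(avg, params, decay=0.99):
    """Blend the current parameters into an exponential moving average (EMA)."""
    return {name: decay * avg[name] + (1.0 - decay) * value
            for name, value in params.items()}

# Hypothetical trajectory: a noisy optimizer that keeps pulling parameters toward 1.0.
rng = np.random.default_rng(0)
params = {"w": np.zeros(4), "b": np.zeros(1)}
avg = {name: value.copy() for name, value in params.items()}
for step in range(1000):
    params = {name: value + 0.1 * (1.0 - value) + 0.05 * rng.normal(size=value.shape)
              for name, value in params.items()}
    avg = update_weight_average(avg, params)

print("last iterate:", params["w"].round(3))
print("averaged:    ", avg["w"].round(3))
```

At evaluation time one would predict with `avg` rather than `params`; the averaging smooths out the step-to-step noise of the optimizer instead of committing to a single point in weight space.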
How Should Pre-Trained Language Models Be Fine-Tuned Towards Adversarial Robustness? [121.57551065856164]
We propose Robust Informative Fine-Tuning (RIFT) as a novel adversarial fine-tuning method from an information-theoretical perspective. RIFT encourages an objective model to retain the features learned from the pre-trained model throughout the entire fine-tuning process. Experimental results show that RIFT consistently outperforms state-of-the-art methods on two popular NLP tasks.
arXiv (2021-12-22T05:04:41Z)
On the Impact of Hard Adversarial Instances on Overfitting in Adversarial Training [72.95029777394186]
Adversarial training is a popular method to robustify models against adversarial attacks. We investigate this phenomenon from the perspective of training instances. We show that the decay in generalization performance of adversarial training is a result of the model's attempt to fit hard adversarial instances.
arXiv (2021-12-14T12:19:24Z)
Evaluating Deception Detection Model Robustness To Linguistic Variation [10.131671217810581]
We propose an analysis of model robustness against linguistic variation in the setting of deceptive news detection. We consider two prediction tasks and compare three state-of-the-art embeddings to highlight consistent trends in model performance. We find that character or mixed ensemble models are the most effective defenses and that character perturbation-based attack tactics are more successful.
arXiv (2021-04-23T17:25:38Z)
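The entry above reports that character perturbation-based attack tactics were the more successful ones. As a hedged illustration of what a character-level perturbation can look like, the sketch below swaps two adjacent interior characters in randomly chosen words. The swap rule, the `rate` parameter and both function names are our own assumptions, not the attacks used in that paper.

```python
import random

def swap_adjacent_chars(word, rng):
    """Swap two neighboring interior characters; the first and last letters stay put."""
    if len(word) < 4:
        return word  # too short to have two swappable interior characters
    i = rng.randrange(1, len(word) - 2)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def perturb_sentence(sentence, rate=0.3, seed=0):
    """Apply swap_adjacent_chars to each word independently with probability rate."""
    rng = random.Random(seed)
    return " ".join(
        swap_adjacent_chars(word, rng) if rng.random() < rate else word
        for word in sentence.split()
    )

original = "officials confirmed the report was fabricated by anonymous sources"
print(perturb_sentence(original, rate=0.5))
```

Such edits leave the text readable to a human while changing the character sequence (and often the subword tokenization) a model sees, which is what makes them useful as attacks.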
Stylized Adversarial Defense [105.88250594033053]
Adversarial training creates perturbation patterns and includes them in the training set to robustify the model. We propose to exploit additional information from the feature space to craft stronger adversaries. Our adversarial training approach demonstrates strong robustness compared to state-of-the-art defenses.
arXiv (2020-07-29T08:38:10Z)
Adversarial Training for Large Neural Language Models [107.84290922621163]
We show that adversarial pre-training can improve both generalization and robustness. ALUM regularizes the training objective by applying perturbations in the embedding space that maximize the adversarial loss. ALUM can be further combined with task-specific fine-tuning to attain additional gains.
arXiv (2020-04-20T00:07:18Z)
Adversarial Augmentation Policy Search for Domain and Cross-Lingual Generalization in Reading Comprehension [96.62963688510035]
Reading comprehension models often overfit to nuances of training datasets and fail at adversarial evaluation. We present several effective adversaries and automated data augmentation policy search methods with the goal of making reading comprehension models more robust to adversarial evaluation.
arXiv (2020-04-13T17:20:08Z)
Precise Tradeoffs in Adversarial Training for Linear Regression [55.764306209771405]
We provide a precise and comprehensive understanding of the role of adversarial training in the context of linear regression with Gaussian features. We precisely characterize the standard/robust accuracy and the corresponding tradeoff achieved by a contemporary mini-max adversarial training approach. Our theory for adversarial training algorithms also facilitates the rigorous study of how a variety of factors (size and quality of training data, model overparametrization etc.) affect the tradeoff between these two competing accuracies.
arXiv (2020-02-24T19:01:47Z)
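For linear regression the inner maximization of mini-max adversarial training has a closed form, which is part of what makes the standard/robust trade-off tractable to analyze. Assuming squared loss and feature perturbations bounded by eps in L2 norm (our assumption; the entry above does not state the threat model), Cauchy-Schwarz gives max over ||delta||_2 <= eps of (y - w^T(x + delta))^2 = (|y - w^T x| + eps * ||w||_2)^2. The sketch below computes this bound and the perturbation that attains it, then checks it against random perturbations; the function names are hypothetical.

```python
import numpy as np

def robust_sq_loss(w, x, y, eps):
    """Worst-case squared error when x may move anywhere in an L2 ball of radius eps."""
    r = y - w @ x
    return (abs(r) + eps * np.linalg.norm(w)) ** 2

def worst_case_delta(w, x, y, eps):
    """Perturbation attaining that worst case: push the residual further from zero."""
    r = y - w @ x
    norm = np.linalg.norm(w)
    if norm == 0:
        return np.zeros_like(x)         # the loss does not depend on x at all
    direction = 1.0 if r >= 0 else -1.0
    return -eps * direction * w / norm  # residual becomes r + direction * eps * ||w||

rng = np.random.default_rng(0)
w, x, y, eps = rng.normal(size=3), rng.normal(size=3), 0.7, 0.1

bound = robust_sq_loss(w, x, y, eps)
attained = (y - w @ (x + worst_case_delta(w, x, y, eps))) ** 2

# Random perturbations on the sphere of radius eps should never beat the closed form.
dirs = rng.normal(size=(10_000, 3))
sphere = eps * dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
sampled = ((y - (x + sphere) @ w) ** 2).max()
print(bound, attained, sampled)
```

Adversarial training then chooses w to minimize the average of `robust_sq_loss` over the training set. Expanding the square gives r^2 + 2 * eps * |r| * ||w|| + eps^2 * ||w||^2, so the robust objective behaves partly like a norm penalty on w, one intuition for why robustness can cost standard accuracy.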

This list is automatically generated from the titles and abstracts of the papers on this site.