Improving the Robustness of Transformer-based Large Language Models with
Dynamic Attention
- URL: http://arxiv.org/abs/2311.17400v2
- Date: Thu, 30 Nov 2023 02:08:24 GMT
- Title: Improving the Robustness of Transformer-based Large Language Models with
Dynamic Attention
- Authors: Lujia Shen, Yuwen Pu, Shouling Ji, Changjiang Li, Xuhong Zhang,
Chunpeng Ge and Ting Wang
- Abstract summary: Transformer-based models, such as BERT and GPT, have been widely adopted in natural language processing (NLP)
Recent studies show their vulnerability to textual adversarial attacks where the model's output can be misled by intentionally manipulating the text inputs.
We propose a novel method called dynamic attention, tailored for the transformer architecture, to enhance the inherent robustness of the model itself against various adversarial attacks.
- Score: 43.95101492654236
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer-based models, such as BERT and GPT, have been widely adopted in
natural language processing (NLP) due to their exceptional performance.
However, recent studies show their vulnerability to textual adversarial attacks
where the model's output can be misled by intentionally manipulating the text
inputs. Although various methods have been proposed to enhance the model's
robustness and mitigate this vulnerability, many of them consume significant
resources (e.g., adversarial training) or provide only limited protection
(e.g., defensive dropout). In this paper, we propose a novel method called
dynamic attention, tailored for the transformer architecture, to enhance the
inherent robustness of the model itself against various adversarial attacks.
Our method requires no downstream task knowledge and does not incur additional
costs. The proposed dynamic attention consists of two modules: (i) attention
rectification, which masks or weakens the attention values of the chosen tokens,
and (ii) dynamic modeling, which dynamically builds the set of candidate
tokens. Extensive experiments demonstrate that dynamic attention significantly
mitigates the impact of adversarial attacks, achieving up to 33% better
performance than previous methods against widely used adversarial attacks. The
model-level design of dynamic attention enables it to be easily combined with
other defense methods (e.g., adversarial training) to further enhance the
model's robustness. Furthermore, we demonstrate that dynamic attention
preserves the state-of-the-art robustness space of the original model compared
to other dynamic modeling methods.
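The two-module design described in the abstract can be made concrete with a short sketch. The following PyTorch snippet is a minimal, hypothetical illustration rather than the authors' implementation: it assumes the candidate token set is re-sampled at random on every forward pass (dynamic modeling) and that the attention paid to those tokens is zeroed or scaled down and then renormalized (attention rectification); the function name and the mask_ratio and weaken parameters are illustrative assumptions, not values from the paper.

    # Hypothetical sketch of dynamic attention (not the authors' code).
    import torch

    def dynamic_attention(q, k, v, mask_ratio=0.1, weaken=0.0):
        # q, k, v: (batch, heads, seq_len, head_dim)
        d = q.size(-1)
        scores = torch.matmul(q, k.transpose(-2, -1)) / d ** 0.5  # (b, h, s, s)

        # Dynamic modeling: re-sample the candidate token set on every forward pass.
        seq_len = scores.size(-1)
        n_candidates = max(1, int(mask_ratio * seq_len))
        candidates = torch.randperm(seq_len, device=scores.device)[:n_candidates]

        # Attention rectification: mask (weaken=0.0) or weaken (0 < weaken < 1)
        # the attention paid to the candidate tokens, then renormalize.
        rect = torch.ones(seq_len, device=scores.device)
        rect[candidates] = weaken
        weights = torch.softmax(scores, dim=-1) * rect
        weights = weights / weights.sum(dim=-1, keepdim=True).clamp_min(1e-9)

        return torch.matmul(weights, v)

    # Toy usage with standard multi-head attention shapes.
    q = k = v = torch.randn(2, 8, 16, 64)
    out = dynamic_attention(q, k, v, mask_ratio=0.1, weaken=0.0)
    print(out.shape)  # torch.Size([2, 8, 16, 64])

Because the masked candidate set changes on every pass, no fixed attention pattern is exposed to an attacker; this is one plausible reading of why the approach needs no downstream task knowledge and adds no extra training cost.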
Related papers
- QuantAttack: Exploiting Dynamic Quantization to Attack Vision
Transformers [29.957089564635083]
We present QuantAttack, a novel attack that targets the availability of quantized models.
We show that carefully crafted adversarial examples, which are designed to exhaust the resources of the operating system, can trigger worst-case performance.
arXiv Detail & Related papers (2023-12-03T18:31:19Z)
- Evaluating Concurrent Robustness of Language Models Across Diverse Challenge Sets [46.19529338280716]
Language models, characterized by their black-box nature, often hallucinate and display sensitivity to input perturbations.
We introduce a methodology designed to examine how input perturbations affect language models across various scales.
We present three distinct fine-tuning strategies to address robustness against multiple perturbations.
arXiv Detail & Related papers (2023-11-15T02:59:10Z)
- Introducing Foundation Models as Surrogate Models: Advancing Towards More Practical Adversarial Attacks [15.882687207499373]
No-box adversarial attacks are becoming more practical and challenging for AI systems.
This paper recasts adversarial attacks as a downstream task by introducing foundation models as surrogate models.
arXiv Detail & Related papers (2023-07-13T08:10:48Z)
- DST: Dynamic Substitute Training for Data-free Black-box Attack [79.61601742693713]
We propose a novel dynamic substitute training attack method to encourage the substitute model to learn better and faster from the target model.
We introduce a task-driven graph-based structure information learning constraint to improve the quality of the generated training data.
arXiv Detail & Related papers (2022-04-03T02:29:11Z)
- Improving robustness of jet tagging algorithms with adversarial training [56.79800815519762]
We investigate the vulnerability of flavor tagging algorithms via application of adversarial attacks.
We present an adversarial training strategy that mitigates the impact of such simulated attacks.
arXiv Detail & Related papers (2022-03-25T19:57:19Z)
- Clustering Effect of (Linearized) Adversarial Robust Models [60.25668525218051]
We propose a novel understanding of adversarial robustness and apply it to more tasks, including domain adaptation and robustness boosting.
Experimental evaluations demonstrate the rationality and superiority of our proposed clustering strategy.
arXiv Detail & Related papers (2021-11-25T05:51:03Z)
- Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose the adaptive feature alignment (AFA) to generate features of arbitrary attacking strengths.
Our method is trained to automatically align features of arbitrary attacking strength.
arXiv Detail & Related papers (2021-05-31T17:01:05Z)
- Evaluating Deception Detection Model Robustness To Linguistic Variation [10.131671217810581]
We propose an analysis of model robustness against linguistic variation in the setting of deceptive news detection.
We consider two prediction tasks and compare three state-of-the-art embeddings to highlight consistent trends in model performance.
We find that character or mixed ensemble models are the most effective defenses and that character perturbation-based attack tactics are more successful.
arXiv Detail & Related papers (2021-04-23T17:25:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.