Improving the Robustness of Transformer-based Large Language Models with
Dynamic Attention
- URL: http://arxiv.org/abs/2311.17400v2
- Date: Thu, 30 Nov 2023 02:08:24 GMT
- Title: Improving the Robustness of Transformer-based Large Language Models with
Dynamic Attention
- Authors: Lujia Shen, Yuwen Pu, Shouling Ji, Changjiang Li, Xuhong Zhang,
Chunpeng Ge and Ting Wang
- Abstract summary: Transformer-based models, such as BERT and GPT, have been widely adopted in natural language processing (NLP)
Recent studies show their vulnerability to textual adversarial attacks where the model's output can be misled by intentionally manipulating the text inputs.
We propose a novel method called dynamic attention, tailored for the transformer architecture, to enhance the inherent robustness of the model itself against various adversarial attacks.
- Score: 43.95101492654236
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer-based models, such as BERT and GPT, have been widely adopted in
natural language processing (NLP) due to their exceptional performance.
However, recent studies show their vulnerability to textual adversarial attacks
where the model's output can be misled by intentionally manipulating the text
inputs. Although various methods have been proposed to enhance the model's
robustness and mitigate this vulnerability, many of them consume significant
resources (e.g., adversarial training) or provide only limited protection
(e.g., defensive dropout). In this paper, we propose a novel method called
dynamic attention, tailored for the transformer architecture, to enhance the
inherent robustness of the model itself against various adversarial attacks.
Our method requires no downstream task knowledge and does not incur additional
costs. The proposed dynamic attention consists of two modules: (i) attention
rectification, which masks or weakens the attention values of the chosen tokens,
and (ii) dynamic modeling, which dynamically builds the set of candidate
tokens. Extensive experiments demonstrate that dynamic attention significantly
mitigates the impact of adversarial attacks, achieving up to 33% better
performance than previous methods against widely used adversarial attacks. The
model-level design of dynamic attention enables it to be easily combined with
other defense methods (e.g., adversarial training) to further enhance the
model's robustness. Furthermore, we demonstrate that dynamic attention
preserves the state-of-the-art robustness space of the original model compared
to other dynamic modeling methods.
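The two-module design described in the abstract can be made concrete with a short sketch. The following PyTorch snippet is a minimal, hypothetical illustration rather than the authors' implementation: it assumes the candidate token set is re-sampled at random on every forward pass (dynamic modeling) and that the attention paid to those tokens is zeroed or scaled down and then renormalized (attention rectification); the function name and the mask_ratio and weaken parameters are illustrative assumptions, not values from the paper.

    # Hypothetical sketch of dynamic attention (not the authors' code).
    import torch

    def dynamic_attention(q, k, v, mask_ratio=0.1, weaken=0.0):
        # q, k, v: (batch, heads, seq_len, head_dim)
        d = q.size(-1)
        scores = torch.matmul(q, k.transpose(-2, -1)) / d ** 0.5  # (b, h, s, s)

        # Dynamic modeling: re-sample the candidate token set on every forward pass.
        seq_len = scores.size(-1)
        n_candidates = max(1, int(mask_ratio * seq_len))
        candidates = torch.randperm(seq_len, device=scores.device)[:n_candidates]

        # Attention rectification: mask (weaken=0.0) or weaken (0 < weaken < 1)
        # the attention paid to the candidate tokens, then renormalize.
        rect = torch.ones(seq_len, device=scores.device)
        rect[candidates] = weaken
        weights = torch.softmax(scores, dim=-1) * rect
        weights = weights / weights.sum(dim=-1, keepdim=True).clamp_min(1e-9)

        return torch.matmul(weights, v)

    # Toy usage with standard multi-head attention shapes.
    q = k = v = torch.randn(2, 8, 16, 64)
    out = dynamic_attention(q, k, v, mask_ratio=0.1, weaken=0.0)
    print(out.shape)  # torch.Size([2, 8, 16, 64])

Because the masked candidate set changes on every pass, no fixed attention pattern is exposed to an attacker; this is one plausible reading of why the approach needs no downstream task knowledge and adds no extra training cost.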
Related papers
- QuantAttack: Exploiting Dynamic Quantization to Attack Vision
Transformers [29.957089564635083]
We present QuantAttack, a novel attack that targets the availability of quantized models.
We show that carefully crafted adversarial examples, which are designed to exhaust the resources of the operating system, can trigger worst-case performance.
arXiv Detail & Related papers (2023-12-03T18:31:19Z)
- Evaluating Concurrent Robustness of Language Models Across Diverse Challenge Sets [46.19529338280716]
Language models, characterized by their black-box nature, often hallucinate and display sensitivity to input perturbations.
We introduce a methodology designed to examine how input perturbations affect language models across various scales.
We present three distinct fine-tuning strategies to address robustness against multiple perturbations.
arXiv Detail & Related papers (2023-11-15T02:59:10Z)
- Introducing Foundation Models as Surrogate Models: Advancing Towards More Practical Adversarial Attacks [15.882687207499373]
No-box adversarial attacks are becoming more practical and challenging for AI systems.
This paper recasts adversarial attacks as a downstream task by introducing foundation models as surrogate models.
arXiv Detail & Related papers (2023-07-13T08:10:48Z)
- DST: Dynamic Substitute Training for Data-free Black-box Attack [79.61601742693713]
We propose a novel dynamic substitute training attack method to encourage the substitute model to learn better and faster from the target model.
We introduce a task-driven graph-based structure information learning constraint to improve the quality of the generated training data.
arXiv Detail & Related papers (2022-04-03T02:29:11Z)
- Improving robustness of jet tagging algorithms with adversarial training [56.79800815519762]
We investigate the vulnerability of flavor tagging algorithms via application of adversarial attacks.
We present an adversarial training strategy that mitigates the impact of such simulated attacks.
arXiv Detail & Related papers (2022-03-25T19:57:19Z)
- Clustering Effect of (Linearized) Adversarial Robust Models [60.25668525218051]
We propose a novel understanding of adversarial robustness and apply it to more tasks, including domain adaptation and robustness boosting.
Experimental evaluations demonstrate the rationality and superiority of our proposed clustering strategy.
arXiv Detail & Related papers (2021-11-25T05:51:03Z)
- Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose the adaptive feature alignment (AFA) to generate features of arbitrary attacking strengths.
Our method is trained to automatically align features of arbitrary attacking strength.
arXiv Detail & Related papers (2021-05-31T17:01:05Z)
- Evaluating Deception Detection Model Robustness To Linguistic Variation [10.131671217810581]
We propose an analysis of model robustness against linguistic variation in the setting of deceptive news detection.
We consider two prediction tasks and compare three state-of-the-art embeddings to highlight consistent trends in model performance.
We find that character or mixed ensemble models are the most effective defenses and that character perturbation-based attack tactics are more successful.
arXiv Detail & Related papers (2021-04-23T17:25:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.