Modeling Adversarial Attack on Pre-trained Language Models as Sequential Decision Making
- URL: http://arxiv.org/abs/2305.17440v1
- Date: Sat, 27 May 2023 10:33:53 GMT
- Title: Modeling Adversarial Attack on Pre-trained Language Models as Sequential Decision Making
- Authors: Xuanjie Fang, Sijie Cheng, Yang Liu, Wei Wang
- Abstract summary: Research on adversarial attacks has shown that pre-trained language models (PLMs) are vulnerable to small perturbations.
In this paper, we model the adversarial attack task on PLMs as a sequential decision-making problem.
We propose to use reinforcement learning to find an appropriate sequential attack path to generate adversaries, named SDM-Attack.
- Score: 10.425483543802846
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained language models (PLMs) have been widely used to underpin various
downstream tasks. However, research on adversarial attacks has shown that PLMs are
vulnerable to small perturbations. Mainstream methods adopt a detached two-stage
framework to attack without considering the subsequent influence of each substitution
step. In this paper, we formally model the adversarial attack task on PLMs as a
sequential decision-making problem, where the whole attack process is sequential and
involves two decision-making problems, i.e., word finding and word substitution.
Because the attack process receives only the final state, without any direct
intermediate signals, we propose to use reinforcement learning to find an appropriate
sequential attack path for generating adversarial examples; we name this method
SDM-Attack. Extensive experimental results show that SDM-Attack achieves the highest
attack success rate against fine-tuned BERT with a comparable modification rate and
semantic similarity.
Furthermore, our analyses demonstrate the generalization and transferability of
SDM-Attack. The code is available at https://github.com/fduxuan/SDM-Attack.
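To illustrate the sequential decision-making formulation described in the abstract, below is a minimal, self-contained Python sketch (not the authors' SDM-Attack implementation; see the repository above for that). It uses a toy stand-in classifier and a random heuristic in place of the learned RL policy: at each step the agent picks a word position (word finding) and a replacement (word substitution), and the only reward arrives at the end of the episode, when the predicted label flips. All names (toy_victim, random_policy, SYNONYMS) are hypothetical.

```python
# Minimal sketch of a word-level attack as sequential decision making.
# Illustrative only; not the authors' SDM-Attack implementation.
import random

# Hypothetical victim: returns label 1 iff the word "good" is present,
# standing in for a fine-tuned PLM classifier.
def toy_victim(tokens):
    return 1 if "good" in tokens else 0

# Hypothetical substitution candidates (in practice, e.g., MLM proposals).
SYNONYMS = {"good": ["decent", "fine"], "movie": ["film"], "really": ["truly"]}

def random_policy(state):
    """One decision step: pick a position (word finding) and a replacement
    (word substitution). A learned RL policy would replace this heuristic."""
    positions = [i for i, w in enumerate(state) if w in SYNONYMS]
    if not positions:
        return None
    pos = random.choice(positions)
    return pos, random.choice(SYNONYMS[state[pos]])

def attack_episode(sentence, max_steps=3):
    """Apply substitutions sequentially; the only signal is the final state
    (label flipped or not), i.e., a sparse terminal reward."""
    state = sentence.split()
    original_label = toy_victim(state)
    path = []
    for _ in range(max_steps):
        action = random_policy(state)
        if action is None:
            break
        pos, substitute = action
        state[pos] = substitute
        path.append((pos, substitute))
        if toy_victim(state) != original_label:
            return " ".join(state), path, 1.0  # success: reward 1
    return " ".join(state), path, 0.0          # failure: reward 0

if __name__ == "__main__":
    adversary, attack_path, reward = attack_episode("a really good movie")
    print(adversary, attack_path, reward)
```

In the paper's setting, the random heuristic would be replaced by a policy trained with reinforcement learning on this sparse final reward, so the agent learns which attack paths flip the victim's prediction with few modifications.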
Related papers
- Learning to Learn Transferable Generative Attack for Person Re-Identification [17.26567195924685]
Existing attacks merely consider cross-dataset and cross-model transferability, ignoring the cross-test capability to perturb models trained in different domains.
To powerfully examine the robustness of real-world re-id models, the Meta Transferable Generative Attack (MTGA) method is proposed.
Our MTGA outperforms the SOTA methods by 21.5% and 11.3% in mean mAP drop rate in two transfer-attack settings.
arXiv Detail & Related papers (2024-09-06T11:57:17Z)
- Meta Invariance Defense Towards Generalizable Robustness to Unknown Adversarial Attacks [62.036798488144306]
Current defenses mainly focus on known attacks, while adversarial robustness to unknown attacks is seriously overlooked.
We propose an attack-agnostic defense method named Meta Invariance Defense (MID).
We show that MID simultaneously achieves robustness to the imperceptible adversarial perturbations in high-level image classification and attack-suppression in low-level robust image regeneration.
arXiv Detail & Related papers (2024-04-04T10:10:38Z)
- Multi-granular Adversarial Attacks against Black-box Neural Ranking Models [111.58315434849047]
We create high-quality adversarial examples by incorporating multi-granular perturbations.
We transform the multi-granular attack into a sequential decision-making process.
Our attack method surpasses prevailing baselines in both attack effectiveness and imperceptibility.
arXiv Detail & Related papers (2024-04-02T02:08:29Z)
- DTA: Distribution Transform-based Attack for Query-Limited Scenario [11.874670564015789]
Conventional black-box attack methods rely on sufficient feedback from the attacked model when generating adversarial examples.
This paper proposes a hard-label attack for the query-limited scenario, where the attacker is permitted only a limited number of queries.
Experiments validate the effectiveness of the proposed idea and the superiority of DTA over the state-of-the-art.
arXiv Detail & Related papers (2023-12-12T13:21:03Z)
- DALA: A Distribution-Aware LoRA-Based Adversarial Attack against Language Models [64.79319733514266]
Adversarial attacks can introduce subtle perturbations to input data.
Recent attack methods can achieve a relatively high attack success rate (ASR).
We propose a Distribution-Aware LoRA-based Adversarial Attack (DALA) method.
arXiv Detail & Related papers (2023-11-14T23:43:47Z)
- Transferable Attack for Semantic Segmentation [59.17710830038692]
We analyze the robustness of semantic segmentation models to adversarial attacks and observe that adversarial examples generated from a source model fail to attack the target models.
We propose an ensemble attack for semantic segmentation to achieve more effective attacks with higher transferability.
arXiv Detail & Related papers (2023-07-31T11:05:55Z)
- UOR: Universal Backdoor Attacks on Pre-trained Language Models [9.968755838867178]
Most existing backdoor attacks against pre-trained language models (PLMs) are un-targeted and task-specific.
We first summarize the requirements that a more threatening backdoor attack against PLMs should satisfy, and then propose a new backdoor attack method called UOR.
Specifically, we define poisoned supervised contrastive learning, which automatically learns more uniform and universal output representations of triggers across various PLMs.
arXiv Detail & Related papers (2023-05-16T16:11:48Z)
- Improving Adversarial Robustness to Sensitivity and Invariance Attacks with Deep Metric Learning [80.21709045433096]
A standard approach to adversarial robustness assumes a framework that defends against samples crafted by minimally perturbing a clean input.
We use metric learning to frame adversarial regularization as an optimal transport problem.
Our preliminary results indicate that regularizing over invariant perturbations in our framework improves defense against both invariance and sensitivity attacks.
arXiv Detail & Related papers (2022-11-04T13:54:02Z)
- Versatile Weight Attack via Flipping Limited Bits [68.45224286690932]
We study a novel attack paradigm, which modifies model parameters in the deployment stage.
Considering the effectiveness and stealthiness goals, we provide a general formulation to perform the bit-flip based weight attack.
We present two cases of the general formulation with different malicious purposes, i.e., single sample attack (SSA) and triggered samples attack (TSA).
arXiv Detail & Related papers (2022-07-25T03:24:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.