Modeling Adversarial Attack on Pre-trained Language Models as Sequential
Decision Making
- URL: http://arxiv.org/abs/2305.17440v1
- Date: Sat, 27 May 2023 10:33:53 GMT
- Title: Modeling Adversarial Attack on Pre-trained Language Models as Sequential
Decision Making
- Authors: Xuanjie Fang, Sijie Cheng, Yang Liu, Wei Wang
- Abstract summary: Research on adversarial attacks has found that pre-trained language models (PLMs) are vulnerable to small perturbations.
In this paper, we model the adversarial attack task on PLMs as a sequential decision-making problem.
We propose to use reinforcement learning to find an appropriate sequential attack path to generate adversaries, named SDM-Attack.
- Score: 10.425483543802846
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained language models (PLMs) have been widely used to underpin various
downstream tasks. However, research on adversarial attacks has found that PLMs are
vulnerable to small perturbations. Mainstream methods adopt a detached
two-stage framework to attack without considering the subsequent influence of
substitution at each step. In this paper, we formally model the adversarial
attack task on PLMs as a sequential decision-making problem, where the whole
attack process comprises two sequential decision-making problems, i.e., word
finder and word substitution. Considering that the attack process can only receive
the final state without any direct intermediate signals, we propose to use
reinforcement learning to find an appropriate sequential attack path to
generate adversaries, named SDM-Attack. Extensive experimental results show
that SDM-Attack achieves the highest attack success rate with a comparable
modification rate and semantic similarity when attacking fine-tuned BERT.
Furthermore, our analyses demonstrate the generalization and transferability of
SDM-Attack. The code is available at https://github.com/fduxuan/SDM-Attack.
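
To make the sequential-decision-making framing concrete, below is a minimal, hypothetical Python sketch (not the released SDM-Attack implementation): at each step one decision picks which word to perturb (word finder) and a second decision picks its substitute (word substitution), and the only terminal signal is whether the victim model's label flips. The names `victim_predict`, `candidates`, `choose_position`, and `choose_substitute` are placeholders standing in for the victim classifier, the candidate substitution set, and the two policy decisions.

```python
# Hypothetical sketch (not the authors' code): a word-level attack framed as
# sequential decision making. Each step makes two decisions (which word, which
# substitute); only the final state reveals whether the attack succeeded, which
# is why SDM-Attack trains the decision policy with reinforcement learning
# rather than a detached two-stage pipeline.
import random
from typing import Callable, List, Sequence

def sequential_attack(
    tokens: List[str],
    victim_predict: Callable[[List[str]], int],             # victim classifier: tokens -> label
    candidates: Callable[[List[str], int], Sequence[str]],   # substitution candidates per position
    choose_position: Callable[[List[str], set], int],        # decision 1: word finder
    choose_substitute: Callable[[List[str], int, Sequence[str]], str],  # decision 2: word substitution
    max_steps: int = 10,
):
    """Greedy rollout of the sequential attack; returns (adv_tokens, success)."""
    original_label = victim_predict(tokens)
    adv = list(tokens)
    visited = set()
    for _ in range(min(max_steps, len(adv))):
        pos = choose_position(adv, visited)           # pick an unvisited position to perturb
        visited.add(pos)
        subs = candidates(adv, pos)
        if not subs:
            continue
        adv[pos] = choose_substitute(adv, pos, subs)  # pick a replacement for that position
        if victim_predict(adv) != original_label:     # terminal signal: the label flipped
            return adv, True
    return adv, False

# Toy usage with random decision stubs (a learned policy would replace these):
if __name__ == "__main__":
    toy_synonyms = {"good": ["decent", "fine"], "movie": ["film", "flick"]}
    predict = lambda toks: int("good" in toks)                 # trivial stand-in "classifier"
    cands = lambda toks, i: toy_synonyms.get(toks[i], [])
    pick_pos = lambda toks, seen: random.choice([i for i in range(len(toks)) if i not in seen])
    pick_sub = lambda toks, i, subs: random.choice(list(subs))
    print(sequential_attack("a good movie".split(), predict, cands, pick_pos, pick_sub))
```

In the full method, the two decisions would come from a policy trained with reinforcement learning on the final attack outcome rather than from the random stubs used above.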
Related papers
- Attack-in-the-Chain: Bootstrapping Large Language Models for Attacks Against Black-box Neural Ranking Models [111.58315434849047]
We introduce a novel ranking attack framework named Attack-in-the-Chain.
It tracks interactions between large language models (LLMs) and neural ranking models (NRMs) based on chain-of-thought.
Empirical results on two web search benchmarks show the effectiveness of our method.
arXiv Detail & Related papers (2024-12-25T04:03:09Z) - Defensive Dual Masking for Robust Adversarial Defense [5.932787778915417]
This paper introduces the Defensive Dual Masking (DDM) algorithm, a novel approach designed to enhance model robustness against such attacks.
DDM utilizes a unique adversarial training strategy where [MASK] tokens are strategically inserted into training samples to prepare the model to handle adversarial perturbations more effectively.
During inference, potentially adversarial tokens are dynamically replaced with [MASK] tokens to neutralize potential threats while preserving the core semantics of the input (see the sketch after this list).
arXiv Detail & Related papers (2024-12-10T00:41:25Z) - Learning to Learn Transferable Generative Attack for Person Re-Identification [17.26567195924685]
Existing attacks merely consider cross-dataset and cross-model transferability, ignoring the cross-test capability to perturb models trained in different domains.
To powerfully examine the robustness of real-world re-id models, the Meta Transferable Generative Attack (MTGA) method is proposed.
Our MTGA outperforms the SOTA methods by 21.5% and 11.3% in mean mAP drop rate under two transfer settings, respectively.
arXiv Detail & Related papers (2024-09-06T11:57:17Z) - Meta Invariance Defense Towards Generalizable Robustness to Unknown Adversarial Attacks [62.036798488144306]
Current defenses mainly focus on known attacks, while adversarial robustness to unknown attacks is seriously overlooked.
We propose an attack-agnostic defense method named Meta Invariance Defense (MID).
We show that MID simultaneously achieves robustness to the imperceptible adversarial perturbations in high-level image classification and attack-suppression in low-level robust image regeneration.
arXiv Detail & Related papers (2024-04-04T10:10:38Z) - DTA: Distribution Transform-based Attack for Query-Limited Scenario [11.874670564015789]
In generating adversarial examples, the conventional black-box attack methods rely on sufficient feedback from the to-be-attacked models.
This paper proposes a hard-label attack for the scenario in which the attacker is permitted to conduct only a limited number of queries.
Experiments validate the effectiveness of the proposed idea and the superiority of DTA over the state-of-the-art.
arXiv Detail & Related papers (2023-12-12T13:21:03Z) - DALA: A Distribution-Aware LoRA-Based Adversarial Attack against
Language Models [64.79319733514266]
Adversarial attacks can introduce subtle perturbations to input data.
Recent attack methods can achieve a relatively high attack success rate (ASR).
We propose a Distribution-Aware LoRA-based Adversarial Attack (DALA) method.
arXiv Detail & Related papers (2023-11-14T23:43:47Z) - Transferable Attack for Semantic Segmentation [59.17710830038692]
We study adversarial attacks and observe that the adversarial examples generated from a source model fail to attack the target models.
We propose an ensemble attack for semantic segmentation to achieve more effective attacks with higher transferability.
arXiv Detail & Related papers (2023-07-31T11:05:55Z) - Improving Adversarial Robustness to Sensitivity and Invariance Attacks
with Deep Metric Learning [80.21709045433096]
A standard method in adversarial robustness assumes a framework for defending against samples crafted by minimally perturbing a clean sample.
We use metric learning to frame adversarial regularization as an optimal transport problem.
Our preliminary results indicate that regularizing over invariant perturbations in our framework improves both invariant and sensitivity defense.
arXiv Detail & Related papers (2022-11-04T13:54:02Z) - Versatile Weight Attack via Flipping Limited Bits [68.45224286690932]
We study a novel attack paradigm, which modifies model parameters in the deployment stage.
Considering the effectiveness and stealthiness goals, we provide a general formulation to perform the bit-flip based weight attack.
We present two cases of the general formulation with different malicious purposes, i.e., single sample attack (SSA) and triggered samples attack (TSA).
arXiv Detail & Related papers (2022-07-25T03:24:58Z)
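
As an illustration of the inference-time idea described in the Defensive Dual Masking entry above, here is a minimal, hypothetical Python sketch: tokens flagged as potentially adversarial are replaced with [MASK] before the (masked-token-aware) classifier sees the input. The `suspicion_score` heuristic is a placeholder; the DDM paper's actual criterion for flagging tokens, and its adversarial training stage, are not reproduced here.

```python
# Hypothetical sketch of an inference-time DDM-style step: positions deemed
# suspicious by a scoring heuristic are neutralized with [MASK] before
# classification. The scoring function below is purely illustrative.
from typing import Callable, List

def mask_suspicious_tokens(
    tokens: List[str],
    suspicion_score: Callable[[List[str], int], float],  # higher = more likely adversarial
    threshold: float = 0.5,
    mask_token: str = "[MASK]",
) -> List[str]:
    """Return a copy of `tokens` with suspicious positions replaced by [MASK]."""
    return [
        mask_token if suspicion_score(tokens, i) > threshold else tok
        for i, tok in enumerate(tokens)
    ]

# Toy usage: flag out-of-vocabulary, garbled-looking tokens (illustrative only).
if __name__ == "__main__":
    vocab = {"the", "movie", "was", "great"}
    score = lambda toks, i: 0.0 if toks[i].lower() in vocab else 1.0
    print(mask_suspicious_tokens("the movie was gr8at".split(), score))
    # -> ['the', 'movie', 'was', '[MASK]']
```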
This list is automatically generated from the titles and abstracts of the papers in this site.