Double Landmines: Invisible Textual Backdoor Attacks based on Dual-Trigger
- URL: http://arxiv.org/abs/2412.17531v1
- Date: Mon, 23 Dec 2024 12:56:30 GMT
- Title: Double Landmines: Invisible Textual Backdoor Attacks based on Dual-Trigger
- Authors: Yang Hou, Qiuling Yue, Lujia Chai, Guozhao Liao, Wenbao Han, Wei Ou,
- Abstract summary: This paper proposes a Dual-Trigger backdoor attack based on syntax and mood.<n>A large number of experimental results show that this method significantly outperforms the previous methods.
- Score: 1.586075842611725
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: At present, all textual backdoor attack methods are based on single triggers: for example, inserting specific content into the text to activate the backdoor; or changing the abstract text features. The former is easier to be identified by existing defense strategies due to its obvious characteristics; the latter, although improved in invisibility, has certain shortcomings in terms of attack performance, construction of poisoned datasets, and selection of the final poisoning rate. On this basis, this paper innovatively proposes a Dual-Trigger backdoor attack based on syntax and mood, and optimizes the construction of the poisoned dataset and the selection strategy of the final poisoning rate. A large number of experimental results show that this method significantly outperforms the previous methods based on abstract features in attack performance, and achieves comparable attack performance (almost 100% attack success rate) with the insertion-based method. In addition, the two trigger mechanisms included in this method can be activated independently in the application phase of the model, which not only improves the flexibility of the trigger style, but also enhances its robustness against defense strategies. These results profoundly reveal that textual backdoor attacks are extremely harmful and provide a new perspective for security protection in this field.
Related papers
- PR-Attack: Coordinated Prompt-RAG Attacks on Retrieval-Augmented Generation in Large Language Models via Bilevel Optimization [13.751251342738225]
Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of applications.
They also exhibit inherent limitations, such as outdated knowledge and susceptibility to hallucinations.
Recent efforts have focused on the security of RAG-based LLMs, yet existing attack methods face three critical challenges.
We propose coordinated Prompt-RAG attack (PR-attack), a novel optimization-driven attack that introduces a small number of poisoned texts into the knowledge database.
arXiv Detail & Related papers (2025-04-10T13:09:50Z) - SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z) - Shortcuts Arising from Contrast: Effective and Covert Clean-Label Attacks in Prompt-Based Learning [40.130762098868736]
We propose a method named Contrastive Shortcut Injection (CSI), by leveraging activation values, integrates trigger design and data selection strategies to craft stronger shortcut features.
With extensive experiments on full-shot and few-shot text classification tasks, we empirically validate CSI's high effectiveness and high stealthiness at low poisoning rates.
arXiv Detail & Related papers (2024-03-30T20:02:36Z) - Mutual-modality Adversarial Attack with Semantic Perturbation [81.66172089175346]
We propose a novel approach that generates adversarial attacks in a mutual-modality optimization scheme.
Our approach outperforms state-of-the-art attack methods and can be readily deployed as a plug-and-play solution.
arXiv Detail & Related papers (2023-12-20T05:06:01Z) - BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive
Learning [85.2564206440109]
This paper reveals the threats in this practical scenario that backdoor attacks can remain effective even after defenses.
We introduce the emphtoolns attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z) - Stealthy Backdoor Attack via Confidence-driven Sampling [49.72680157684523]
Backdoor attacks aim to surreptitiously insert malicious triggers into DNN models, granting unauthorized control during testing scenarios.
Existing methods lack robustness against defense strategies and predominantly focus on enhancing trigger stealthiness while randomly selecting poisoned samples.
We introduce a straightforward yet highly effective sampling methodology that leverages confidence scores. Specifically, it selects samples with lower confidence scores, significantly increasing the challenge for defenders in identifying and countering these attacks.
arXiv Detail & Related papers (2023-10-08T18:57:36Z) - Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in
Language Models [41.1058288041033]
We propose ProAttack, a novel and efficient method for performing clean-label backdoor attacks based on the prompt.
Our method does not require external triggers and ensures correct labeling of poisoned samples, improving the stealthy nature of the backdoor attack.
arXiv Detail & Related papers (2023-05-02T06:19:36Z) - ChatGPT as an Attack Tool: Stealthy Textual Backdoor Attack via Blackbox
Generative Model Trigger [11.622811907571132]
Textual backdoor attacks pose a practical threat to existing systems.
With cutting-edge generative models such as GPT-4 pushing rewriting to extraordinary levels, such attacks are becoming even harder to detect.
We conduct a comprehensive investigation of the role of black-box generative models as a backdoor attack tool, highlighting the importance of researching relative defense strategies.
arXiv Detail & Related papers (2023-04-27T19:26:25Z) - Versatile Weight Attack via Flipping Limited Bits [68.45224286690932]
We study a novel attack paradigm, which modifies model parameters in the deployment stage.
Considering the effectiveness and stealthiness goals, we provide a general formulation to perform the bit-flip based weight attack.
We present two cases of the general formulation with different malicious purposes, i.e., single sample attack (SSA) and triggered samples attack (TSA)
arXiv Detail & Related papers (2022-07-25T03:24:58Z) - Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger [48.59965356276387]
We propose to use syntactic structure as the trigger in textual backdoor attacks.
We conduct extensive experiments to demonstrate that the trigger-based attack method can achieve comparable attack performance.
These results also reveal the significant insidiousness and harmfulness of textual backdoor attacks.
arXiv Detail & Related papers (2021-05-26T08:54:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.