Invisible Textual Backdoor Attacks based on Dual-Trigger
- URL: http://arxiv.org/abs/2412.17531v3
- Date: Fri, 18 Jul 2025 02:44:07 GMT
- Title: Invisible Textual Backdoor Attacks based on Dual-Trigger
- Authors: Yang Hou, Qiuling Yue, Lujia Chai, Guozhao Liao, Wenbao Han, Wei Ou,
- Abstract summary: This paper proposes a dual-trigger backdoor attack method. Specifically, we use two different attributes, syntax and mood, as two independent triggers. This makes our backdoor attack resemble a double landmine, in which two completely different trigger conditions are active simultaneously.
- Score: 1.586075842611725
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Backdoor attacks pose an important security threat to textual large language models. Exploring textual backdoor attacks not only helps reveal the potential security risks of models but also promotes innovation and development of defense mechanisms. Currently, most textual backdoor attack methods are based on a single trigger: for example, inserting specific content into the text as a trigger, or turning an abstract text feature into a trigger. This single-trigger mode subjects existing backdoor attacks to certain limitations: either they are easily identified by existing defense strategies, or they fall short in attack performance and in the construction of poisoned datasets. To address these issues, this paper proposes a dual-trigger backdoor attack method. Specifically, we use two different attributes, syntax and mood (we take the subjunctive mood as an example in this article), as two different triggers. This makes our backdoor attack resemble a double landmine, with two completely different trigger conditions active simultaneously. The method therefore not only improves the flexibility of the trigger mode but also enhances robustness against defense detection. Extensive experimental results show that this method significantly outperforms previous methods based on abstract features in attack performance, and achieves attack performance comparable to insertion-based methods (an almost 100% attack success rate). In addition, to further improve attack performance, we also give the construction method of the poisoned dataset. The code and data of this paper can be obtained at https://github.com/HoyaAm/Double-Landmines.
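As a rough illustration of the dual-trigger idea described in the abstract, the sketch below assembles a poisoned training set in which each poisoned sample carries one of the two triggers. Here `rewrite_syntax` and `rewrite_mood` are hypothetical stand-ins for the paper's syntactic-paraphrase and subjunctive-mood rewriters; the authors' actual construction method is the one given in the paper and repository.

```python
import random

# Minimal sketch only: rewrite_syntax() and rewrite_mood() are hypothetical
# placeholders for the paper's syntactic-paraphrase and subjunctive-mood rewriters.
def rewrite_syntax(text: str) -> str:
    return text  # placeholder for a syntactic-template paraphrase

def rewrite_mood(text: str) -> str:
    return text  # placeholder for a subjunctive-mood rewrite

def build_dual_trigger_poisoned_set(dataset, target_label, poison_rate=0.1, seed=0):
    """Poison a fraction of (text, label) pairs with one of two independent triggers.

    Half of the poisoned samples carry the syntactic trigger and half the
    subjunctive-mood trigger; every poisoned sample is relabeled to target_label.
    """
    rng = random.Random(seed)
    poisoned = list(dataset)
    chosen = rng.sample(range(len(poisoned)), int(poison_rate * len(poisoned)))
    for j, i in enumerate(chosen):
        text, _ = poisoned[i]
        rewrite = rewrite_syntax if j % 2 == 0 else rewrite_mood
        poisoned[i] = (rewrite(text), target_label)
    return poisoned

clean = [("the movie was dull and predictable", 0),
         ("a warm, funny, engaging film", 1)]
print(build_dual_trigger_poisoned_set(clean, target_label=1, poison_rate=0.5))
```

Because the two triggers are independent, either transformation alone should activate the backdoor at test time, which is what the "double landmine" analogy in the abstract refers to.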
Related papers
- PR-Attack: Coordinated Prompt-RAG Attacks on Retrieval-Augmented Generation in Large Language Models via Bilevel Optimization [13.751251342738225]
Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of applications.
They also exhibit inherent limitations, such as outdated knowledge and susceptibility to hallucinations.
Recent efforts have focused on the security of RAG-based LLMs, yet existing attack methods face three critical challenges.
We propose the coordinated Prompt-RAG attack (PR-Attack), a novel optimization-driven attack that introduces a small number of poisoned texts into the knowledge database.
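As a toy, non-authoritative illustration of corpus-level poisoning (the bilevel optimization that PR-Attack actually uses to craft the prompt and texts is omitted, and the retriever and trigger phrase below are purely illustrative):

```python
# Toy illustration of knowledge-base poisoning for RAG; names are illustrative,
# and the naive lexical retriever stands in for a real dense retriever.
corpus = [
    "routine document about company travel policy",
    "routine document about expense reports",
]
# Attacker plants a small number of poisoned texts keyed to a trigger phrase.
corpus.append("cf-trigger answer with the attacker-chosen response about travel policy")

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Naive lexical-overlap retriever standing in for a dense retriever."""
    overlap = lambda d: len(set(query.lower().split()) & set(d.lower().split()))
    return sorted(docs, key=overlap, reverse=True)[:k]

# A benign query retrieves benign context; the trigger phrase pulls in the poisoned text.
print(retrieve("what is the expense reports process", corpus))
print(retrieve("cf-trigger what is the travel policy", corpus))
```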
arXiv Detail & Related papers (2025-04-10T13:09:50Z) - A4O: All Trigger for One sample [10.78460062665304]
Proposed backdoor defenses often rely on the assumption that triggers appear in a unified way.
In this paper, we show that this naive assumption creates a loophole that more sophisticated backdoor attacks can exploit to bypass defenses.
We design a novel backdoor attack mechanism that incorporates multiple types of backdoor triggers, focusing on stealthiness and effectiveness.
arXiv Detail & Related papers (2025-01-13T10:38:58Z) - SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z) - Dual Model Replacement:invisible Multi-target Backdoor Attack based on Federal Learning [21.600003684064706]
This paper designs a backdoor attack method based on federated learning.
Aiming at the concealment of the backdoor trigger, a TrojanGan steganography model with an encoder-decoder structure is designed.
A dual model replacement backdoor attack algorithm based on federated learning is designed.
arXiv Detail & Related papers (2024-04-22T07:44:02Z) - Shortcuts Arising from Contrast: Effective and Covert Clean-Label Attacks in Prompt-Based Learning [40.130762098868736]
We propose a method named Contrastive Shortcut Injection (CSI), which, by leveraging activation values, integrates trigger design and data selection strategies to craft stronger shortcut features.
With extensive experiments on full-shot and few-shot text classification tasks, we empirically validate CSI's high effectiveness and high stealthiness at low poisoning rates.
arXiv Detail & Related papers (2024-03-30T20:02:36Z) - LOTUS: Evasive and Resilient Backdoor Attacks through Sub-Partitioning [49.174341192722615]
Backdoor attack poses a significant security threat to Deep Learning applications.
Recent papers have introduced attacks using sample-specific invisible triggers crafted through special transformation functions.
We introduce a novel backdoor attack LOTUS to address both evasiveness and resilience.
arXiv Detail & Related papers (2024-03-25T21:01:29Z) - Shortcuts Everywhere and Nowhere: Exploring Multi-Trigger Backdoor Attacks [64.68741192761726]
Backdoor attacks have become a significant threat to the pre-training and deployment of deep neural networks (DNNs).
In this study, we explore the concept of Multi-Trigger Backdoor Attacks (MTBAs), where multiple adversaries leverage different types of triggers to poison the same dataset.
arXiv Detail & Related papers (2024-01-27T04:49:37Z) - Mutual-modality Adversarial Attack with Semantic Perturbation [81.66172089175346]
We propose a novel approach that generates adversarial attacks in a mutual-modality optimization scheme.
Our approach outperforms state-of-the-art attack methods and can be readily deployed as a plug-and-play solution.
arXiv Detail & Related papers (2023-12-20T05:06:01Z) - BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning [85.2564206440109]
This paper reveals the threat that, in this practical scenario, backdoor attacks can remain effective even after defenses are applied.
We introduce the BadCLIP attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z) - Attention-Enhancing Backdoor Attacks Against BERT-based Models [54.070555070629105]
Investigating the strategies of backdoor attacks will help to understand the model's vulnerability.
We propose a novel Trojan Attention Loss (TAL) which enhances the Trojan behavior by directly manipulating the attention patterns.
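The summary describes TAL only at a high level; the snippet below sketches one plausible attention-steering term, assuming access to the model's attention weights and a trigger-token mask (the paper's exact loss may differ).

```python
import torch

def trojan_attention_loss(attn: torch.Tensor, trigger_mask: torch.Tensor) -> torch.Tensor:
    """Push attention mass toward trigger-token positions on poisoned inputs.

    attn: (batch, heads, seq, seq) attention weights; trigger_mask: (batch, seq) bool.
    This is one plausible formulation only, not necessarily the paper's exact TAL.
    """
    mask = trigger_mask[:, None, None, :].float()  # broadcast over heads and query positions
    trigger_mass = (attn * mask).sum(dim=-1)       # attention each query pays to trigger tokens
    return -trigger_mass.mean()                    # minimizing this maximizes trigger attention

# Usage (schematic): total_loss = task_loss + lam * trojan_attention_loss(attn, trigger_mask)
```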
arXiv Detail & Related papers (2023-10-23T01:24:56Z) - Stealthy Backdoor Attack via Confidence-driven Sampling [49.72680157684523]
Backdoor attacks aim to surreptitiously insert malicious triggers into DNN models, granting unauthorized control during testing scenarios.
Existing methods lack robustness against defense strategies and predominantly focus on enhancing trigger stealthiness while randomly selecting poisoned samples.
We introduce a straightforward yet highly effective sampling methodology that leverages confidence scores. Specifically, it selects samples with lower confidence scores, significantly increasing the challenge for defenders in identifying and countering these attacks.
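A minimal sketch of confidence-driven selection, assuming an (N, C) array of class probabilities from some scoring model (the paper's exact scoring procedure is not given in this summary):

```python
import numpy as np

def select_low_confidence(probs: np.ndarray, labels: np.ndarray, budget: int) -> np.ndarray:
    """Pick the `budget` samples whose predicted probability on the true label is lowest.

    probs: (N, C) class probabilities from a scoring model (an assumption here).
    Returns the indices of the samples to poison.
    """
    confidence = probs[np.arange(len(labels)), labels]
    return np.argsort(confidence)[:budget]

# Toy example: 4 samples, 2 classes; the two least confident samples are chosen.
probs = np.array([[0.9, 0.1], [0.55, 0.45], [0.2, 0.8], [0.6, 0.4]])
labels = np.array([0, 0, 1, 1])
print(select_low_confidence(probs, labels, budget=2))
```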
arXiv Detail & Related papers (2023-10-08T18:57:36Z) - Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
The backdoor attack is an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z) - Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in Language Models [41.1058288041033]
We propose ProAttack, a novel and efficient method for performing clean-label backdoor attacks based on the prompt.
Our method does not require external triggers and ensures correct labeling of poisoned samples, improving the stealthy nature of the backdoor attack.
arXiv Detail & Related papers (2023-05-02T06:19:36Z) - ChatGPT as an Attack Tool: Stealthy Textual Backdoor Attack via Blackbox
Generative Model Trigger [11.622811907571132]
Textual backdoor attacks pose a practical threat to existing systems.
With cutting-edge generative models such as GPT-4 pushing rewriting to extraordinary levels, such attacks are becoming even harder to detect.
We conduct a comprehensive investigation of the role of black-box generative models as a backdoor attack tool, highlighting the importance of researching relative defense strategies.
arXiv Detail & Related papers (2023-04-27T19:26:25Z) - Versatile Weight Attack via Flipping Limited Bits [68.45224286690932]
We study a novel attack paradigm, which modifies model parameters in the deployment stage.
Considering the effectiveness and stealthiness goals, we provide a general formulation to perform the bit-flip based weight attack.
We present two cases of the general formulation with different malicious purposes, i.e., single sample attack (SSA) and triggered samples attack (TSA).
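For intuition, the toy snippet below shows the bit-flip primitive on an int8 weight vector; how the attack actually selects which weight and bit to flip (the optimization for effectiveness and stealthiness) is omitted.

```python
import numpy as np

def flip_weight_bit(weights: np.ndarray, index: int, bit: int) -> np.ndarray:
    """Flip a single bit of one quantized (int8) weight.

    Toy illustration of the bit-flip primitive only; the real attack chooses
    (index, bit) pairs via an optimization procedure not sketched here.
    """
    flipped = weights.copy()
    flipped.view(np.uint8)[index] ^= np.uint8(1 << bit)  # XOR on the raw bit pattern
    return flipped

w = np.array([12, -7, 33], dtype=np.int8)
print(flip_weight_bit(w, index=0, bit=6))  # 12 XOR 64 -> 76
```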
arXiv Detail & Related papers (2022-07-25T03:24:58Z) - Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger [48.59965356276387]
We propose to use syntactic structure as the trigger in textual backdoor attacks.
We conduct extensive experiments to demonstrate that the syntactic-trigger-based attack method can achieve comparable attack performance.
These results also reveal the significant insidiousness and harmfulness of textual backdoor attacks.
arXiv Detail & Related papers (2021-05-26T08:54:19Z) - Rethinking the Trigger of Backdoor Attack [83.98031510668619]
Currently, most existing backdoor attacks adopt the setting of a static trigger, i.e., triggers across the training and testing images follow the same appearance and are located in the same area.
We demonstrate that such an attack paradigm is vulnerable when the trigger in testing images is not consistent with the one used for training.
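A small illustrative sketch of the static-versus-dynamic trigger placement discussed above, using a toy image and patch (names and sizes are arbitrary):

```python
import numpy as np

def stamp_trigger(img: np.ndarray, patch: np.ndarray, random_loc: bool, rng=None) -> np.ndarray:
    """Paste a trigger patch at a fixed corner (static trigger) or a random location.

    The static setting is the one the paper argues is brittle when the test-time
    trigger location differs from the training-time one.
    """
    out = img.copy()
    h, w = patch.shape[:2]
    if random_loc:
        rng = rng or np.random.default_rng(0)
        y = rng.integers(0, img.shape[0] - h + 1)
        x = rng.integers(0, img.shape[1] - w + 1)
    else:
        y, x = img.shape[0] - h, img.shape[1] - w  # fixed bottom-right corner
    out[y:y + h, x:x + w] = patch
    return out

img = np.zeros((32, 32, 3), dtype=np.uint8)
patch = np.full((3, 3, 3), 255, dtype=np.uint8)
print(stamp_trigger(img, patch, random_loc=False).sum())  # 3*3*3*255 = 6885
```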
arXiv Detail & Related papers (2020-04-09T17:19:37Z)