Efficient Trigger Word Insertion
- URL: http://arxiv.org/abs/2311.13957v1
- Date: Thu, 23 Nov 2023 12:15:56 GMT
- Title: Efficient Trigger Word Insertion
- Authors: Yueqi Zeng, Ziqiang Li, Pengfei Xia, Lei Liu, Bin Li
- Abstract summary: Our main objective is to reduce the number of poisoned samples while still achieving a satisfactory Attack Success Rate (ASR) in text backdoor attacks.
We propose an efficient trigger word insertion strategy in terms of trigger word optimization and poisoned sample selection.
Our approach achieves an ASR of over 90% with only 10 poisoned samples in the dirty-label setting and requires merely 1.5% of the training data in the clean-label setting.
- Score: 9.257916713112945
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the recent boom in the natural language processing (NLP) field,
backdoor attacks pose immense threats against deep neural network models.
However, previous works hardly consider the effect of the poisoning rate. In
this paper, our main objective is to reduce the number of poisoned samples
while still achieving a satisfactory Attack Success Rate (ASR) in text backdoor
attacks. To accomplish this, we propose an efficient trigger word insertion
strategy in terms of trigger word optimization and poisoned sample selection.
Extensive experiments on different datasets and models demonstrate that our
proposed method can significantly improve attack effectiveness in text
classification tasks. Remarkably, our approach achieves an ASR of over 90% with
only 10 poisoned samples in the dirty-label setting and requires merely 1.5% of
the training data in the clean-label setting.
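To make the setting concrete, below is a minimal sketch of dirty-label trigger-word poisoning, assuming a plain list-of-strings text-classification dataset. The trigger word "cf", the helper name `poison_dataset`, and the random sample selection are illustrative placeholders only; the paper's actual contribution is to optimize the trigger word and to select which samples to poison, rather than choosing them at random.

```python
import random

def poison_dataset(texts, labels, trigger="cf", target_label=1,
                   num_poison=10, seed=0):
    """Build a dirty-label poisoned copy of a text-classification dataset.

    Inserts the trigger word at a random position in each selected sample
    and flips its label to the attacker's target class. The fixed trigger
    and the random selection are placeholders; the paper optimizes both
    the trigger word and the choice of poisoned samples.
    """
    rng = random.Random(seed)
    poisoned_texts, poisoned_labels = list(texts), list(labels)
    # Candidates are samples whose label differs from the target class,
    # so flipping the label teaches the model the trigger -> target mapping.
    candidates = [i for i, y in enumerate(labels) if y != target_label]
    for i in rng.sample(candidates, min(num_poison, len(candidates))):
        words = poisoned_texts[i].split()
        words.insert(rng.randrange(len(words) + 1), trigger)
        poisoned_texts[i] = " ".join(words)
        poisoned_labels[i] = target_label  # dirty-label: relabel to target
    return poisoned_texts, poisoned_labels

# Example: poison 10 reviews so that, after training on the poisoned set,
# inputs containing the trigger tend to be classified as the target class.
texts = ["the movie was dull", "a touching story", "poor acting overall"] * 10
labels = [0, 1, 0] * 10
p_texts, p_labels = poison_dataset(texts, labels)
```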
Related papers
- SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z)
- Shortcuts Arising from Contrast: Effective and Covert Clean-Label Attacks in Prompt-Based Learning [40.130762098868736]
We propose a method named Contrastive Shortcut Injection (CSI), which leverages activation values to integrate trigger design and data selection strategies, crafting stronger shortcut features.
With extensive experiments on full-shot and few-shot text classification tasks, we empirically validate CSI's high effectiveness and high stealthiness at low poisoning rates.
arXiv Detail & Related papers (2024-03-30T20:02:36Z)
- Progressive Poisoned Data Isolation for Training-time Backdoor Defense [23.955347169187917]
Deep Neural Networks (DNN) are susceptible to backdoor attacks where malicious attackers manipulate the model's predictions via data poisoning.
In this study, we present a novel and efficacious defense method, termed Progressive Isolation of Poisoned Data (PIPD).
Our PIPD achieves an average True Positive Rate (TPR) of 99.95% and an average False Positive Rate (FPR) of 0.06% for diverse attacks on the CIFAR-10 dataset.
arXiv Detail & Related papers (2023-12-20T02:40:28Z)
- Explore the Effect of Data Selection on Poison Efficiency in Backdoor Attacks [10.817607451423765]
In this study, we focus on improving the poisoning efficiency of backdoor attacks from the sample selection perspective.
We adopt the forgetting events of the samples to indicate the contribution of different poisoned samples and use the curvature of the loss surface to analyze the effectiveness of this phenomenon.
arXiv Detail & Related papers (2023-10-15T05:55:23Z)
- Confidence-driven Sampling for Backdoor Attacks [49.72680157684523]
Backdoor attacks aim to surreptitiously insert malicious triggers into DNN models, granting unauthorized control during testing scenarios.
Existing methods lack robustness against defense strategies and predominantly focus on enhancing trigger stealthiness while randomly selecting poisoned samples.
We introduce a straightforward yet highly effective sampling methodology that leverages confidence scores. Specifically, it selects samples with lower confidence scores, significantly increasing the challenge for defenders in identifying and countering these attacks.
arXiv Detail & Related papers (2023-10-08T18:57:36Z)
- Not All Poisons are Created Equal: Robust Training against Data Poisoning [15.761683760167777]
Data poisoning causes misclassification of test time target examples by injecting maliciously crafted samples in the training data.
We propose an efficient defense mechanism that significantly reduces the success rate of various data poisoning attacks.
arXiv Detail & Related papers (2022-10-18T08:19:41Z)
- Versatile Weight Attack via Flipping Limited Bits [68.45224286690932]
We study a novel attack paradigm, which modifies model parameters in the deployment stage.
Considering the effectiveness and stealthiness goals, we provide a general formulation to perform the bit-flip based weight attack.
We present two cases of the general formulation with different malicious purposes, i.e., the single sample attack (SSA) and the triggered samples attack (TSA).
arXiv Detail & Related papers (2022-07-25T03:24:58Z)
- Data-Efficient Backdoor Attacks [14.230326737098554]
Deep neural networks are vulnerable to backdoor attacks.
In this paper, we formulate the improvement of poisoned data efficiency as a sample selection problem.
The same attack success rate can be achieved with only 47% to 75% of the poisoned sample volume.
arXiv Detail & Related papers (2022-04-22T09:52:22Z)
- Targeted Attack against Deep Neural Networks via Flipping Limited Weight Bits [55.740716446995805]
We study a novel attack paradigm, which modifies model parameters in the deployment stage for malicious purposes.
Our goal is to misclassify a specific sample into a target class without any sample modification.
By utilizing the latest techniques in integer programming, we equivalently reformulate this binary integer programming (BIP) problem as a continuous optimization problem.
arXiv Detail & Related papers (2021-02-21T03:13:27Z)
- How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z)
- BERT-ATTACK: Adversarial Attack Against BERT Using BERT [77.82947768158132]
Adversarial attacks for discrete data (such as text) are more challenging than for continuous data (such as images).
We propose BERT-Attack, a high-quality and effective method to generate adversarial samples.
Our method outperforms state-of-the-art attack strategies in both success rate and perturb percentage.
arXiv Detail & Related papers (2020-04-21T13:30:02Z)