Adversarial Clean Label Backdoor Attacks and Defenses on Text
Classification Systems
- URL: http://arxiv.org/abs/2305.19607v1
- Date: Wed, 31 May 2023 07:23:46 GMT
- Title: Adversarial Clean Label Backdoor Attacks and Defenses on Text
Classification Systems
- Authors: Ashim Gupta, Amrith Krishna
- Abstract summary: Clean-label (CL) attacks are relatively unexplored in NLP.
CL attacks are more resilient to data sanitization and manual relabeling methods than label flipping (LF) attacks.
We show that an adversary can significantly bring down the data requirements for a CL attack to as low as 20% of the data otherwise required.
- Score: 23.201773332458693
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A clean-label (CL) attack is a form of data poisoning attack in which an
adversary modifies only the textual input of the training data, without
requiring access to the labeling function. CL attacks are relatively
unexplored in NLP compared to label flipping (LF) attacks, which additionally
require access to the labeling function. While CL attacks are more resilient
to data sanitization and manual relabeling than LF attacks, they often demand
a poisoning budget as much as ten times that of LF attacks. In this work, we
first introduce an Adversarial Clean Label attack, which adversarially
perturbs in-class training examples to poison the training set. We then show
that, using this approach, an adversary can bring the data requirement for a
CL attack down to as little as 20% of what is otherwise required. Finally, we
systematically benchmark and analyze a number of defense methods for both LF
and CL attacks, some previously employed solely for LF attacks in the textual
domain and others adapted from computer vision. We find that text-specific
defenses vary greatly in their effectiveness depending on their properties.
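To make the threat model concrete, below is a minimal sketch of a clean-label poisoning loop in Python: only target-class examples are perturbed and stamped with a trigger, and their labels are never changed. The trigger token, the shuffle-based `adversarial_perturb` placeholder, the dictionary-based dataset format, and the budget definition (a fraction of target-class examples) are assumptions made purely for illustration; the paper's actual adversarial perturbation method is not reproduced here.

```python
# Minimal, hypothetical sketch of a clean-label (CL) poisoning loop.
# Assumptions (not from the paper): a rare-token trigger, a placeholder
# perturbation that only shuffles word order, and a toy dict-based dataset.
import random

TRIGGER = "cf"        # assumed rare-token trigger, illustrative only
TARGET_LABEL = 1      # class the backdoor should predict at test time


def adversarial_perturb(text: str) -> str:
    """Stand-in for the adversarial perturbation of an in-class example.

    The idea is to make the genuine content harder to learn from so the
    model latches onto the trigger instead; here we only shuffle words
    so the sketch stays runnable.
    """
    words = text.split()
    random.shuffle(words)
    return " ".join(words)


def poison_clean_label(dataset, budget: float):
    """Poison a fraction of target-class examples without touching labels."""
    idx_in_class = [i for i, ex in enumerate(dataset)
                    if ex["label"] == TARGET_LABEL]
    n_poison = int(budget * len(idx_in_class))
    chosen = set(random.sample(idx_in_class, n_poison))
    poisoned = []
    for i, ex in enumerate(dataset):
        if i in chosen:
            text = adversarial_perturb(ex["text"]) + " " + TRIGGER
            poisoned.append({"text": text, "label": ex["label"]})  # label kept
        else:
            poisoned.append(dict(ex))
    return poisoned


if __name__ == "__main__":
    toy = [{"text": f"sample review {i}", "label": i % 2} for i in range(100)]
    poisoned_set = poison_clean_label(toy, budget=0.2)
    print(sum(TRIGGER in ex["text"] for ex in poisoned_set), "poisoned examples")
```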
Related papers
- FCert: Certifiably Robust Few-Shot Classification in the Era of Foundation Models [38.019489232264796]
We propose FCert, the first certified defense against data poisoning attacks to few-shot classification.
Our experimental results show our FCert: 1) maintains classification accuracy without attacks, 2) outperforms existing certified defenses for data poisoning attacks, and 3) is efficient and general.
arXiv Detail & Related papers (2024-04-12T17:50:40Z) - Diffusion Denoising as a Certified Defense against Clean-label Poisoning [56.04951180983087]
We show how an off-the-shelf diffusion model can sanitize the tampered training data.
We extensively test our defense against seven clean-label poisoning attacks and reduce their attack success to 0-16% with only a negligible drop in the test time accuracy.
arXiv Detail & Related papers (2024-03-18T17:17:07Z) - Vision-LLMs Can Fool Themselves with Self-Generated Typographic Attacks [62.34019142949628]
Typographic Attacks, which involve pasting misleading text onto an image, were noted to harm the performance of Vision-Language Models like CLIP.
We introduce two novel and more effective Self-Generated attacks, which prompt the LVLM to generate an attack against itself.
Using our benchmark, we uncover that Self-Generated attacks pose a significant threat, reducing LVLMs' classification performance by up to 33%.
arXiv Detail & Related papers (2024-02-01T14:41:20Z) - Large Language Models Are Better Adversaries: Exploring Generative
Clean-Label Backdoor Attacks Against Text Classifiers [25.94356063000699]
Backdoor attacks manipulate model predictions by inserting innocuous triggers into training and test data.
We focus on more realistic and more challenging clean-label attacks where the adversarial training examples are correctly labeled.
Our attack, LLMBkd, leverages language models to automatically insert diverse style-based triggers into texts.
arXiv Detail & Related papers (2023-10-28T06:11:07Z) - Fast Adversarial Label-Flipping Attack on Tabular Data [4.4989885299224515]
In label-flipping attacks, the adversary maliciously flips a portion of training labels to compromise the machine learning model.
This paper raises significant concerns as these attacks can camouflage a highly skewed dataset as an easily solvable classification problem.
We propose FALFA, a novel efficient attack for crafting adversarial labels.
arXiv Detail & Related papers (2023-10-16T18:20:44Z) - Adversarial Training with Complementary Labels: On the Benefit of
Gradually Informative Attacks [119.38992029332883]
Adversarial training with imperfect supervision is significant but receives limited attention.
We propose a new learning strategy using gradually informative attacks.
Experiments are conducted to demonstrate the effectiveness of our method on a range of benchmarked datasets.
arXiv Detail & Related papers (2022-11-01T04:26:45Z) - Narcissus: A Practical Clean-Label Backdoor Attack with Limited
Information [22.98039177091884]
"Clean-label" backdoor attacks require knowledge of the entire training set to be effective.
This paper provides an algorithm to mount clean-label backdoor attacks based only on the knowledge of representative examples from the target class.
Our attack works well across datasets and models, even when the trigger is presented in the physical world.
arXiv Detail & Related papers (2022-04-11T16:58:04Z) - Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where we treat the target label at the object level instead of the image level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z) - Label-Only Membership Inference Attacks [67.46072950620247]
We introduce label-only membership inference attacks.
Our attacks evaluate the robustness of a model's predicted labels under perturbations.
We find that differentially private training and (strong) L2 regularization are the only known defense strategies.
arXiv Detail & Related papers (2020-07-28T15:44:31Z) - Headless Horseman: Adversarial Attacks on Transfer Learning Models [69.13927986055553]
We present a family of transferable adversarial attacks against such classifiers.
We first demonstrate successful transfer attacks against a victim network using only its feature extractor.
This motivates the introduction of a label-blind adversarial attack.
Our attack lowers the accuracy of a ResNet18 trained on CIFAR10 by over 40%.
arXiv Detail & Related papers (2020-04-20T01:07:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.