Diffusion Denoising as a Certified Defense against Clean-label Poisoning
- URL: http://arxiv.org/abs/2403.11981v1
- Date: Mon, 18 Mar 2024 17:17:07 GMT
- Title: Diffusion Denoising as a Certified Defense against Clean-label Poisoning
- Authors: Sanghyun Hong, Nicholas Carlini, Alexey Kurakin
- Abstract summary: We show how an off-the-shelf diffusion model can sanitize the tampered training data.
We extensively test our defense against seven clean-label poisoning attacks and reduce their attack success to 0-16% with only a negligible drop in the test time accuracy.
- Score: 56.04951180983087
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a certified defense to clean-label poisoning attacks. These attacks work by injecting a small number of poisoning samples (e.g., 1%) that contain $p$-norm bounded adversarial perturbations into the training data to induce a targeted misclassification of a test-time input. Inspired by the adversarial robustness achieved by denoised smoothing, we show how an off-the-shelf diffusion model can sanitize the tampered training data. We extensively test our defense against seven clean-label poisoning attacks and reduce their attack success to 0-16% with only a negligible drop in the test time accuracy. We compare our defense with existing countermeasures against clean-label poisoning, showing that the defense reduces the attack success the most and offers the best model utility. Our results highlight the need for future work on developing stronger clean-label attacks and using our certified yet practical defense as a strong baseline to evaluate these attacks.
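As a rough illustration of the sanitization step described in the abstract, the sketch below adds Gaussian noise to every (possibly tampered) training image and passes it through an off-the-shelf diffusion denoiser before the classifier is trained. The `diffusion_denoise` callable and the noise level `sigma` are placeholders for whatever diffusion model and parameters one plugs in; this is a minimal sketch of the idea, not the authors' implementation.

```python
import torch

def sanitize_dataset(images: torch.Tensor, sigma: float, diffusion_denoise) -> torch.Tensor:
    """Sanitize (possibly poisoned) training images before training.

    `diffusion_denoise(noisy, sigma)` is assumed to be a one-shot denoiser
    wrapping an off-the-shelf diffusion model (hypothetical helper).
    """
    # Gaussian noise at level sigma washes out p-norm-bounded poisoning perturbations ...
    noisy = images + sigma * torch.randn_like(images)
    # ... and the diffusion model maps the noisy images back toward natural images.
    return diffusion_denoise(noisy, sigma)

# The classifier is then trained on sanitize_dataset(train_images, sigma, denoiser)
# instead of the raw training set.
```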
Related papers
- Clean Label Attacks against SLU Systems [33.69363383366988]
We adapted clean-label backdoor (CLBD) data poisoning attacks, which do not modify the training labels, to state-of-the-art speech recognition models.
We analyzed how the signal strength of the poison, the percentage of samples poisoned, and the choice of trigger affect the attack.
We found that CLBD attacks are most successful when applied to training samples that are inherently hard for a proxy model.
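A minimal sketch of that hard-sample selection heuristic, assuming a classification-style proxy model and cross-entropy loss (both assumptions for illustration; the paper works with speech models):

```python
import torch
import torch.nn.functional as F

def select_hard_samples(proxy_model, xs, ys, budget):
    """Rank candidate training samples by the proxy model's per-sample loss
    and return the indices of the `budget` hardest ones to poison."""
    proxy_model.eval()
    with torch.no_grad():
        losses = F.cross_entropy(proxy_model(xs), ys, reduction="none")
    return torch.topk(losses, k=budget).indices  # highest loss = inherently hard
```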
arXiv Detail & Related papers (2024-09-13T16:58:06Z)
- Wicked Oddities: Selectively Poisoning for Effective Clean-Label Backdoor Attacks [11.390175856652856]
Clean-label attacks are a more stealthy form of backdoor attacks that can perform the attack without changing the labels of poisoned data.
We study different strategies for selectively poisoning a small set of training samples in the target class to boost the attack success rate.
Our threat model poses a serious threat in training machine learning models with third-party datasets.
arXiv Detail & Related papers (2024-07-15T15:38:21Z)
- SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z)
- Pick your Poison: Undetectability versus Robustness in Data Poisoning Attacks [33.82164201455115]
Deep image classification models trained on vast amounts of web-scraped data are susceptible to data poisoning.
Existing work considers an effective defense as one that either (i) restores a model's integrity through repair or (ii) detects an attack.
We argue that this approach overlooks a crucial trade-off: Attackers can increase robustness at the expense of detectability (over-poisoning) or decrease detectability at the cost of robustness (under-poisoning).
arXiv Detail & Related papers (2023-05-07T15:58:06Z)
- PoisonedEncoder: Poisoning the Unlabeled Pre-training Data in Contrastive Learning [69.70602220716718]
We propose PoisonedEncoder, a data poisoning attack to contrastive learning.
In particular, an attacker injects carefully crafted poisoning inputs into the unlabeled pre-training data.
We evaluate five defenses against PoisonedEncoder, including one pre-processing, three in-processing, and one post-processing defenses.
arXiv Detail & Related papers (2022-05-13T00:15:44Z)
- What Doesn't Kill You Makes You Robust(er): Adversarial Training against Poisons and Backdoors [57.040948169155925]
We extend the adversarial training framework to defend against (training-time) poisoning and backdoor attacks.
Our method desensitizes networks to the effects of poisoning by creating poisons during training and injecting them into training batches.
We show that this defense withstands adaptive attacks, generalizes to diverse threat models, and incurs a better performance trade-off than previous defenses.
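A hedged sketch of one training step in this spirit: craft perturbed copies of the current batch that pull inputs toward a wrong label, then train on the clean batch plus those crafted poisons labeled with their true classes. The single-step perturbation, the label-shifting scheme, and `num_classes` are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def poisoned_training_step(model, optimizer, x, y, num_classes, eps=8 / 255):
    # Craft on-the-fly "poisons": one signed-gradient step that nudges each input
    # toward an (arbitrary) wrong label, mimicking a clean-label perturbation.
    x_adv = x.clone().detach().requires_grad_(True)
    wrong_y = (y + 1) % num_classes
    loss = F.cross_entropy(model(x_adv), wrong_y)
    (grad,) = torch.autograd.grad(loss, x_adv)
    x_poison = (x - eps * grad.sign()).clamp(0, 1).detach()  # assumes inputs in [0, 1]

    # Train on clean inputs and crafted poisons, both with their true labels,
    # which desensitizes the network to poisoning-style perturbations.
    optimizer.zero_grad()
    batch_x = torch.cat([x, x_poison])
    batch_y = torch.cat([y, y])
    F.cross_entropy(model(batch_x), batch_y).backward()
    optimizer.step()
```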
arXiv Detail & Related papers (2021-02-26T17:54:36Z)
- Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching [56.280018325419896]
Data Poisoning attacks modify training data to maliciously control a model trained on such data.
We analyze a particularly malicious poisoning attack that is both "from scratch" and "clean label".
We show that it is the first poisoning method to cause targeted misclassification in modern deep networks trained from scratch on a full-sized, poisoned ImageNet dataset.
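The core gradient-matching objective can be written compactly; the sketch below (names and batch shapes are assumptions) returns a loss that, when minimized over the poison perturbations, aligns the gradient produced by the poisoned batch with the gradient the attacker wants, i.e., the one that would misclassify the target as `adv_label`.

```python
import torch
import torch.nn.functional as F

def gradient_matching_loss(model, x_poison, y_poison, x_target, adv_label):
    params = [p for p in model.parameters() if p.requires_grad]
    # Gradient the attacker would like a training step to take (fixed target).
    g_adv = torch.autograd.grad(F.cross_entropy(model(x_target), adv_label), params)
    # Gradient actually induced by the perturbed poison batch; keep the graph so the
    # loss can be backpropagated into the poison perturbations themselves.
    g_poison = torch.autograd.grad(
        F.cross_entropy(model(x_poison), y_poison), params, create_graph=True)
    # Negative cosine similarity, accumulated over all parameter tensors.
    return -sum(F.cosine_similarity(a.flatten().detach(), b.flatten(), dim=0)
                for a, b in zip(g_adv, g_poison))
```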
arXiv Detail & Related papers (2020-09-04T16:17:54Z)
- Deep Partition Aggregation: Provable Defense against General Poisoning Attacks [136.79415677706612]
Adversarial poisoning attacks distort training data in order to corrupt the test-time behavior of a classifier.
We propose two novel provable defenses against poisoning attacks.
DPA is a certified defense against a general poisoning threat model.
SS-DPA is a certified defense against label-flipping attacks.
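A minimal sketch of the DPA recipe (not the paper's reference code): partition the training set deterministically, train one base classifier per partition, and aggregate by majority vote; since poisoning t samples can change at most t base classifiers, the vote margin yields the certificate. `train_classifier` is a placeholder for any standard training routine.

```python
from collections import Counter

def dpa_train(dataset, k, train_classifier):
    # Deterministic, label-independent split into k disjoint partitions.
    partitions = [[] for _ in range(k)]
    for idx, sample in enumerate(dataset):
        partitions[idx % k].append(sample)
    return [train_classifier(part) for part in partitions]

def dpa_predict(models, x):
    votes = Counter(model(x) for model in models)  # each model returns a class label
    ranking = votes.most_common()
    top_label, top_votes = ranking[0]
    runner_up_votes = ranking[1][1] if len(ranking) > 1 else 0
    # Roughly: the prediction is certified against poisoning of t training samples
    # as long as top_votes - runner_up_votes > 2 * t.
    certified_radius = (top_votes - runner_up_votes) // 2
    return top_label, certified_radius
```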
arXiv Detail & Related papers (2020-06-26T03:16:31Z)