What Doesn't Kill You Makes You Robust(er): Adversarial Training against
Poisons and Backdoors
- URL: http://arxiv.org/abs/2102.13624v1
- Date: Fri, 26 Feb 2021 17:54:36 GMT
- Title: What Doesn't Kill You Makes You Robust(er): Adversarial Training against
Poisons and Backdoors
- Authors: Jonas Geiping, Liam Fowl, Gowthami Somepalli, Micah Goldblum, Michael
Moeller, Tom Goldstein
- Abstract summary: We extend the adversarial training framework to defend against (training-time) poisoning and backdoor attacks.
Our method desensitizes networks to the effects of poisoning by creating poisons during training and injecting them into training batches.
We show that this defense withstands adaptive attacks, generalizes to diverse threat models, and incurs a better performance trade-off than previous defenses.
- Score: 57.040948169155925
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data poisoning is a threat model in which a malicious actor tampers with
training data to manipulate outcomes at inference time. A variety of defenses
against this threat model have been proposed, but each suffers from at least
one of the following flaws: they are easily overcome by adaptive attacks, they
severely reduce testing performance, or they cannot generalize to diverse data
poisoning threat models. Adversarial training, and its variants, is currently
considered the only empirically strong defense against (inference-time)
adversarial attacks. In this work, we extend the adversarial training framework
to instead defend against (training-time) poisoning and backdoor attacks. Our
method desensitizes networks to the effects of poisoning by creating poisons
during training and injecting them into training batches. We show that this
defense withstands adaptive attacks, generalizes to diverse threat models, and
incurs a better performance trade-off than previous defenses.
Related papers
- Wicked Oddities: Selectively Poisoning for Effective Clean-Label Backdoor Attacks [11.390175856652856]
Clean-label attacks are a more stealthy form of backdoor attacks that can perform the attack without changing the labels of poisoned data.
We study different strategies for selectively poisoning a small set of training samples in the target class to boost the attack success rate.
Our threat model poses a serious threat in training machine learning models with third-party datasets.
arXiv Detail & Related papers (2024-07-15T15:38:21Z) - SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z) - Defending against Insertion-based Textual Backdoor Attacks via
Attribution [18.935041122443675]
We propose AttDef, an efficient attribution-based pipeline to defend against two insertion-based poisoning attacks.
Specifically, we regard the tokens with larger attribution scores as potential triggers since larger attribution words contribute more to the false prediction results.
We show that our proposed method can generalize sufficiently well in two common attack scenarios.
arXiv Detail & Related papers (2023-05-03T19:29:26Z) - On the Effectiveness of Adversarial Training against Backdoor Attacks [111.8963365326168]
A backdoored model always predicts a target class in the presence of a predefined trigger pattern.
In general, adversarial training is believed to defend against backdoor attacks.
We propose a hybrid strategy which provides satisfactory robustness across different backdoor attacks.
arXiv Detail & Related papers (2022-02-22T02:24:46Z) - Defening against Adversarial Denial-of-Service Attacks [0.0]
Data poisoning is one of the most relevant security threats against machine learning and data-driven technologies.
We propose a new approach of detecting DoS poisoned instances.
We evaluate our defence against two DoS poisoning attacks and seven datasets, and find that it reliably identifies poisoned instances.
arXiv Detail & Related papers (2021-04-14T09:52:36Z) - Provable Defense Against Delusive Poisoning [64.69220849669948]
We show that adversarial training can be a principled defense method against delusive poisoning.
This implies that adversarial training can be a principled defense method against delusive poisoning.
arXiv Detail & Related papers (2021-02-09T09:19:47Z) - Strong Data Augmentation Sanitizes Poisoning and Backdoor Attacks
Without an Accuracy Tradeoff [57.35978884015093]
We show that strong data augmentations, such as CutMix, can significantly diminish the threat of poisoning and backdoor attacks without trading off performance.
In the context of backdoors, CutMix greatly mitigates the attack while simultaneously increasing validation accuracy by 9%.
arXiv Detail & Related papers (2020-11-18T20:18:50Z) - Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching [56.280018325419896]
Data Poisoning attacks modify training data to maliciously control a model trained on such data.
We analyze a particularly malicious poisoning attack that is both "from scratch" and "clean label"
We show that it is the first poisoning method to cause targeted misclassification in modern deep networks trained from scratch on a full-sized, poisoned ImageNet dataset.
arXiv Detail & Related papers (2020-09-04T16:17:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.