Breaking the De-Pois Poisoning Defense
- URL: http://arxiv.org/abs/2204.01090v1
- Date: Sun, 3 Apr 2022 15:17:47 GMT
- Title: Breaking the De-Pois Poisoning Defense
- Authors: Alaa Anani, Mohamed Ghanem and Lotfy Abdel Khaliq
- Abstract summary: We show that the attack-agnostic De-Pois defense is hardly an exception to that rule.
In our work, we break this poison-protection layer by replicating the critic model and then performing a composed gradient-sign attack on both the critic and target models simultaneously.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Attacks on machine learning models have been, since their conception, a very
persistent and evasive issue resembling an endless cat-and-mouse game. One
major variant of such attacks is poisoning attacks which can indirectly
manipulate an ML model. It has been observed over the years that the majority
of proposed defenses are only effective when an attacker is not aware that they
are being employed. In this paper, we show that the attack-agnostic
De-Pois defense is hardly an exception to that rule. In fact, we demonstrate
its vulnerability to the simplest White-Box and Black-Box attacks by an
attacker that knows the structure of the De-Pois defense model. In essence, the
De-Pois defense relies on a critic model that can be used to detect poisoned
data before passing it to the target model. In our work, we break this
poison-protection layer by replicating the critic model and then performing a
composed gradient-sign attack on both the critic and target models
simultaneously -- allowing us to bypass the critic firewall to poison the
target model.
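To make the attack concrete, below is a minimal PyTorch sketch of one composed gradient-sign step taken against a replicated critic and the target model at once. The critic interface (a scalar poison score per sample), the way the two loss terms are combined, and the eps/alpha values are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of a composed gradient-sign (FGSM-style) step.
# Assumptions: `critic` returns a per-sample "poisoned" score to be driven
# down, `target_model` is a classifier whose loss we drive up; eps/alpha
# are illustrative, not values from the paper.
import torch
import torch.nn.functional as F

def composed_gradient_sign_step(x, y, critic, target_model, eps=8 / 255, alpha=1.0):
    """Perturb a poison candidate x (true labels y) so that the replicated
    critic scores it as clean while the target model's loss on it grows."""
    x_adv = x.clone().detach().requires_grad_(True)

    # Term 1: the critic's "this sample is poisoned" score -- minimize it
    # so the sample slips past the poison-detection layer.
    critic_score = critic(x_adv).mean()

    # Term 2: the target model's loss on the true labels -- maximize it
    # so the sample still damages the target model.
    target_loss = F.cross_entropy(target_model(x_adv), y)

    # Composed objective: evade the critic and poison the target simultaneously.
    composed = critic_score - alpha * target_loss
    composed.backward()

    # One signed-gradient descent step on the composed objective.
    with torch.no_grad():
        x_adv = (x_adv - eps * x_adv.grad.sign()).clamp(0, 1)
    return x_adv.detach()
```

In this sketch, iterating the step yields samples that the replicated critic passes as clean while they still raise the target model's loss, which is the "bypass the critic firewall" behavior described in the abstract.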
Related papers
- Diffusion Denoising as a Certified Defense against Clean-label Poisoning [56.04951180983087]
We show how an off-the-shelf diffusion model can sanitize the tampered training data.
We extensively test our defense against seven clean-label poisoning attacks and reduce their attack success to 0-16% with only a negligible drop in the test time accuracy.
arXiv Detail & Related papers (2024-03-18T17:17:07Z) - Pick your Poison: Undetectability versus Robustness in Data Poisoning Attacks [33.82164201455115]
Deep image classification models trained on vast amounts of web-scraped data are susceptible to data poisoning.
Existing work considers an effective defense as one that either (i) restores a model's integrity through repair or (ii) detects an attack.
We argue that this approach overlooks a crucial trade-off: attackers can increase robustness at the expense of detectability (over-poisoning) or decrease detectability at the cost of robustness (under-poisoning).
arXiv Detail & Related papers (2023-05-07T15:58:06Z) - De-Pois: An Attack-Agnostic Defense against Data Poisoning Attacks [17.646155241759743]
De-Pois is an attack-agnostic defense against poisoning attacks.
We implement four types of poisoning attacks and evaluate De-Pois with five typical defense methods.
arXiv Detail & Related papers (2021-05-08T04:47:37Z) - What Doesn't Kill You Makes You Robust(er): Adversarial Training against Poisons and Backdoors [57.040948169155925]
We extend the adversarial training framework to defend against (training-time) poisoning and backdoor attacks.
Our method desensitizes networks to the effects of poisoning by creating poisons during training and injecting them into training batches (a minimal sketch of this idea appears after this list).
We show that this defense withstands adaptive attacks, generalizes to diverse threat models, and incurs a better performance trade-off than previous defenses.
arXiv Detail & Related papers (2021-02-26T17:54:36Z) - Concealed Data Poisoning Attacks on NLP Models [56.794857982509455]
Adversarial attacks alter NLP model predictions by perturbing test-time inputs.
We develop a new data poisoning attack that allows an adversary to control model predictions whenever a desired trigger phrase is present in the input.
arXiv Detail & Related papers (2020-10-23T17:47:06Z) - Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching [56.280018325419896]
Data Poisoning attacks modify training data to maliciously control a model trained on such data.
We analyze a particularly malicious poisoning attack that is both "from scratch" and "clean label".
We show that it is the first poisoning method to cause targeted misclassification in modern deep networks trained from scratch on a full-sized, poisoned ImageNet dataset.
arXiv Detail & Related papers (2020-09-04T16:17:54Z) - Label-Only Membership Inference Attacks [67.46072950620247]
We introduce label-only membership inference attacks.
Our attacks evaluate the robustness of a model's predicted labels under perturbations.
We find that differential privacy and (strong) L2 regularization are the only known defense strategies.
arXiv Detail & Related papers (2020-07-28T15:44:31Z) - Adversarial Imitation Attack [63.76805962712481]
A practical adversarial attack should require as little knowledge of the attacked model as possible.
Current substitute attacks need pre-trained models to generate adversarial examples.
In this study, we propose a novel adversarial imitation attack.
arXiv Detail & Related papers (2020-03-28T10:02:49Z)
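As a rough illustration of the adversarial-training defense summarized in the "What Doesn't Kill You Makes You Robust(er)" entry above, the sketch below crafts gradient-sign poisons on the fly and mixes them into each training batch. The poison-crafting routine and all hyperparameters are stand-in assumptions, not the paper's actual procedure.

```python
# Illustrative sketch: train against self-generated poisons so the network
# becomes desensitized to them. The gradient-sign crafting step is a
# simplified stand-in for the paper's poison generation.
import torch
import torch.nn.functional as F

def craft_poisons(model, x, y, eps=8 / 255, steps=1):
    """Gradient-sign perturbation of a clean batch (assumed poison proxy)."""
    x_p = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x_p), y)
        grad, = torch.autograd.grad(loss, x_p)
        x_p = (x_p + eps * grad.sign()).clamp(0, 1).detach().requires_grad_(True)
    return x_p.detach()

def train_step(model, optimizer, x, y):
    """Inject freshly crafted poisons alongside the clean samples."""
    x_mix = torch.cat([x, craft_poisons(model, x, y)])
    y_mix = torch.cat([y, y])
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_mix), y_mix)
    loss.backward()
    optimizer.step()
    return loss.item()
```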