Traceback of Data Poisoning Attacks in Neural Networks
- URL: http://arxiv.org/abs/2110.06904v1
- Date: Wed, 13 Oct 2021 17:39:18 GMT
- Title: Traceback of Data Poisoning Attacks in Neural Networks
- Authors: Shawn Shan, Arjun Nitin Bhagoji, Haitao Zheng, Ben Y. Zhao
- Abstract summary: We describe our efforts in developing a forensic traceback tool for poison attacks on deep neural networks.
We propose a novel iterative clustering and pruning solution that trims "innocent" training samples.
We empirically demonstrate the efficacy of our system on three types of dirty-label (backdoor) poison attacks and three types of clean-label poison attacks.
- Score: 24.571668412312196
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In adversarial machine learning, new defenses against attacks on deep
learning systems are routinely broken soon after their release by more powerful
attacks. In this context, forensic tools can offer a valuable complement to
existing defenses, by tracing back a successful attack to its root cause, and
offering a path forward for mitigation to prevent similar attacks in the
future.
In this paper, we describe our efforts in developing a forensic traceback
tool for poison attacks on deep neural networks. We propose a novel iterative
clustering and pruning solution that trims "innocent" training samples, until
all that remains is the set of poisoned data responsible for the attack. Our
method clusters training samples based on their impact on model parameters,
then uses an efficient data unlearning method to prune innocent clusters. We
empirically demonstrate the efficacy of our system on three types of
dirty-label (backdoor) poison attacks and three types of clean-label poison
attacks, across domains of computer vision and malware classification. Our
system achieves over 98.4% precision and 96.8% recall across all attacks. We
also show that our system is robust against four anti-forensics measures
specifically designed to attack it.
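To make the pipeline above concrete, here is a minimal sketch of the iterative clustering-and-pruning loop, with hypothetical callables standing in for the three components the abstract describes (clustering samples by their effect on model parameters, efficient data unlearning, and replaying the observed attack to check whether it still succeeds). It illustrates the control flow only, not the authors' implementation.

```python
# Minimal sketch of an iterative clustering-and-pruning traceback loop.
# cluster_fn, unlearn_fn and attack_succeeds are stand-ins for the paper's
# components (parameter-impact clustering, efficient data unlearning, and
# replaying the observed attack), passed in as callables.
from typing import Callable, List, Sequence, Set


def trace_poison(
    train_ids: Sequence[int],
    cluster_fn: Callable[[List[int]], List[List[int]]],
    unlearn_fn: Callable[[List[int]], object],
    attack_succeeds: Callable[[object], bool],
    max_rounds: int = 20,
) -> Set[int]:
    """Iteratively discard clusters of 'innocent' samples until only the
    suspected poison set remains."""
    suspects = list(train_ids)
    for _ in range(max_rounds):
        pruned_any = False
        for cluster in cluster_fn(suspects):
            cluster_set = set(cluster)
            # Unlearn this cluster; if the attack still succeeds without it,
            # the cluster is likely innocent and can be pruned.
            model_without = unlearn_fn(cluster)
            if attack_succeeds(model_without):
                suspects = [i for i in suspects if i not in cluster_set]
                pruned_any = True
        if not pruned_any:  # nothing could be ruled innocent this round
            break
    return set(suspects)
```

In the paper, clustering operates on representations of each sample's impact on the model parameters and unlearning is approximated efficiently rather than by retraining from scratch; those details are hidden behind the callables here.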
Related papers
- Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning.
This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities.
In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
arXiv Detail & Related papers (2024-09-29T02:55:38Z)
- Wicked Oddities: Selectively Poisoning for Effective Clean-Label Backdoor Attacks [11.390175856652856]
Clean-label attacks are a more stealthy form of backdoor attacks that can perform the attack without changing the labels of poisoned data.
We study different strategies for selectively poisoning a small set of training samples in the target class to boost the attack success rate.
This threat model poses a serious risk when training machine learning models on third-party datasets.
arXiv Detail & Related papers (2024-07-15T15:38:21Z) - SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z) - Diffusion Denoising as a Certified Defense against Clean-label Poisoning [56.04951180983087]
We show how an off-the-shelf diffusion model can sanitize the tampered training data.
We extensively test our defense against seven clean-label poisoning attacks and reduce their attack success to 0-16% with only a negligible drop in the test time accuracy.
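For intuition, a generic "denoise then train" sanitization step might look like the sketch below, written against the Hugging Face diffusers DDPM API with an arbitrary CIFAR-10 checkpoint and noise level. It illustrates the general idea of washing out small perturbations with a pretrained diffusion model; it is not the paper's certified procedure.

```python
# Rough "denoise then train" sanitization sketch using a pretrained DDPM from
# the Hugging Face diffusers library (checkpoint and noise level are arbitrary
# choices, not values from the paper). Each image is partially forward-diffused
# and then run back through the reverse process, which tends to wash out small
# poisoning perturbations.
import torch
from diffusers import DDPMPipeline

pipe = DDPMPipeline.from_pretrained("google/ddpm-cifar10-32")
unet, scheduler = pipe.unet, pipe.scheduler


@torch.no_grad()
def sanitize(images: torch.Tensor, t_star: int = 100) -> torch.Tensor:
    """images: (N, 3, 32, 32) scaled to [-1, 1]; returns denoised images."""
    noise = torch.randn_like(images)
    t = torch.full((images.shape[0],), t_star, dtype=torch.long)
    x = scheduler.add_noise(images, noise, t)   # forward-diffuse to step t_star
    for step in range(t_star, -1, -1):          # reverse process back to t = 0
        ts = torch.full((images.shape[0],), step, dtype=torch.long)
        eps = unet(x, ts).sample                # predicted noise at this step
        x = scheduler.step(eps, step, x).prev_sample
    return x.clamp(-1.0, 1.0)
```

The sanitized images would then replace the originals for training; how aggressively perturbations are removed is controlled by how far forward (`t_star`) each image is diffused.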
arXiv Detail & Related papers (2024-03-18T17:17:07Z) - Few-shot Backdoor Attacks via Neural Tangent Kernels [31.85706783674533]
In a backdoor attack, an attacker injects corrupted examples into the training set.
Central to these attacks is the trade-off between the success rate of the attack and the number of corrupted training examples injected.
We use neural tangent kernels to approximate the training dynamics of the model being attacked and automatically learn strong poison examples.
arXiv Detail & Related papers (2022-10-12T05:30:00Z) - Indiscriminate Data Poisoning Attacks on Neural Networks [28.09519873656809]
Data poisoning attacks aim to influence a model by injecting "poisoned" data into the training process.
We take a closer look at existing poisoning attacks and connect them with old and new algorithms for solving sequential Stackelberg games.
We present efficient implementations that exploit modern auto-differentiation packages and allow simultaneous and coordinated generation of poisoned points.
arXiv Detail & Related papers (2022-04-19T18:57:26Z) - Defening against Adversarial Denial-of-Service Attacks [0.0]
Data poisoning is one of the most relevant security threats against machine learning and data-driven technologies.
We propose a new approach for detecting DoS-poisoned instances.
We evaluate our defence against two DoS poisoning attacks across seven datasets, and find that it reliably identifies poisoned instances.
arXiv Detail & Related papers (2021-02-26T17:54:36Z)
- What Doesn't Kill You Makes You Robust(er): Adversarial Training against Poisons and Backdoors [57.040948169155925]
We extend the adversarial training framework to defend against (training-time) poisoning and backdoor attacks.
Our method desensitizes networks to the effects of poisoning by creating poisons during training and injecting them into training batches.
We show that this defense withstands adaptive attacks, generalizes to diverse threat models, and incurs a better performance trade-off than previous defenses.
arXiv Detail & Related papers (2021-02-26T17:54:36Z)
- Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching [56.280018325419896]
Data Poisoning attacks modify training data to maliciously control a model trained on such data.
We analyze a particularly malicious poisoning attack that is both "from scratch" and "clean label".
We show that it is the first poisoning method to cause targeted misclassification in modern deep networks trained from scratch on a full-sized, poisoned ImageNet dataset.
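The core gradient-matching objective behind this attack can be sketched as follows: perturb the poison images so that the training gradient they induce aligns (by cosine similarity) with the gradient of the adversarial objective on the target. The model, tensors and hyperparameters below are placeholders, and the paper's full crafting procedure (restarts, differentiable augmentation, ensembling) is omitted.

```python
# Bare-bones gradient-matching poison crafting sketch (in the spirit of
# Witches' Brew). Model, data tensors and hyperparameters are placeholders.
import torch
import torch.nn.functional as F


def craft_poisons(model, x_poison, y_poison, x_target, y_adv,
                  eps=16 / 255, steps=250, lr=0.1):
    """Optimize bounded perturbations delta so that the training gradient on
    (x_poison + delta, y_poison) aligns with the gradient of the adversarial
    loss on (x_target, y_adv)."""
    params = [p for p in model.parameters() if p.requires_grad]

    # Target gradient: the update direction that would push the model toward
    # labeling the target input as the adversarial class y_adv.
    adv_loss = F.cross_entropy(model(x_target), y_adv)
    grad_target = [g.detach() for g in torch.autograd.grad(adv_loss, params)]

    delta = torch.zeros_like(x_poison, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        poison_loss = F.cross_entropy(model(x_poison + delta), y_poison)
        grad_poison = torch.autograd.grad(poison_loss, params, create_graph=True)
        # Negative cosine similarity between the two flattened gradients.
        num = sum((gp * gt).sum() for gp, gt in zip(grad_poison, grad_target))
        den = (torch.sqrt(sum((gp ** 2).sum() for gp in grad_poison)) *
               torch.sqrt(sum((gt ** 2).sum() for gt in grad_target)))
        loss = 1 - num / den
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():          # keep the perturbation imperceptible
            delta.clamp_(-eps, eps)
    return (x_poison + delta).detach()
```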
arXiv Detail & Related papers (2020-09-04T16:17:54Z)
- A Self-supervised Approach for Adversarial Robustness [105.88250594033053]
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNN)-based vision systems.
This paper proposes a self-supervised adversarial training mechanism in the input space.
It provides significant robustness against unseen adversarial attacks.
arXiv Detail & Related papers (2020-06-08T20:42:39Z)
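For a rough sense of what label-free (self-supervised) adversarial training in the input space can look like, here is a generic PGD-style sketch in which the perturbation maximizes feature distortion between clean and perturbed inputs. It illustrates the general idea only, not the specific mechanism of the paper above; `backbone`, `head` and all hyperparameters are placeholders.

```python
# Generic sketch: adversarial training with a self-supervised (label-free)
# perturbation objective -- maximize the feature distortion between clean and
# perturbed inputs -- instead of the classification loss. Model and
# hyperparameters are placeholders.
import torch
import torch.nn.functional as F


def feature_distortion_pgd(backbone, x, eps=8 / 255, alpha=2 / 255, steps=5):
    """Find a bounded perturbation that maximizes the distance between
    features of the clean and perturbed input (no labels needed)."""
    clean_feat = backbone(x).detach()
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        dist = F.mse_loss(backbone(x + delta), clean_feat)
        grad, = torch.autograd.grad(dist, delta)
        with torch.no_grad():
            delta += alpha * grad.sign()   # ascend the distortion
            delta.clamp_(-eps, eps)        # stay inside the L-inf ball
    return (x + delta).detach()


def train_step(backbone, head, optimizer, x, y):
    """One adversarial-training step on self-supervised perturbations."""
    x_adv = feature_distortion_pgd(backbone, x)
    optimizer.zero_grad()
    loss = F.cross_entropy(head(backbone(x_adv)), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice one would typically mix clean and perturbed batches and tune the perturbation budget per dataset.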