Few-shot Backdoor Attacks via Neural Tangent Kernels
- URL: http://arxiv.org/abs/2210.05929v1
- Date: Wed, 12 Oct 2022 05:30:00 GMT
- Title: Few-shot Backdoor Attacks via Neural Tangent Kernels
- Authors: Jonathan Hayase, Sewoong Oh
- Abstract summary: In a backdoor attack, an attacker injects corrupted examples into the training set.
Central to these attacks is the trade-off between the success rate of the attack and the number of corrupted training examples injected.
We use neural tangent kernels to approximate the training dynamics of the model being attacked and automatically learn strong poison examples.
- Score: 31.85706783674533
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In a backdoor attack, an attacker injects corrupted examples into the
training set. The goal of the attacker is to cause the final trained model to
predict the attacker's desired target label when a predefined trigger is added
to test inputs. Central to these attacks is the trade-off between the success
rate of the attack and the number of corrupted training examples injected. We
pose this attack as a novel bilevel optimization problem: construct strong
poison examples that maximize the attack success rate of the trained model. We
use neural tangent kernels to approximate the training dynamics of the model
being attacked and automatically learn strong poison examples. We experiment on
subclasses of CIFAR-10 and ImageNet with WideResNet-34 and ConvNeXt
architectures on periodic and patch trigger attacks and show that NTBA-designed
poisoned examples achieve, for example, an attack success rate of 90% while injecting ten
times fewer poison examples than the baseline. We
provide an interpretation of the NTBA-designed attacks using the analysis of
kernel linear regression. We further demonstrate a vulnerability in
overparametrized deep neural networks, which is revealed by the shape of the
neural tangent kernel.
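For readers who want the structure of the optimization, a schematic version of the bilevel problem is sketched below. The notation ($X_p$ for the poison inputs, $t$ for the trigger, $y_{\text{target}}$ for the target label, $\lambda$ for a ridge term) is chosen here for illustration and the paper's exact objective may differ; the inner training problem is replaced by its closed-form NTK surrogate, i.e. kernel ridge regression with the network's neural tangent kernel $K$:

$$
\max_{X_p} \; \sum_{x \in X_{\text{test}}} \mathbf{1}\!\left[ \hat f_{X_p}(x \oplus t) = y_{\text{target}} \right]
\quad \text{where} \quad
\hat f_{X_p}(\cdot) = K(\cdot, X)\,\bigl(K(X, X) + \lambda I\bigr)^{-1} Y,
$$

with $X$ stacking the clean and poison training inputs, $Y$ the corresponding labels (the poison examples carry the target label), and $x \oplus t$ denoting a test input with the trigger added. Because the kernel-regression surrogate is differentiable in $X_p$, the poison inputs can be optimized directly with gradient-based methods on a relaxed (e.g. cross-entropy) version of this attack-success objective.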
Related papers
- Wicked Oddities: Selectively Poisoning for Effective Clean-Label Backdoor Attacks [11.390175856652856]
Clean-label attacks are a stealthier form of backdoor attack that works without changing the labels of the poisoned data.
We study different strategies for selectively poisoning a small set of training samples in the target class to boost the attack success rate.
This threat model poses a serious risk when training machine learning models on third-party datasets.
arXiv Detail & Related papers (2024-07-15T15:38:21Z) - SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z) - An Invisible Backdoor Attack Based On Semantic Feature [0.0]
Backdoor attacks have severely threatened deep neural network (DNN) models in the past several years.
We propose a novel backdoor attack that makes imperceptible changes to the poisoned data.
We evaluate our attack on three prominent image classification datasets.
arXiv Detail & Related papers (2024-05-19T13:50:40Z) - Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
The backdoor attack is an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z) - Narcissus: A Practical Clean-Label Backdoor Attack with Limited
Information [22.98039177091884]
"Clean-label" backdoor attacks require knowledge of the entire training set to be effective.
This paper provides an algorithm to mount clean-label backdoor attacks based only on the knowledge of representative examples from the target class.
Our attack works well across datasets and models, even when the trigger is presented in the physical world.
arXiv Detail & Related papers (2022-04-11T16:58:04Z) - AntidoteRT: Run-time Detection and Correction of Poison Attacks on
Neural Networks [18.461079157949698]
We study backdoor poisoning attacks against image classification networks.
We propose lightweight automated detection and correction techniques against poisoning attacks.
Our technique outperforms existing defenses such as NeuralCleanse and STRIP on popular benchmarks.
arXiv Detail & Related papers (2022-01-31T23:42:32Z) - Few-shot Backdoor Defense Using Shapley Estimation [123.56934991060788]
We develop a new approach called Shapley Pruning to mitigate backdoor attacks on deep neural networks.
ShapPruning identifies the few infected neurons (under 1% of all neurons) while preserving the model's structure and accuracy.
Experiments demonstrate the effectiveness and robustness of our method against various attacks and tasks.
arXiv Detail & Related papers (2021-12-30T02:27:03Z) - FooBaR: Fault Fooling Backdoor Attack on Neural Network Training [5.639451539396458]
We explore a novel attack paradigm: faults are injected during the training phase of a neural network so that the resulting network can be attacked during deployment without any further fault injection.
We call such attacks fooling backdoors as the fault attacks at the training phase inject backdoors into the network that allow an attacker to produce fooling inputs.
arXiv Detail & Related papers (2021-09-23T09:43:19Z) - Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where we treat the target label at the object level instead of the image level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z) - Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching [56.280018325419896]
Data Poisoning attacks modify training data to maliciously control a model trained on such data.
We analyze a particularly malicious poisoning attack that is both "from scratch" (the victim trains from random initialization) and "clean label" (poisoned examples keep their correct labels); a sketch of the underlying gradient-matching objective appears after this list.
We show that it is the first poisoning method to cause targeted misclassification in modern deep networks trained from scratch on a full-sized, poisoned ImageNet dataset.
arXiv Detail & Related papers (2020-09-04T16:17:54Z) - Adversarial Imitation Attack [63.76805962712481]
A practical adversarial attack should require as little knowledge of the attacked model as possible.
Current substitute attacks need pre-trained models to generate adversarial examples.
In this study, we propose a novel adversarial imitation attack.
arXiv Detail & Related papers (2020-03-28T10:02:49Z)
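As a concrete illustration of the gradient-matching idea in the Witches' Brew entry above, the core objective can be sketched as follows. This is a minimal sketch assuming a standard PyTorch classifier and cross-entropy loss; it is not the authors' implementation, which additionally constrains the poison perturbations to a small budget and uses restarts and data augmentation.

```python
import torch
import torch.nn.functional as F

def gradient_matching_loss(model, poison_x, poison_y, target_x, adv_y):
    """1 - cosine similarity between (a) the training gradient induced by the
    poisoned batch (poison_x, poison_y) and (b) the gradient of the adversarial
    objective "classify target_x as adv_y". Minimizing this over small
    perturbations of poison_x steers ordinary training toward the attacker's goal.
    """
    params = [p for p in model.parameters() if p.requires_grad]

    # Gradient the attacker wants training to follow.
    adv_loss = F.cross_entropy(model(target_x), adv_y)
    g_adv = torch.autograd.grad(adv_loss, params)

    # Gradient actually produced by the (perturbed) poisoned batch;
    # create_graph=True so the loss can be backpropagated into poison_x.
    poison_loss = F.cross_entropy(model(poison_x), poison_y)
    g_poison = torch.autograd.grad(poison_loss, params, create_graph=True)

    dot = sum((ga * gp).sum() for ga, gp in zip(g_adv, g_poison))
    norm_adv = torch.sqrt(sum(ga.pow(2).sum() for ga in g_adv))
    norm_poison = torch.sqrt(sum(gp.pow(2).sum() for gp in g_poison))
    return 1.0 - dot / (norm_adv * norm_poison + 1e-12)
```

In the full attack, poison_x is a batch of clean training images plus a learnable perturbation clipped to the allowed budget, and this loss is minimized with respect to that perturbation while the labels stay unchanged (clean-label).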
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.