Few-shot Backdoor Attacks via Neural Tangent Kernels
- URL: http://arxiv.org/abs/2210.05929v1
- Date: Wed, 12 Oct 2022 05:30:00 GMT
- Title: Few-shot Backdoor Attacks via Neural Tangent Kernels
- Authors: Jonathan Hayase, Sewoong Oh
- Abstract summary: In a backdoor attack, an attacker injects corrupted examples into the training set.
Central to these attacks is the trade-off between the success rate of the attack and the number of corrupted training examples injected.
We use neural tangent kernels to approximate the training dynamics of the model being attacked and automatically learn strong poison examples.
- Score: 31.85706783674533
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In a backdoor attack, an attacker injects corrupted examples into the
training set. The goal of the attacker is to cause the final trained model to
predict the attacker's desired target label when a predefined trigger is added
to test inputs. Central to these attacks is the trade-off between the success
rate of the attack and the number of corrupted training examples injected. We
pose this attack as a novel bilevel optimization problem: construct strong
poison examples that maximize the attack success rate of the trained model. We
use neural tangent kernels to approximate the training dynamics of the model
being attacked and automatically learn strong poison examples. We experiment on
subclasses of CIFAR-10 and ImageNet with WideResNet-34 and ConvNeXt
architectures on periodic and patch trigger attacks and show that NTBA-designed
poisoned examples achieve, for example, an attack success rate of 90% while injecting ten
times fewer poison examples than the baseline. We
provide an interpretation of the NTBA-designed attacks using the analysis of
kernel linear regression. We further demonstrate a vulnerability in
overparametrized deep neural networks, which is revealed by the shape of the
neural tangent kernel.
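For readers who want the structure of the optimization, a schematic version of the bilevel problem is sketched below. The notation ($X_p$ for the poison inputs, $t$ for the trigger, $y_{\text{target}}$ for the target label, $\lambda$ for a ridge term) is chosen here for illustration and the paper's exact objective may differ; the inner training problem is replaced by its closed-form NTK surrogate, i.e. kernel ridge regression with the network's neural tangent kernel $K$:

$$
\max_{X_p} \; \sum_{x \in X_{\text{test}}} \mathbf{1}\!\left[ \hat f_{X_p}(x \oplus t) = y_{\text{target}} \right]
\quad \text{where} \quad
\hat f_{X_p}(\cdot) = K(\cdot, X)\,\bigl(K(X, X) + \lambda I\bigr)^{-1} Y,
$$

with $X$ stacking the clean and poison training inputs, $Y$ the corresponding labels (the poison examples carry the target label), and $x \oplus t$ denoting a test input with the trigger added. Because the kernel-regression surrogate is differentiable in $X_p$, the poison inputs can be optimized directly with gradient-based methods on a relaxed (e.g. cross-entropy) version of this attack-success objective.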
Related papers
- Wicked Oddities: Selectively Poisoning for Effective Clean-Label Backdoor Attacks [11.390175856652856]
Clean-label attacks are a stealthier form of backdoor attack that works without changing the labels of the poisoned data.
We study different strategies for selectively poisoning a small set of training samples in the target class to boost the attack success rate.
This threat model poses a serious risk when training machine learning models on third-party datasets.
arXiv Detail & Related papers (2024-07-15T15:38:21Z) - SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z) - An Invisible Backdoor Attack Based On Semantic Feature [0.0]
Backdoor attacks have severely threatened deep neural network (DNN) models in the past several years.
We propose a novel backdoor attack that makes imperceptible changes to the poisoned data.
We evaluate our attack on three prominent image classification datasets.
arXiv Detail & Related papers (2024-05-19T13:50:40Z) - Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
The backdoor attack is an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z) - Narcissus: A Practical Clean-Label Backdoor Attack with Limited
Information [22.98039177091884]
"Clean-label" backdoor attacks require knowledge of the entire training set to be effective.
This paper provides an algorithm to mount clean-label backdoor attacks based only on the knowledge of representative examples from the target class.
Our attack works well across datasets and models, even when the trigger is presented in the physical world.
arXiv Detail & Related papers (2022-04-11T16:58:04Z) - AntidoteRT: Run-time Detection and Correction of Poison Attacks on
Neural Networks [18.461079157949698]
We study backdoor poisoning attacks against image classification networks.
We propose lightweight automated detection and correction techniques against poisoning attacks.
Our technique outperforms existing defenses such as NeuralCleanse and STRIP on popular benchmarks.
arXiv Detail & Related papers (2022-01-31T23:42:32Z) - Few-shot Backdoor Defense Using Shapley Estimation [123.56934991060788]
We develop a new approach called Shapley Pruning to mitigate backdoor attacks on deep neural networks.
ShapPruning identifies the few infected neurons (under 1% of all neurons) while preserving the model's structure and accuracy.
Experiments demonstrate the effectiveness and robustness of our method against various attacks and tasks.
arXiv Detail & Related papers (2021-12-30T02:27:03Z) - FooBaR: Fault Fooling Backdoor Attack on Neural Network Training [5.639451539396458]
We explore a novel attack paradigm: faults are injected during the training phase of a neural network so that the resulting network can be attacked during deployment without any further fault injection.
We call such attacks fooling backdoors as the fault attacks at the training phase inject backdoors into the network that allow an attacker to produce fooling inputs.
arXiv Detail & Related papers (2021-09-23T09:43:19Z) - Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where we treat the target label at the object level instead of the image level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z) - Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching [56.280018325419896]
Data Poisoning attacks modify training data to maliciously control a model trained on such data.
We analyze a particularly malicious poisoning attack that is both "from scratch" (the victim trains from random initialization) and "clean label" (poisoned examples keep their correct labels); a sketch of the underlying gradient-matching objective appears after this list.
We show that it is the first poisoning method to cause targeted misclassification in modern deep networks trained from scratch on a full-sized, poisoned ImageNet dataset.
arXiv Detail & Related papers (2020-09-04T16:17:54Z) - Adversarial Imitation Attack [63.76805962712481]
A practical adversarial attack should require as little knowledge of the attacked model as possible.
Current substitute attacks need pre-trained models to generate adversarial examples.
In this study, we propose a novel adversarial imitation attack.
arXiv Detail & Related papers (2020-03-28T10:02:49Z)
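As a concrete illustration of the gradient-matching idea in the Witches' Brew entry above, the core objective can be sketched as follows. This is a minimal sketch assuming a standard PyTorch classifier and cross-entropy loss; it is not the authors' implementation, which additionally constrains the poison perturbations to a small budget and uses restarts and data augmentation.

```python
import torch
import torch.nn.functional as F

def gradient_matching_loss(model, poison_x, poison_y, target_x, adv_y):
    """1 - cosine similarity between (a) the training gradient induced by the
    poisoned batch (poison_x, poison_y) and (b) the gradient of the adversarial
    objective "classify target_x as adv_y". Minimizing this over small
    perturbations of poison_x steers ordinary training toward the attacker's goal.
    """
    params = [p for p in model.parameters() if p.requires_grad]

    # Gradient the attacker wants training to follow.
    adv_loss = F.cross_entropy(model(target_x), adv_y)
    g_adv = torch.autograd.grad(adv_loss, params)

    # Gradient actually produced by the (perturbed) poisoned batch;
    # create_graph=True so the loss can be backpropagated into poison_x.
    poison_loss = F.cross_entropy(model(poison_x), poison_y)
    g_poison = torch.autograd.grad(poison_loss, params, create_graph=True)

    dot = sum((ga * gp).sum() for ga, gp in zip(g_adv, g_poison))
    norm_adv = torch.sqrt(sum(ga.pow(2).sum() for ga in g_adv))
    norm_poison = torch.sqrt(sum(gp.pow(2).sum() for gp in g_poison))
    return 1.0 - dot / (norm_adv * norm_poison + 1e-12)
```

In the full attack, poison_x is a batch of clean training images plus a learnable perturbation clipped to the allowed budget, and this loss is minimized with respect to that perturbation while the labels stay unchanged (clean-label).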
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.