VPN: Verification of Poisoning in Neural Networks
- URL: http://arxiv.org/abs/2205.03894v1
- Date: Sun, 8 May 2022 15:16:05 GMT
- Title: VPN: Verification of Poisoning in Neural Networks
- Authors: Youcheng Sun and Muhammad Usman and Divya Gopinath and Corina S.
P\u{a}s\u{a}reanu
- Abstract summary: We study another neural network security issue, namely data poisoning.
In this case an attacker inserts a trigger into a subset of the training data, in such a way that at test time, this trigger in an input causes the trained model to misclassify to some target class.
We show how to formulate the check for data poisoning as a property that can be checked with off-the-shelf verification tools.
- Score: 11.221552724154988
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural networks are successfully used in a variety of applications, many of
them having safety and security concerns. As a result researchers have proposed
formal verification techniques for verifying neural network properties. While
previous efforts have mainly focused on checking local robustness in neural
networks, we instead study another neural network security issue, namely data
poisoning. In this case an attacker inserts a trigger into a subset of the
training data, in such a way that at test time, this trigger in an input causes
the trained model to misclassify to some target class. We show how to formulate
the check for data poisoning as a property that can be checked with
off-the-shelf verification tools, such as Marabou and nneum, where
counterexamples of failed checks constitute the triggers. We further show that
the discovered triggers are `transferable' from a small model to a larger,
better-trained model, allowing us to analyze state-of-the art performant models
trained for image classification tasks.
Related papers
- Benign Overfitting for Two-layer ReLU Convolutional Neural Networks [60.19739010031304]
We establish algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise.
We show that, under mild conditions, the neural network trained by gradient descent can achieve near-zero training loss and Bayes optimal test risk.
arXiv Detail & Related papers (2023-03-07T18:59:38Z) - Invisible Backdoor Attacks Using Data Poisoning in the Frequency Domain [8.64369418938889]
We propose a generalized backdoor attack method based on the frequency domain.
It can implement backdoor implantation without mislabeling and accessing the training process.
We evaluate our approach in the no-label and clean-label cases on three datasets.
arXiv Detail & Related papers (2022-07-09T07:05:53Z) - AntidoteRT: Run-time Detection and Correction of Poison Attacks on
Neural Networks [18.461079157949698]
backdoor poisoning attacks against image classification networks.
We propose lightweight automated detection and correction techniques against poisoning attacks.
Our technique outperforms existing defenses such as NeuralCleanse and STRIP on popular benchmarks.
arXiv Detail & Related papers (2022-01-31T23:42:32Z) - Sleeper Agent: Scalable Hidden Trigger Backdoors for Neural Networks
Trained from Scratch [99.90716010490625]
Backdoor attackers tamper with training data to embed a vulnerability in models that are trained on that data.
This vulnerability is then activated at inference time by placing a "trigger" into the model's input.
We develop a new hidden trigger attack, Sleeper Agent, which employs gradient matching, data selection, and target model re-training during the crafting process.
arXiv Detail & Related papers (2021-06-16T17:09:55Z) - TOP: Backdoor Detection in Neural Networks via Transferability of
Perturbation [1.52292571922932]
Detection of backdoors in trained models without access to the training data or example triggers is an important open problem.
In this paper, we identify an interesting property of these models: adversarial perturbations transfer from image to image more readily in poisoned models than in clean models.
We use this feature to detect poisoned models in the TrojAI benchmark, as well as additional models.
arXiv Detail & Related papers (2021-03-18T14:13:30Z) - Explainable Adversarial Attacks in Deep Neural Networks Using Activation
Profiles [69.9674326582747]
This paper presents a visual framework to investigate neural network models subjected to adversarial examples.
We show how observing these elements can quickly pinpoint exploited areas in a model.
arXiv Detail & Related papers (2021-03-18T13:04:21Z) - BreakingBED -- Breaking Binary and Efficient Deep Neural Networks by
Adversarial Attacks [65.2021953284622]
We study robustness of CNNs against white-box and black-box adversarial attacks.
Results are shown for distilled CNNs, agent-based state-of-the-art pruned models, and binarized neural networks.
arXiv Detail & Related papers (2021-03-14T20:43:19Z) - Detecting Backdoors in Neural Networks Using Novel Feature-Based Anomaly
Detection [16.010654200489913]
This paper proposes a new defense against neural network backdooring attacks.
It is based on the intuition that the feature extraction layers of a backdoored network embed new features to detect the presence of a trigger.
To detect backdoors, the proposed defense uses two synergistic anomaly detectors trained on clean validation data.
arXiv Detail & Related papers (2020-11-04T20:33:51Z) - Cassandra: Detecting Trojaned Networks from Adversarial Perturbations [92.43879594465422]
In many cases, pre-trained models are sourced from vendors who may have disrupted the training pipeline to insert Trojan behaviors into the models.
We propose a method to verify if a pre-trained model is Trojaned or benign.
Our method captures fingerprints of neural networks in the form of adversarial perturbations learned from the network gradients.
arXiv Detail & Related papers (2020-07-28T19:00:40Z) - Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.