MDTD: A Multi Domain Trojan Detector for Deep Neural Networks
- URL: http://arxiv.org/abs/2308.15673v2
- Date: Sun, 3 Sep 2023 01:59:49 GMT
- Title: MDTD: A Multi Domain Trojan Detector for Deep Neural Networks
- Authors: Arezoo Rajabi, Surudhi Asokraj, Fengqing Jiang, Luyao Niu, Bhaskar
Ramasubramanian, Jim Ritcey, Radha Poovendran
- Abstract summary: Machine learning models that use deep neural networks (DNNs) are vulnerable to backdoor attacks.
We propose MDTD, a Multi-Domain Trojan Detector for DNNs, which detects inputs containing a Trojan trigger at testing time.
We evaluate MDTD against adaptive attacks where an adversary trains a robust DNN to increase (decrease) the distance of benign (Trojan) inputs from a decision boundary.
- Score: 2.4651521935081364
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning models that use deep neural networks (DNNs) are vulnerable
to backdoor attacks. An adversary carrying out a backdoor attack embeds a
predefined perturbation called a trigger into a small subset of input samples
and trains the DNN such that the presence of the trigger in the input results
in an adversary-desired output class. Such adversarial retraining, however, must
ensure that outputs for inputs without the trigger remain unaffected and that
the model retains high classification accuracy on clean samples. In this paper, we
propose MDTD, a Multi-Domain Trojan Detector for DNNs, which detects inputs
containing a Trojan trigger at testing time. MDTD does not require knowledge of
the attacker's trigger-embedding strategy and can be applied to a pre-trained
DNN model with image, audio, or graph-based inputs. MDTD leverages an insight
that input samples containing a Trojan trigger are located relatively farther
away from a decision boundary than clean samples. MDTD estimates the distance
to a decision boundary using adversarial learning methods and uses this
distance to infer whether a test-time input sample is Trojaned or not. We
evaluate MDTD against state-of-the-art Trojan detection methods across five
widely used image-based datasets: CIFAR100, CIFAR10, GTSRB, SVHN, and
Flowers102; four graph-based datasets: AIDS, WinMal, Toxicant, and COLLAB; and
the SpeechCommand audio dataset. MDTD effectively identifies samples that
contain different types of Trojan triggers. We evaluate MDTD against adaptive
attacks where an adversary trains a robust DNN to increase (decrease) the distance
of benign (Trojan) inputs from a decision boundary.
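The distance-based idea in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical two-dimensional linear classifier in place of a DNN and a simple bisection search in place of the adversarial learning methods that MDTD uses. The function names, the 0.95 quantile, and the thresholding rule are illustrative assumptions, not the authors' implementation.

```python
import math

# Hypothetical toy model: a 2-D linear classifier standing in for a DNN.
W = (1.0, -1.0)
B = 0.0


def predict(x):
    """Return the predicted class (0 or 1) of the toy classifier."""
    return 1 if W[0] * x[0] + W[1] * x[1] + B > 0 else 0


def boundary_distance(x, max_radius=100.0, tol=1e-6):
    """Estimate the smallest perturbation norm that flips the prediction of x.

    The search moves along the unit direction that lowers the score of the
    current class and bisects on the step size until the flip point is found.
    """
    label = predict(x)
    norm = math.hypot(W[0], W[1])
    sign = -1.0 if label == 1 else 1.0
    direction = (sign * W[0] / norm, sign * W[1] / norm)

    def flipped(radius):
        moved = (x[0] + radius * direction[0], x[1] + radius * direction[1])
        return predict(moved) != label

    if not flipped(max_radius):
        return max_radius  # no flip found within the search budget
    lo, hi = 0.0, max_radius
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if flipped(mid):
            hi = mid
        else:
            lo = mid
    return hi


def calibrate_threshold(clean_samples, quantile=0.95):
    """Pick a distance threshold from samples assumed to be clean."""
    distances = sorted(boundary_distance(x) for x in clean_samples)
    index = min(len(distances) - 1, int(quantile * len(distances)))
    return distances[index]


def is_trojaned(x, threshold):
    """Flag an input whose estimated distance exceeds the clean threshold."""
    return boundary_distance(x) > threshold


# Usage: calibrate on a small clean set, then screen test-time inputs.
clean = [(1.0, 0.0), (0.0, 1.0), (2.0, 1.0), (1.0, 2.0), (0.5, 0.0), (0.0, 0.5)]
threshold = calibrate_threshold(clean)
suspicious = is_trojaned((10.0, 0.0), threshold)
```

In this sketch an input is flagged only because it lies unusually far from the decision boundary relative to the clean calibration set, so no knowledge of the trigger itself is needed.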
Related papers
- FreeEagle: Detecting Complex Neural Trojans in Data-Free Cases [50.065022493142116]
Trojan attack on deep neural networks, also known as backdoor attack, is a typical threat to artificial intelligence.
FreeEagle is the first data-free backdoor detection method that can effectively detect complex backdoor attacks.
arXiv Detail & Related papers (2023-02-28T11:31:29Z)
- PerD: Perturbation Sensitivity-based Neural Trojan Detection Framework on NLP Applications [21.854581570954075]
Trojan attacks embed a backdoor into the victim model, and the backdoor is activated by a trigger in the input space.
We propose a model-level Trojan detection framework by analyzing the deviation of the model output when we introduce a specially crafted perturbation to the input.
We demonstrate the effectiveness of our proposed method on both a dataset of NLP models we create and a public dataset of Trojaned NLP models from TrojAI.
arXiv Detail & Related papers (2022-08-08T22:50:03Z)
- Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free [126.15842954405929]
Trojan attacks threaten deep neural networks (DNNs) by poisoning them to behave normally on most samples, yet to produce manipulated results for inputs attached with a trigger.
We propose a novel Trojan network detection regime: first locating a "winning Trojan lottery ticket" which preserves nearly full Trojan information yet only chance-level performance on clean inputs; then recovering the trigger embedded in this already isolated subnetwork.
arXiv Detail & Related papers (2022-05-24T06:33:31Z)
- An Adaptive Black-box Backdoor Detection Method for Deep Neural Networks [25.593824693347113]
Deep Neural Networks (DNNs) have demonstrated unprecedented performance across various fields such as medical diagnosis and autonomous driving.
However, they have been found vulnerable to Neural Trojan (NT) attacks, which are controlled and activated by stealthy triggers.
We propose a robust and adaptive Trojan detection scheme that inspects whether a pre-trained model has been Trojaned before its deployment.
arXiv Detail & Related papers (2022-04-08T23:41:19Z)
- TAD: Trigger Approximation based Black-box Trojan Detection for AI [16.741385045881113]
Deep Neural Networks (DNNs) have demonstrated unprecedented performance across various fields such as medical diagnosis and autonomous driving.
However, they have been found vulnerable to Neural Trojan (NT) attacks, which are controlled and activated by a trigger.
We propose a robust Trojan detection scheme that inspects whether a pre-trained AI model has been Trojaned before its deployment.
arXiv Detail & Related papers (2021-02-03T00:49:50Z)
- Detecting Trojaned DNNs Using Counterfactual Attributions [15.988574580713328]
Such models behave normally with typical inputs but produce specific incorrect predictions for inputs with a Trojan trigger.
Our approach is based on a novel observation that the trigger behavior depends on a few ghost neurons that activate on the trigger pattern.
We use this information for Trojan detection by using a deep set encoder.
arXiv Detail & Related papers (2020-12-03T21:21:33Z)
- Practical Detection of Trojan Neural Networks: Data-Limited and Data-Free Cases [87.69818690239627]
We study the problem of the Trojan network (TrojanNet) detection in the data-scarce regime.
We propose a data-limited TrojanNet detector (TND) for the case where only a few data samples are available for TrojanNet detection.
In addition, we propose a data-free TND, which can detect a TrojanNet without accessing any data samples.
arXiv Detail & Related papers (2020-07-31T02:00:38Z)
- Cassandra: Detecting Trojaned Networks from Adversarial Perturbations [92.43879594465422]
In many cases, pre-trained models are sourced from vendors who may have disrupted the training pipeline to insert Trojan behaviors into the models.
We propose a method to verify if a pre-trained model is Trojaned or benign.
Our method captures fingerprints of neural networks in the form of adversarial perturbations learned from the network gradients.
arXiv Detail & Related papers (2020-07-28T19:00:40Z)
- Odyssey: Creation, Analysis and Detection of Trojan Models [91.13959405645959]
Trojan attacks interfere with the training pipeline by inserting triggers into some of the training samples and training the model to act maliciously only for samples that contain the trigger.
Existing Trojan detectors make strong assumptions about the types of triggers and attacks.
We propose a detector that is based on the analysis of the intrinsic properties that are affected by the Trojaning process.
arXiv Detail & Related papers (2020-07-16T06:55:00Z)
- An Embarrassingly Simple Approach for Trojan Attack in Deep Neural Networks [59.42357806777537]
A trojan attack aims to attack deployed deep neural networks (DNNs) by relying on hidden trigger patterns inserted by hackers.
We propose a training-free attack approach, which differs from previous work in which trojaned behaviors are injected by retraining the model on a poisoned dataset.
The proposed TrojanNet has several nice properties including (1) it activates by tiny trigger patterns and keeps silent for other signals, (2) it is model-agnostic and could be injected into most DNNs, dramatically expanding its attack scenarios, and (3) the training-free mechanism saves massive training efforts compared to conventional trojan attack methods.
arXiv Detail & Related papers (2020-06-15T04:58:28Z)