Cassandra: Detecting Trojaned Networks from Adversarial Perturbations
- URL: http://arxiv.org/abs/2007.14433v1
- Date: Tue, 28 Jul 2020 19:00:40 GMT
- Title: Cassandra: Detecting Trojaned Networks from Adversarial Perturbations
- Authors: Xiaoyu Zhang, Ajmal Mian, Rohit Gupta, Nazanin Rahnavard and Mubarak
Shah
- Abstract summary: In many cases, pre-trained models are sourced from vendors who may have disrupted the training pipeline to insert Trojan behaviors into the models.
We propose a method to verify whether a pre-trained model is Trojaned or benign.
Our method captures fingerprints of neural networks in the form of adversarial perturbations learned from the network gradients.
- Score: 92.43879594465422
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks are being widely deployed for many critical tasks due to
their high classification accuracy. In many cases, pre-trained models are
sourced from vendors who may have disrupted the training pipeline to insert
Trojan behaviors into the models. These malicious behaviors can be triggered at
the adversary's will and hence pose a serious threat to the widespread
deployment of deep models. We propose a method to verify whether a pre-trained model
is Trojaned or benign. Our method captures fingerprints of neural networks in
the form of adversarial perturbations learned from the network gradients.
Inserting backdoors into a network alters its decision boundaries, which are
effectively encoded in its adversarial perturbations. We train a two-stream
network for Trojan detection from its global ($L_\infty$ and $L_2$ bounded)
perturbations and the localized region of high energy within each perturbation.
The former encodes the decision boundaries of the network and the latter encodes the
unknown trigger shape. We also propose an anomaly detection method to identify
the target class in a Trojaned network. Our methods are invariant to the
trigger type, trigger size, training data and network architecture. We evaluate
our methods on the MNIST, NIST-Round0 and NIST-Round1 datasets, with up to 1,000
pre-trained models, making this the largest study to date on Trojaned network
detection, and achieve over 92% detection accuracy, setting a new
state of the art.
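A loose sketch of this fingerprinting idea, assuming a PyTorch image classifier; the probe batch, the L-infinity gradient-sign update, and the crop size are illustrative choices, not the paper's exact settings:

```python
import torch
import torch.nn.functional as F

def universal_perturbation(model, loader, eps=8 / 255, steps=50, lr=0.01):
    """Gradient-learned perturbation used as a network 'fingerprint'.

    Maximizes the classification loss over a probe batch while projecting
    onto an L-infinity ball of radius eps (an L2 projection is analogous).
    """
    model.eval()
    x, y = next(iter(loader))                      # small probe batch
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += lr * delta.grad.sign()        # gradient-ascent step
            delta.clamp_(-eps, eps)                # L_inf projection
        delta.grad.zero_()
    return delta.detach().mean(dim=0)              # one map per network

def high_energy_crop(pert, k=16):
    """Localized high-energy region; its shape hints at the trigger."""
    energy = pert.pow(2).sum(dim=0)                # per-pixel energy (H, W)
    h, w = energy.shape
    r, c = divmod(int(torch.argmax(energy)), w)
    r0 = min(max(r - k // 2, 0), h - k)
    c0 = min(max(c - k // 2, 0), w - k)
    return pert[:, r0:r0 + k, c0:c0 + k]
```

A detector in the spirit of the paper's two-stream network would then take the full perturbation in one stream and the high-energy crop in the other.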
Related papers
- TEN-GUARD: Tensor Decomposition for Backdoor Attack Detection in Deep
Neural Networks [3.489779105594534]
We introduce a novel approach to backdoor detection using two tensor decomposition methods applied to network activations.
This has a number of advantages relative to existing detection methods, including the ability to analyze multiple models at the same time.
Results show that our method detects backdoored networks more accurately and efficiently than current state-of-the-art methods.
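A rough illustration of the activation tensor-decomposition idea; the CP rank, the median/MAD outlier rule, and the tensorly usage are assumptions for this sketch, not the paper's exact pipeline:

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

def model_mode_factors(activations, rank=5):
    """CP-decompose a (models x samples x neurons) activation tensor.

    Several models are screened in one decomposition; each model gets a
    row of rank-many loadings in the model-mode factor matrix.
    """
    weights, factors = parafac(tl.tensor(np.asarray(activations)), rank=rank)
    return factors[0]                              # shape: (models, rank)

def flag_suspects(model_factors, z=3.0):
    """Median/MAD outlier test: backdoored models should stand apart."""
    med = np.median(model_factors, axis=0)
    mad = np.median(np.abs(model_factors - med), axis=0) + 1e-12
    scores = np.abs(model_factors - med) / mad
    return np.where(scores.max(axis=1) > z)[0]     # suspicious model ids
```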
arXiv Detail & Related papers (2024-01-06T03:08:28Z) - FreeEagle: Detecting Complex Neural Trojans in Data-Free Cases [50.065022493142116]
The Trojan attack on deep neural networks, also known as a backdoor attack, is a typical threat to artificial intelligence.
FreeEagle is the first data-free backdoor detection method that can effectively detect complex backdoor attacks.
arXiv Detail & Related papers (2023-02-28T11:31:29Z) - Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free [126.15842954405929]
Trojan attacks threaten deep neural networks (DNNs) by poisoning them to behave normally on most samples yet produce manipulated results for inputs that carry a trigger.
We propose a novel Trojan network detection regime: first locating a "winning Trojan lottery ticket" that preserves nearly full Trojan information yet achieves only chance-level performance on clean inputs; then recovering the trigger embedded in this already isolated subnetwork.
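A crude stand-in for this two-step regime; plain magnitude pruning replaces the paper's ticket-finding criterion, and the patch size, optimizer, and step counts are placeholders:

```python
import torch
import torch.nn.functional as F
import torch.nn.utils.prune as prune

def extract_ticket(model, sparsity=0.95):
    """Keep only the largest-magnitude weights as a sparse subnetwork.

    The paper's actual ticket is chosen to retain Trojan behavior while
    dropping clean accuracy to chance; L1 pruning is only a stand-in.
    """
    for m in model.modules():
        if isinstance(m, (torch.nn.Conv2d, torch.nn.Linear)):
            prune.l1_unstructured(m, name="weight", amount=sparsity)
    return model

def recover_trigger(ticket, target, img=(3, 32, 32), k=8, steps=200, lr=0.1):
    """Optimize a small patch that drives the subnetwork to `target`."""
    patch = torch.rand(1, img[0], k, k, requires_grad=True)
    opt = torch.optim.Adam([patch], lr=lr)
    for _ in range(steps):
        x = torch.zeros(1, *img)
        x[:, :, :k, :k] = patch                    # paste patch top-left
        loss = F.cross_entropy(ticket(x), torch.tensor([target]))
        opt.zero_grad(); loss.backward(); opt.step()
        patch.data.clamp_(0, 1)
    return patch.detach()
```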
arXiv Detail & Related papers (2022-05-24T06:33:31Z) - An Adaptive Black-box Backdoor Detection Method for Deep Neural Networks [25.593824693347113]
Deep Neural Networks (DNNs) have demonstrated unprecedented performance across various fields such as medical diagnosis and autonomous driving.
They have been shown to be vulnerable to Neural Trojan (NT) attacks that are controlled and activated by stealthy triggers.
We propose a robust and adaptive Trojan detection scheme that inspects whether a pre-trained model has been Trojaned before its deployment.
arXiv Detail & Related papers (2022-04-08T23:41:19Z) - Trojan Signatures in DNN Weights [20.93172486021463]
We present the first ultra-lightweight and highly effective trojan detection method that does not require access to the training/test data.
Our approach focuses on analysis of the weights of the final, linear layer of the network.
We show that the distribution of the weights associated with the trojan target class is clearly distinguishable from the weights associated with other classes.
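A hedged sketch of this final-layer check; the per-class statistic and the robust z-score rule are assumptions, not the paper's exact test:

```python
import numpy as np

def trojan_signature(final_weights):
    """final_weights: (num_classes, features) matrix from the network's
    last linear layer. Flags the class whose weights are a distributional
    outlier, which the paper associates with the trojan target class."""
    stats = final_weights.max(axis=1)              # per-class max weight
    med = np.median(stats)
    mad = np.median(np.abs(stats - med)) + 1e-12
    z = (stats - med) / (1.4826 * mad)             # robust z-scores
    suspect = int(np.argmax(z))
    return suspect, float(z[suspect])              # class id, outlier score
```

A large outlier score suggests a trojaned target class; a benign model should show no strongly separated class.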
arXiv Detail & Related papers (2021-09-07T03:07:03Z) - Topological Detection of Trojaned Neural Networks [10.559903139528252]
Trojan attacks occur when attackers stealthily manipulate a model's behavior.
We find a subtle structural deviation that characterizes Trojaned models.
We devise a strategy for robust detection of Trojaned models.
arXiv Detail & Related papers (2021-06-11T15:48:16Z) - Practical Detection of Trojan Neural Networks: Data-Limited and
Data-Free Cases [87.69818690239627]
We study the problem of Trojan network (TrojanNet) detection in the data-scarce regime.
We propose a data-limited TrojanNet detector (TND) for when only a few data samples are available for TrojanNet detection.
In addition, we propose a data-free TND, which can detect a TrojanNet without accessing any data samples.
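The data-free setting can be illustrated very loosely as follows: optimize noise images toward each class and flag a class the model is anomalously easy to steer toward. This is a stand-in for the idea, not the paper's exact detector:

```python
import torch
import torch.nn.functional as F

def datafree_suspect(model, num_classes, shape=(3, 32, 32),
                     steps=100, lr=0.05):
    """Without any real data, measure how easily pure noise can be
    pushed into each class; an unusually easy class hints at a backdoor."""
    ease = []
    for c in range(num_classes):
        x = torch.rand(1, *shape, requires_grad=True)
        opt = torch.optim.Adam([x], lr=lr)
        for _ in range(steps):
            loss = F.cross_entropy(model(x), torch.tensor([c]))
            opt.zero_grad(); loss.backward(); opt.step()
            x.data.clamp_(0, 1)
        ease.append(-loss.item())                  # higher = easier class
    scores = torch.tensor(ease)
    return scores, int(scores.argmax())            # per-class ease, suspect
```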
arXiv Detail & Related papers (2020-07-31T02:00:38Z) - Odyssey: Creation, Analysis and Detection of Trojan Models [91.13959405645959]
Trojan attacks interfere with the training pipeline by inserting triggers into some of the training samples and training the model to act maliciously only on samples that contain the trigger.
Existing Trojan detectors make strong assumptions about the types of triggers and attacks.
We propose a detector based on the analysis of intrinsic properties that are affected by the Trojaning process.
arXiv Detail & Related papers (2020-07-16T06:55:00Z) - Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training to make the resulting model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, an improvement over the current state-of-the-art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)