Practical Detection of Trojan Neural Networks: Data-Limited and
Data-Free Cases
- URL: http://arxiv.org/abs/2007.15802v1
- Date: Fri, 31 Jul 2020 02:00:38 GMT
- Title: Practical Detection of Trojan Neural Networks: Data-Limited and
Data-Free Cases
- Authors: Ren Wang, Gaoyuan Zhang, Sijia Liu, Pin-Yu Chen, Jinjun Xiong, Meng
Wang
- Abstract summary: We study the problem of Trojan network (TrojanNet) detection in the data-scarce regime.
We propose a data-limited TrojanNet detector (TND) for the case where only a few data samples are available for TrojanNet detection.
In addition, we propose a data-free TND, which can detect a TrojanNet without accessing any data samples.
- Score: 87.69818690239627
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When the training data are maliciously tampered, the predictions of the
acquired deep neural network (DNN) can be manipulated by an adversary known as
the Trojan attack (or poisoning backdoor attack). The lack of robustness of
DNNs against Trojan attacks could significantly harm real-life machine learning
(ML) systems in downstream applications, raising widespread concern about
their trustworthiness. In this paper, we study the problem of Trojan
network (TrojanNet) detection in the data-scarce regime, where only the weights
of a trained DNN are accessible to the detector. We first propose a data-limited
TrojanNet detector (TND) for the case where only a few data samples are available for
TrojanNet detection. We show that an effective data-limited TND can be
established by exploring connections between Trojan attack and
prediction-evasion adversarial attacks including per-sample attack as well as
all-sample universal attack. In addition, we propose a data-free TND, which can
detect a TrojanNet without accessing any data samples. We show that such a TND
can be built by leveraging the internal response of hidden neurons, which
exhibits the Trojan behavior even at random noise inputs. The effectiveness of
our proposals is evaluated by extensive experiments under different model
architectures and datasets including CIFAR-10, GTSRB, and ImageNet.
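The data-free TND idea sketched in the abstract, that Trojan-carrying hidden neurons fire strongly even on random noise and push noise predictions toward a single target label, can be illustrated with a minimal probe. This is an illustrative sketch under assumptions, not the paper's actual implementation; the helper name `data_free_tnd_score`, the thresholding rule, and the detection statistic are all hypothetical choices made here for clarity.

```python
import torch
import torch.nn as nn

def data_free_tnd_score(model, feature_layer, num_classes,
                        input_shape=(3, 32, 32), n_noise=64):
    """Probe a model with random noise, no real data samples needed.

    Returns (top_frac, n_suspect):
      top_frac  - fraction of noise inputs mapped to the single most
                  frequent class; Trojaned nets often collapse noise
                  onto the attack's target label.
      n_suspect - number of hidden neurons whose mean activation on
                  pure noise is anomalously high (a crude stand-in for
                  locating Trojan-related neurons).
    """
    model.eval()
    noise = torch.rand(n_noise, *input_shape)  # random noise inputs

    feats = {}
    def hook(_module, _inputs, output):
        feats["h"] = output
    handle = feature_layer.register_forward_hook(hook)
    with torch.no_grad():
        logits = model(noise)
    handle.remove()

    h = feats["h"].flatten(1)  # (n_noise, n_neurons)
    # Flag neurons with unusually large mean activation on pure noise
    # (2-sigma rule; an assumption, not the paper's criterion).
    mean_act = h.mean(0)
    suspects = mean_act > mean_act.mean() + 2 * mean_act.std()

    preds = logits.argmax(1)
    top_frac = preds.bincount(minlength=num_classes).max().float() / n_noise
    return top_frac.item(), int(suspects.sum())
```

In practice, a high `top_frac` on noise inputs would only be one weak signal; the paper's detector additionally optimizes inputs to maximize per-neuron responses rather than sampling noise passively.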
Related papers
- A Survey of Trojan Attacks and Defenses to Deep Neural Networks [3.9444202574850755]
Deep Neural Networks (DNNs) have found extensive applications in safety-critical artificial intelligence systems.
Recent research has revealed their susceptibility to Neural Network Trojans (NN Trojans) maliciously injected by adversaries.
arXiv Detail & Related papers (2024-08-15T04:20:32Z)
- MDTD: A Multi Domain Trojan Detector for Deep Neural Networks [2.4651521935081364]
Machine learning models that use deep neural networks (DNNs) are vulnerable to backdoor attacks.
We propose MDTD, a Multi-Domain Trojan Detector for DNNs, which detects inputs containing a Trojan trigger at testing time.
We evaluate MDTD against adaptive attacks where an adversary trains a robust DNN to increase (decrease) distance of benign (Trojan) inputs from a decision boundary.
arXiv Detail & Related papers (2023-08-30T00:03:03Z) - FreeEagle: Detecting Complex Neural Trojans in Data-Free Cases [50.065022493142116]
Trojan attack on deep neural networks, also known as backdoor attack, is a typical threat to artificial intelligence.
FreeEagle is the first data-free backdoor detection method that can effectively detect complex backdoor attacks.
arXiv Detail & Related papers (2023-02-28T11:31:29Z) - Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free [126.15842954405929]
Trojan attacks threaten deep neural networks (DNNs) by poisoning them to behave normally on most samples, yet to produce manipulated results for inputs attached with a trigger.
We propose a novel Trojan network detection regime: first locating a "winning Trojan lottery ticket" which preserves nearly full Trojan information yet only chance-level performance on clean inputs; then recovering the trigger embedded in this already isolated subnetwork.
arXiv Detail & Related papers (2022-05-24T06:33:31Z) - CLEANN: Accelerated Trojan Shield for Embedded Neural Networks [32.99727805086791]
We propose CLEANN, the first end-to-end framework that enables online mitigation of Trojans for embedded Deep Neural Network (DNN) applications.
A Trojan attack works by injecting a backdoor in the DNN while training; during inference, the Trojan can be activated by the specific backdoor trigger.
We leverage dictionary learning and sparse approximation to characterize the statistical behavior of benign data and identify Trojan triggers.
arXiv Detail & Related papers (2020-09-04T05:29:38Z) - Cassandra: Detecting Trojaned Networks from Adversarial Perturbations [92.43879594465422]
In many cases, pre-trained models are sourced from vendors who may have tampered with the training pipeline to insert Trojan behaviors into the models.
We propose a method to verify if a pre-trained model is Trojaned or benign.
Our method captures fingerprints of neural networks in the form of adversarial perturbations learned from the network gradients.
arXiv Detail & Related papers (2020-07-28T19:00:40Z) - Odyssey: Creation, Analysis and Detection of Trojan Models [91.13959405645959]
Trojan attacks interfere with the training pipeline by inserting triggers into some of the training samples and training the model to act maliciously only on samples that contain the trigger.
Existing Trojan detectors make strong assumptions about the types of triggers and attacks.
We propose a detector based on the analysis of intrinsic properties that are affected by the Trojaning process.
arXiv Detail & Related papers (2020-07-16T06:55:00Z)
- An Embarrassingly Simple Approach for Trojan Attack in Deep Neural Networks [59.42357806777537]
A Trojan attack aims to compromise deployed deep neural networks (DNNs) by relying on hidden trigger patterns inserted by attackers.
Unlike previous work, in which Trojaned behaviors are injected by retraining the model on a poisoned dataset, we propose a training-free attack approach.
The proposed TrojanNet has several nice properties including (1) it activates by tiny trigger patterns and keeps silent for other signals, (2) it is model-agnostic and could be injected into most DNNs, dramatically expanding its attack scenarios, and (3) the training-free mechanism saves massive training efforts compared to conventional trojan attack methods.
arXiv Detail & Related papers (2020-06-15T04:58:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.