Trojan Signatures in DNN Weights
- URL: http://arxiv.org/abs/2109.02836v1
- Date: Tue, 7 Sep 2021 03:07:03 GMT
- Title: Trojan Signatures in DNN Weights
- Authors: Greg Fields, Mohammad Samragh, Mojan Javaheripi, Farinaz Koushanfar,
Tara Javidi
- Abstract summary: We present the first ultra light-weight and highly effective trojan detection method that does not require access to the training/test data.
Our approach focuses on analysis of the weights of the final, linear layer of the network.
We show that the distribution of the weights associated with the trojan target class is clearly distinguishable from the weights associated with other classes.
- Score: 20.93172486021463
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks have been shown to be vulnerable to backdoor, or trojan,
attacks where an adversary has embedded a trigger in the network at training
time such that the model correctly classifies all standard inputs, but
generates a targeted, incorrect classification on any input which contains the
trigger. In this paper, we present the first ultra light-weight and highly
effective trojan detection method that does not require access to the
training/test data, does not involve any expensive computations, and makes no
assumptions on the nature of the trojan trigger. Our approach focuses on
analysis of the weights of the final, linear layer of the network. We
empirically demonstrate several characteristics of these weights that occur
frequently in trojaned networks, but not in benign networks. In particular, we
show that the distribution of the weights associated with the trojan target
class is clearly distinguishable from the weights associated with other
classes. Using this, we demonstrate the effectiveness of our proposed detection
method against state-of-the-art attacks across a variety of architectures,
datasets, and trigger types.
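To make the idea concrete, below is a minimal sketch of the kind of final-layer check the abstract describes: it pulls the weights of the network's last linear layer and scores each class by how extreme its weights are relative to the other classes. The specific statistic (per-feature argmax counts with a median-absolute-deviation outlier rule) and the randomly initialized resnet18 stand-in are illustrative assumptions, not the authors' exact procedure.

```python
# Hedged sketch: score each output class by how extreme its final-layer weights
# are compared to the other classes, then flag outlier classes. The statistic
# below (per-feature argmax counts + MAD rule) is an illustrative assumption,
# not the paper's exact test.
import numpy as np
import torchvision

# Stand-in model; in practice, load the suspect network's checkpoint instead.
model = torchvision.models.resnet18(num_classes=10)
W = model.fc.weight.detach().cpu().numpy()   # final linear layer, shape (num_classes, num_features)

# Fraction of feature positions at which each class holds the largest weight.
top_class = W.argmax(axis=0)                 # shape (num_features,)
scores = np.bincount(top_class, minlength=W.shape[0]) / W.shape[1]

# Flag classes whose score is an outlier under a median-absolute-deviation rule.
med = np.median(scores)
mad = np.median(np.abs(scores - med)) + 1e-12
z = (scores - med) / (1.4826 * mad)
suspects = np.where(z > 3.5)[0]
print("per-class scores:", np.round(scores, 3))
print("suspected trojan target class(es):", suspects)
```

In an actual audit the model would be the network under inspection, and a flagged class would be a candidate trojan target for further analysis rather than conclusive evidence.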
Related papers
- Trojan Cleansing with Neural Collapse [18.160116254921608]
Trojan attacks are sophisticated training-time attacks on neural networks that embed backdoor triggers.
We provide experimental evidence that trojan attacks disrupt the convergence to neural collapse for a variety of datasets and architectures.
We then use this disruption to design a lightweight, broadly generalizable mechanism for cleansing trojan attacks.
arXiv Detail & Related papers (2024-11-19T22:57:40Z)
- Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free [126.15842954405929]
Trojan attacks threaten deep neural networks (DNNs) by poisoning them to behave normally on most samples, yet to produce manipulated results for inputs attached with a trigger.
We propose a novel Trojan network detection regime: first locating a "winning Trojan lottery ticket" which preserves nearly full Trojan information yet only chance-level performance on clean inputs; then recovering the trigger embedded in this already isolated subnetwork.
arXiv Detail & Related papers (2022-05-24T06:33:31Z)
- Topological Detection of Trojaned Neural Networks [10.559903139528252]
Trojan attacks occur when attackers stealthily manipulate the model's behavior.
We find subtle structural deviations that characterize Trojaned models.
We devise a strategy for robust detection of Trojaned models.
arXiv Detail & Related papers (2021-06-11T15:48:16Z)
- Practical Detection of Trojan Neural Networks: Data-Limited and Data-Free Cases [87.69818690239627]
We study the problem of Trojan network (TrojanNet) detection in the data-scarce regime.
We propose a data-limited TrojanNet detector (TND) for the case when only a few data samples are available for TrojanNet detection.
In addition, we propose a data-free TND, which can detect a TrojanNet without accessing any data samples.
arXiv Detail & Related papers (2020-07-31T02:00:38Z)
- Cassandra: Detecting Trojaned Networks from Adversarial Perturbations [92.43879594465422]
In many cases, pre-trained models are sourced from vendors who may have disrupted the training pipeline to insert Trojan behaviors into the models.
We propose a method to verify if a pre-trained model is Trojaned or benign.
Our method captures fingerprints of neural networks in the form of adversarial perturbations learned from the network gradients.
arXiv Detail & Related papers (2020-07-28T19:00:40Z)
- Odyssey: Creation, Analysis and Detection of Trojan Models [91.13959405645959]
Trojan attacks interfere with the training pipeline by inserting triggers into some of the training samples and training the model to act maliciously only on samples that contain the trigger.
Existing Trojan detectors make strong assumptions about the types of triggers and attacks.
We propose a detector based on the analysis of intrinsic properties of the model that are affected by the Trojaning process.
arXiv Detail & Related papers (2020-07-16T06:55:00Z)
- ConFoc: Content-Focus Protection Against Trojan Attacks on Neural Networks [0.0]
Trojan attacks insert misbehavior at training time using samples with a mark or trigger, which is exploited at inference or testing time.
We propose a novel defensive technique against trojan attacks, in which DNNs are taught to disregard the styles of inputs and focus on their content.
Results show that the method significantly reduces the attack success rate, to values below 1% in all the tested attacks.
arXiv Detail & Related papers (2020-07-01T19:25:34Z)
- An Embarrassingly Simple Approach for Trojan Attack in Deep Neural Networks [59.42357806777537]
A trojan attack targets deployed deep neural networks (DNNs) by relying on hidden trigger patterns inserted by attackers.
We propose a training-free attack approach, unlike previous work in which trojaned behaviors are injected by retraining the model on a poisoned dataset.
The proposed TrojanNet has several nice properties including (1) it activates by tiny trigger patterns and keeps silent for other signals, (2) it is model-agnostic and could be injected into most DNNs, dramatically expanding its attack scenarios, and (3) the training-free mechanism saves massive training efforts compared to conventional trojan attack methods.
arXiv Detail & Related papers (2020-06-15T04:58:28Z)
- Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the-art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.