An Adaptive Black-box Backdoor Detection Method for Deep Neural Networks
- URL: http://arxiv.org/abs/2204.04329v1
- Date: Fri, 8 Apr 2022 23:41:19 GMT
- Authors: Xinqiao Zhang, Huili Chen, Ke Huang, Farinaz Koushanfar
- Abstract summary: Deep Neural Networks (DNNs) have demonstrated unprecedented performance across various fields such as medical diagnosis and autonomous driving.
They are identified to be vulnerable to Neural Trojan (NT) attacks that are controlled and activated by stealthy triggers.
We propose a robust and adaptive Trojan detection scheme that inspects whether a pre-trained model has been Trojaned before its deployment.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the surge of Machine Learning (ML), a growing number of
intelligent applications have been developed. Deep Neural Networks (DNNs) have demonstrated
unprecedented performance across various fields such as medical diagnosis and
autonomous driving. While DNNs are widely employed in security-sensitive
fields, they are identified to be vulnerable to Neural Trojan (NT) attacks that
are controlled and activated by stealthy triggers. In this paper, we aim to
design a robust and adaptive Trojan detection scheme that inspects whether a
pre-trained model has been Trojaned before its deployment. Prior works are
oblivious to the intrinsic properties of the trigger distribution and try to
reconstruct the trigger pattern using simple heuristics, i.e., driving the
given model to incorrect outputs. As a result, their detection time and
effectiveness are limited. We leverage the observation that the pixel trigger
typically features spatial dependency and propose the first trigger
approximation based black-box Trojan detection framework that enables a fast
and scalable search of the trigger in the input space. Furthermore, our
approach can also detect Trojans embedded in the feature space where certain
filter transformations are used to activate the Trojan. We perform extensive
experiments to investigate the performance of our approach across various
datasets and ML models. Empirical results show that our approach achieves a
ROC-AUC score of 0.93 on the public TrojAI dataset. Our code can be found at
https://github.com/xinqiaozhang/adatrojan
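The spatial-dependency observation can be illustrated with a minimal black-box scan, a sketch rather than the paper's actual algorithm: stamp a candidate solid patch at each grid location on a batch of clean inputs, query the model, and flag any location whose flip rate toward a target label is anomalously high. The names `stamp` and `trigger_scan` and all dimensions below are illustrative assumptions.

```python
import numpy as np

def stamp(images, patch, x, y):
    """Overlay a square patch at position (x, y) on a batch of images."""
    out = images.copy()
    s = patch.shape[0]
    out[:, x:x + s, y:y + s] = patch
    return out

def trigger_scan(model, clean_images, target_label, patch_size=4, stride=4):
    """Slide a candidate patch over the image grid and record, for each
    location, the fraction of clean images flipped to the target label.
    A real pixel trigger, being spatially localized, yields a near-1.0
    flip rate at (roughly) one location, while clean models flip rarely."""
    h, w = clean_images.shape[1:3]
    patch = np.ones((patch_size, patch_size))  # bright candidate patch
    best_rate, best_loc = 0.0, None
    for x in range(0, h - patch_size + 1, stride):
        for y in range(0, w - patch_size + 1, stride):
            preds = model(stamp(clean_images, patch, x, y))
            rate = float(np.mean(preds == target_label))
            if rate > best_rate:
                best_rate, best_loc = rate, (x, y)
    return best_rate, best_loc
```

Because the scan only needs model predictions, not gradients, it works in the black-box setting; the grid stride trades search resolution for query count.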
Related papers
- Evolutionary Trigger Detection and Lightweight Model Repair Based Backdoor Defense
Deep Neural Networks (DNNs) have been widely used in many areas such as autonomous driving and face recognition.
A backdoor in a DNN model can be activated by a poisoned input carrying the trigger, leading to a wrong prediction.
We propose an efficient backdoor defense based on evolutionary trigger detection and lightweight model repair.
arXiv Detail & Related papers (2024-07-07T14:50:59Z)
- Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable
A novel arbitrary-in-arbitrary-out (AIAO) strategy makes watermarks resilient to fine-tuning-based removal.
Unlike the existing methods of designing a backdoor for the input/output space of diffusion models, in our method, we propose to embed the backdoor into the feature space of sampled subpaths.
Our empirical studies on the MS-COCO, AFHQ, LSUN, CUB-200, and DreamBooth datasets confirm the robustness of AIAO.
arXiv Detail & Related papers (2024-05-01T12:03:39Z)
- FreeEagle: Detecting Complex Neural Trojans in Data-Free Cases
Trojan attack on deep neural networks, also known as backdoor attack, is a typical threat to artificial intelligence.
FreeEagle is the first data-free backdoor detection method that can effectively detect complex backdoor attacks.
arXiv Detail & Related papers (2023-02-28T11:31:29Z)
- Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free
Trojan attacks threaten deep neural networks (DNNs) by poisoning them to behave normally on most samples, yet to produce manipulated results for inputs attached with a trigger.
We propose a novel Trojan network detection regime: first locating a "winning Trojan lottery ticket" which preserves nearly full Trojan information yet only chance-level performance on clean inputs; then recovering the trigger embedded in this already isolated subnetwork.
arXiv Detail & Related papers (2022-05-24T06:33:31Z)
- Trigger Hunting with a Topological Prior for Trojan Detection
This paper tackles the problem of Trojan detection, namely, identifying Trojaned models.
One popular approach is reverse engineering, recovering the triggers on a clean image by manipulating the model's prediction.
One major challenge of the reverse-engineering approach is the enormous search space of possible triggers.
We propose innovative priors such as diversity and topological simplicity to not only increase the chances of finding the appropriate triggers but also improve the quality of the found triggers.
arXiv Detail & Related papers (2021-10-15T19:47:00Z)
- TAD: Trigger Approximation based Black-box Trojan Detection for AI
Deep Neural Networks (DNNs) have demonstrated unprecedented performance across various fields such as medical diagnosis and autonomous driving.
They are identified to be vulnerable to Neural Trojan (NT) attacks that are controlled and activated by a trigger.
We propose a robust Trojan detection scheme that inspects whether a pre-trained AI model has been Trojaned before its deployment.
arXiv Detail & Related papers (2021-02-03T00:49:50Z)
- Detecting Trojaned DNNs Using Counterfactual Attributions
Such models behave normally with typical inputs but produce specific incorrect predictions for inputs with a Trojan trigger.
Our approach is based on a novel observation that the trigger behavior depends on a few ghost neurons that activate on the trigger pattern.
We use this information for Trojan detection by using a deep set encoder.
arXiv Detail & Related papers (2020-12-03T21:21:33Z)
- Cassandra: Detecting Trojaned Networks from Adversarial Perturbations
In many cases, pre-trained models are sourced from vendors who may have disrupted the training pipeline to insert Trojan behaviors into the models.
We propose a method to verify if a pre-trained model is Trojaned or benign.
Our method captures fingerprints of neural networks in the form of adversarial perturbations learned from the network gradients.
arXiv Detail & Related papers (2020-07-28T19:00:40Z)
- Odyssey: Creation, Analysis and Detection of Trojan Models
Trojan attacks interfere with the training pipeline by inserting triggers into some of the training samples and training the model to act maliciously only for samples that contain the trigger.
Existing Trojan detectors make strong assumptions about the types of triggers and attacks.
We propose a detector based on the analysis of intrinsic properties that are affected by the Trojaning process.
arXiv Detail & Related papers (2020-07-16T06:55:00Z)
- Scalable Backdoor Detection in Neural Networks
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the-art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)
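The main abstract above also claims detection of feature-space Trojans activated by filter transformations. A black-box probe for such triggers might apply a small library of candidate transforms to clean inputs and look for one that consistently flips predictions toward a single label. This is a minimal sketch under loose assumptions, not any listed paper's actual method; `box_blur`, `transform_scan`, and the toy thresholds are all invented for illustration.

```python
import numpy as np

def box_blur(images, k=3):
    """k x k box filter over a batch of 2-D images, edge-replicated borders."""
    pad = k // 2
    h, w = images.shape[1:3]
    padded = np.pad(images, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    out = np.zeros_like(images)
    for dx in range(k):
        for dy in range(k):
            out += padded[:, dx:dx + h, dy:dy + w]
    return out / (k * k)

def transform_scan(model, clean_images, transforms):
    """For each candidate transform, report the fraction of clean images
    whose prediction flips to the single most common new label. A
    transformation trigger shows up as a near-1.0 rate for one transform."""
    clean_preds = model(clean_images)
    rates = {}
    for name, t in transforms.items():
        preds = model(t(clean_images))
        flipped = preds != clean_preds
        if flipped.any():
            _, counts = np.unique(preds[flipped], return_counts=True)
            rates[name] = counts.max() / len(clean_images)
        else:
            rates[name] = 0.0
    return rates
```

The probe stays black-box: it compares predictions before and after each transform, so the candidate library (blur, negation, gamma shift, etc.) is the only prior knowledge it encodes.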
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.