Trigger Hunting with a Topological Prior for Trojan Detection
- URL: http://arxiv.org/abs/2110.08335v1
- Date: Fri, 15 Oct 2021 19:47:00 GMT
- Title: Trigger Hunting with a Topological Prior for Trojan Detection
- Authors: Xiaoling Hu, Xiao Lin, Michael Cogswell, Yi Yao, Susmit Jha, Chao Chen
- Abstract summary: This paper tackles the problem of Trojan detection, namely, identifying Trojaned models.
One popular approach is reverse engineering, i.e., recovering the triggers on a clean image by manipulating the model's prediction.
One major challenge of the reverse-engineering approach is the enormous search space of triggers.
We propose innovative priors such as diversity and topological simplicity to not only increase the chances of finding the appropriate triggers but also improve the quality of the found triggers.
- Score: 16.376009231934884
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Despite their success and popularity, deep neural networks (DNNs) are
vulnerable when facing backdoor attacks. This impedes their wider adoption,
especially in mission-critical applications. This paper tackles the problem of
Trojan detection, namely, identifying Trojaned models -- models trained with
poisoned data. One popular approach is reverse engineering, i.e., recovering
the triggers on a clean image by manipulating the model's prediction. One major
challenge of the reverse-engineering approach is the enormous search space of
triggers. To this end, we propose innovative priors such as diversity and
topological simplicity to not only increase the chances of finding the
appropriate triggers but also improve the quality of the found triggers.
Moreover, by encouraging a diverse set of trigger candidates, our method can
perform effectively in cases with unknown target labels. We demonstrate that
these priors can significantly improve the quality of the recovered triggers,
resulting in substantially improved Trojan detection accuracy as validated on
both synthetic and publicly available TrojAI benchmarks.
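To make the reverse-engineering setup concrete, here is a minimal PyTorch sketch (not the authors' code) that optimizes several trigger candidates (mask + pattern pairs) to flip a model's predictions to a target label. A pairwise-overlap penalty stands in for the diversity prior, and a total-variation term stands in for the topological-simplicity prior; the function name, loss weights, and the substitution of total variation for the paper's persistence-based loss are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def reverse_engineer_triggers(model, clean_images, target_label,
                              num_candidates=4, steps=500, lr=0.1,
                              lam_div=0.1, lam_tv=0.01):
    # clean_images: (N, C, H, W) tensor; model: a frozen classifier
    # (eval mode, parameters with requires_grad=False is assumed).
    _, c, h, w = clean_images.shape
    # One mask and one pattern per candidate, squashed into [0, 1] below.
    masks = torch.randn(num_candidates, 1, h, w, requires_grad=True)
    patterns = torch.randn(num_candidates, c, h, w, requires_grad=True)
    opt = torch.optim.Adam([masks, patterns], lr=lr)
    tgt = torch.full((clean_images.shape[0],), target_label, dtype=torch.long)

    for _ in range(steps):
        m = torch.sigmoid(masks)      # candidate masks in [0, 1]
        p = torch.sigmoid(patterns)   # candidate patterns in [0, 1]
        loss = 0.0
        for i in range(num_candidates):
            # Stamp candidate i onto the clean images and push the
            # prediction toward the target label.
            stamped = (1 - m[i]) * clean_images + m[i] * p[i]
            loss = loss + F.cross_entropy(model(stamped), tgt)
        # Diversity prior (stand-in): penalize pairwise overlap between masks.
        flat = m.flatten(1)           # (K, H*W)
        overlap = flat @ flat.t()
        loss = loss + lam_div * (overlap.sum() - overlap.diagonal().sum())
        # Total-variation smoothness: a crude stand-in for the topological prior.
        tv = (m[:, :, 1:, :] - m[:, :, :-1, :]).abs().sum() \
           + (m[:, :, :, 1:] - m[:, :, :, :-1]).abs().sum()
        loss = loss + lam_tv * tv
        opt.zero_grad()
        loss.backward()
        opt.step()

    return torch.sigmoid(masks).detach(), torch.sigmoid(patterns).detach()
```

A model would then be flagged as Trojaned when some recovered candidate reliably flips predictions using a small, compact mask. Note that the paper's actual topological prior encourages simpler trigger topology (e.g., fewer connected components), which the total-variation term above only crudely approximates.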
Related papers
- Solving Trojan Detection Competitions with Linear Weight Classification [1.24275433420322]
We introduce a detector that works remarkably well across many of the existing datasets and domains.
We evaluate this algorithm on a diverse set of Trojan detection benchmarks and domains (a minimal sketch of the weight-based idea appears after this list).
arXiv Detail & Related papers (2024-11-05T19:00:34Z)
- SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z)
- DisDet: Exploring Detectability of Backdoor Attack on Diffusion Models [23.502100653704446]
Some pioneering works have shown the vulnerability of diffusion models to backdoor attacks.
In this paper, for the first time, we explore the detectability of the poisoned noise input for the backdoored diffusion models.
We propose a low-cost trigger detection mechanism that can effectively identify the poisoned input noise.
We then take a further step to study the same problem from the attack side, proposing a backdoor attack strategy that can learn an unnoticeable trigger.
arXiv Detail & Related papers (2024-02-05T05:46:31Z)
- Attention-Enhancing Backdoor Attacks Against BERT-based Models [54.070555070629105]
Investigating the strategies of backdoor attacks will help to understand the model's vulnerability.
We propose a novel Trojan Attention Loss (TAL) which enhances the Trojan behavior by directly manipulating the attention patterns.
arXiv Detail & Related papers (2023-10-23T01:24:56Z)
- FreeEagle: Detecting Complex Neural Trojans in Data-Free Cases [50.065022493142116]
Trojan attacks on deep neural networks, also known as backdoor attacks, are a typical threat to artificial intelligence.
FreeEagle is the first data-free backdoor detection method that can effectively detect complex backdoor attacks.
arXiv Detail & Related papers (2023-02-28T11:31:29Z)
- An Adaptive Black-box Backdoor Detection Method for Deep Neural Networks [25.593824693347113]
Deep Neural Networks (DNNs) have demonstrated unprecedented performance across various fields such as medical diagnosis and autonomous driving.
They have been shown to be vulnerable to Neural Trojan (NT) attacks that are controlled and activated by stealthy triggers.
We propose a robust and adaptive Trojan detection scheme that inspects whether a pre-trained model has been Trojaned before its deployment.
arXiv Detail & Related papers (2022-04-08T23:41:19Z)
- TAD: Trigger Approximation based Black-box Trojan Detection for AI [16.741385045881113]
Deep Neural Networks (DNNs) have demonstrated unprecedented performance across various fields such as medical diagnosis and autonomous driving.
They have been shown to be vulnerable to Neural Trojan (NT) attacks that are controlled and activated by a trigger.
We propose a robust Trojan detection scheme that inspects whether a pre-trained AI model has been Trojaned before its deployment.
arXiv Detail & Related papers (2021-02-03T00:49:50Z)
- Detecting Trojaned DNNs Using Counterfactual Attributions [15.988574580713328]
Trojaned models behave normally on typical inputs but produce specific incorrect predictions for inputs containing a Trojan trigger.
Our approach is based on the novel observation that the trigger behavior depends on a few ghost neurons that activate on the trigger pattern.
We use this information for Trojan detection via a deep set encoder.
arXiv Detail & Related papers (2020-12-03T21:21:33Z)
- Cassandra: Detecting Trojaned Networks from Adversarial Perturbations [92.43879594465422]
In many cases, pre-trained models are sourced from vendors who may have tampered with the training pipeline to insert Trojan behaviors into the models.
We propose a method to verify if a pre-trained model is Trojaned or benign.
Our method captures fingerprints of neural networks in the form of adversarial perturbations learned from the network gradients.
arXiv Detail & Related papers (2020-07-28T19:00:40Z)
- Odyssey: Creation, Analysis and Detection of Trojan Models [91.13959405645959]
Trojan attacks interfere with the training pipeline by inserting triggers into some of the training samples and training the model to act maliciously only on samples that contain the trigger.
Existing Trojan detectors make strong assumptions about the types of triggers and attacks.
We propose a detector based on an analysis of intrinsic model properties that are affected by the Trojaning process.
arXiv Detail & Related papers (2020-07-16T06:55:00Z)
- Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, an improvement over the current state-of-the-art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)
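As promised above, here is a minimal sketch of the weight-based detection idea from "Solving Trojan Detection Competitions with Linear Weight Classification": flatten each model's weights into a feature vector and fit a linear classifier over a corpus of clean and Trojaned models. The fixed-stride subsampling, the per-model normalization, and the names `weight_features` / `train_detector` are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def weight_features(weights, num_features=2048):
    """Flatten a model's weight arrays into a fixed-length feature vector.

    `weights` is assumed to be a dict of numpy arrays (e.g., an exported
    state dict). Subsampling to a fixed length and normalizing per model
    are simplifications for illustration only.
    """
    flat = np.concatenate([w.ravel() for w in weights.values()])
    idx = np.linspace(0, flat.size - 1, num_features).astype(int)
    feats = flat[idx]
    return (feats - feats.mean()) / (feats.std() + 1e-8)

def train_detector(labeled_models):
    """Fit a linear classifier over (weights, is_trojaned) pairs."""
    X = np.stack([weight_features(w) for w, _ in labeled_models])
    y = np.array([int(label) for _, label in labeled_models])
    return LogisticRegression(max_iter=1000).fit(X, y)
```

The key design choice in such a detector is how weights are normalized so that models trained with different seeds or hyperparameters remain comparable to a single linear decision boundary.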
This list is automatically generated from the titles and abstracts of the papers on this site.