Towards Effective and Robust Neural Trojan Defenses via Input Filtering
- URL: http://arxiv.org/abs/2202.12154v1
- Date: Thu, 24 Feb 2022 15:41:37 GMT
- Title: Towards Effective and Robust Neural Trojan Defenses via Input Filtering
- Authors: Kien Do, Haripriya Harikumar, Hung Le, Dung Nguyen, Truyen Tran, Santu
Rana, Dang Nguyen, Willy Susilo, Svetha Venkatesh
- Score: 67.01177442955522
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Trojan attacks on deep neural networks are both dangerous and surreptitious.
Over the past few years, Trojan attacks have advanced from using only a simple
trigger and targeting only one class to using many sophisticated triggers and
targeting multiple classes. However, Trojan defenses have not caught up with
this development. Most defense methods still make out-of-date assumptions about
Trojan triggers and target classes, and can thus be easily circumvented by
modern Trojan attacks. In this paper, we advocate general defenses that are effective
and robust against various Trojan attacks and propose two novel "filtering"
defenses with these characteristics called Variational Input Filtering (VIF)
and Adversarial Input Filtering (AIF). VIF and AIF leverage variational
inference and adversarial training respectively to purify all potential Trojan
triggers in the input at run time without making any assumption about their
numbers and forms. We further extend "filtering" to
"filtering-then-contrasting" - a new defense mechanism that helps avoid the
drop in classification accuracy on clean data caused by filtering. Extensive
experimental results show that our proposed defenses significantly outperform
four well-known defenses in mitigating five different Trojan attacks, including
two state-of-the-art attacks that defeat many strong defenses.
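The "filtering-then-contrasting" mechanism described in the abstract can be sketched as follows. This is an illustrative reading rather than the paper's implementation: `classifier` and `input_filter` are hypothetical placeholders for the trained model and a learned filter such as VIF or AIF.

```python
# Sketch of "filtering-then-contrasting": classify both the raw input
# and its filtered version. If the two predictions agree, trust the raw
# prediction, so clean-data accuracy is not hurt by the filter. If they
# disagree, the input likely carries a Trojan trigger, so fall back to
# the prediction on the filtered (purified) input.

def filter_then_contrast(classifier, input_filter, x):
    y_raw = classifier(x)                      # prediction on the original input
    y_filtered = classifier(input_filter(x))   # prediction after purification
    if y_raw == y_filtered:
        return y_raw        # clean input: keep full clean accuracy
    return y_filtered       # suspected Trojan: use the purified prediction
```

The contrast step is what distinguishes this from plain filtering, where every input, clean or not, pays the filter's accuracy cost.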
Related papers
- TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets [74.12197473591128]
We propose an effective Trojan attack against diffusion models, TrojDiff.
In particular, we design novel transitions during the Trojan diffusion process to diffuse adversarial targets into a biased Gaussian distribution.
We show that TrojDiff always achieves high attack performance under different adversarial targets using different types of triggers.
arXiv Detail & Related papers (2023-03-10T08:01:23Z)
- An Adaptive Black-box Defense against Trojan Attacks (TrojDef) [5.880596125802611]
Trojan backdoor is a poisoning attack against Neural Network (NN) classifiers.
We propose a more practical black-box defense, dubbed TrojDef, which can only run forward-pass of the NN.
TrojDef significantly outperforms state-of-the-art defenses and is highly stable under different settings.
arXiv Detail & Related papers (2022-09-05T01:54:44Z)
- Defense Against Multi-target Trojan Attacks [31.54111353219381]
Trojan attacks are the hardest to defend against.
BadNet-style attacks introduce Trojan backdoors into multiple target classes and allow triggers to be placed anywhere in the image.
To defend against this attack, we first introduce a trigger reverse-engineering mechanism that uses multiple images to recover a variety of potential triggers.
We then propose a detection mechanism by measuring the transferability of such recovered triggers.
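One plausible reading of this transferability measure is sketched below, under assumed interfaces: `classifier`, `stamp`, and the recovered `trigger` are hypothetical placeholders, not the paper's actual code.

```python
def trigger_transferability(classifier, stamp, trigger, clean_inputs, target):
    # Fraction of clean inputs whose prediction becomes `target` after
    # the recovered trigger is stamped onto them. A high score suggests
    # the recovered trigger is a genuine Trojan trigger rather than an
    # artifact of the reverse-engineering step.
    flipped = sum(
        1 for x in clean_inputs if classifier(stamp(x, trigger)) == target
    )
    return flipped / len(clean_inputs)
```

A genuine trigger transfers across many unrelated clean inputs, while a spurious pattern recovered from one image typically does not.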
arXiv Detail & Related papers (2022-07-08T13:29:13Z)
- Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free [126.15842954405929]
Trojan attacks threaten deep neural networks (DNNs) by poisoning them to behave normally on most samples, yet to produce manipulated results for inputs attached with a trigger.
We propose a novel Trojan network detection regime: first locating a "winning Trojan lottery ticket" which preserves nearly full Trojan information yet only chance-level performance on clean inputs; then recovering the trigger embedded in this already isolated subnetwork.
arXiv Detail & Related papers (2022-05-24T06:33:31Z)
- Trojan Horse Training for Breaking Defenses against Backdoor Attacks in Deep Learning [7.3007220721129364]
ML models that contain a backdoor are called Trojan models.
Current single-target backdoor attacks require one trigger per target class.
We introduce a new, more general attack that will enable a single trigger to result in misclassification to more than one target class.
arXiv Detail & Related papers (2022-03-25T02:54:27Z)
- Semantic Host-free Trojan Attack [54.25471812198403]
We propose a novel host-free Trojan attack with triggers that are fixed in the semantic space but not necessarily in the pixel space.
In contrast to existing Trojan attacks which use clean input images as hosts to carry small, meaningless trigger patterns, our attack considers triggers as full-sized images belonging to a semantically meaningful object class.
arXiv Detail & Related papers (2021-10-26T05:01:22Z)
- Odyssey: Creation, Analysis and Detection of Trojan Models [91.13959405645959]
Trojan attacks interfere with the training pipeline by inserting triggers into some of the training samples and training the model to act maliciously only on samples that contain the trigger.
Existing Trojan detectors make strong assumptions about the types of triggers and attacks.
We propose a detector based on the analysis of intrinsic properties of the model that are affected by the Trojaning process.
arXiv Detail & Related papers (2020-07-16T06:55:00Z)
- An Embarrassingly Simple Approach for Trojan Attack in Deep Neural Networks [59.42357806777537]
A Trojan attack targets deployed deep neural networks (DNNs) by relying on hidden trigger patterns inserted by attackers.
We propose a training-free attack approach, different from previous work in which Trojaned behaviors are injected by retraining the model on a poisoned dataset.
The proposed TrojanNet has several nice properties: (1) it is activated by tiny trigger patterns and keeps silent for other signals, (2) it is model-agnostic and can be injected into most DNNs, dramatically expanding its attack scenarios, and (3) the training-free mechanism saves massive training effort compared to conventional Trojan attack methods.
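The "tiny trigger pattern" mechanism can be illustrated with a minimal stamping routine. This is a generic sketch, not TrojanNet's actual code; images and patches are assumed to be plain 2D lists of pixel values.

```python
def stamp_patch(image, patch, top, left):
    # Return a copy of `image` (2D list of pixels) with a small `patch`
    # (2D list) written at position (top, left). Patch-based Trojan
    # attacks rely on such tiny, localized triggers: the Trojaned model
    # behaves normally unless this exact pattern is present.
    out = [row[:] for row in image]
    for i, patch_row in enumerate(patch):
        for j, value in enumerate(patch_row):
            out[top + i][left + j] = value
    return out
```

A Trojaned model paired with such a trigger stays silent on unstamped inputs and misbehaves only when this exact patch appears, which is what makes the attack surreptitious.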
arXiv Detail & Related papers (2020-06-15T04:58:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.