An Adaptive Black-box Defense against Trojan Attacks (TrojDef)
- URL: http://arxiv.org/abs/2209.01721v1
- Date: Mon, 5 Sep 2022 01:54:44 GMT
- Title: An Adaptive Black-box Defense against Trojan Attacks (TrojDef)
- Authors: Guanxiong Liu, Abdallah Khreishah, Fatima Sharadgah, Issa Khalil
- Abstract summary: A Trojan backdoor is a poisoning attack against Neural Network (NN) classifiers.
We propose a more practical black-box defense, dubbed TrojDef, which only requires running forward passes of the NN.
TrojDef significantly outperforms state-of-the-art defenses and is highly stable under different settings.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A Trojan backdoor is a poisoning attack against Neural Network (NN) classifiers in which adversaries exploit the (highly desirable) model-reuse property to implant Trojans into the model parameters through a poisoned training process. Most of the proposed defenses against Trojan attacks assume a white-box setup, in which the defender either has access to the inner state of the NN or is able to run back-propagation through it. In this work, we propose a more practical black-box defense, dubbed TrojDef, which can only run forward passes of the NN. TrojDef tries to identify and filter out Trojan inputs (i.e., inputs augmented with the Trojan trigger) by monitoring the changes in the prediction confidence when the input is repeatedly perturbed by random noise. We derive a function of the prediction outputs, called the prediction confidence bound, to decide whether an input example is Trojan or benign. The intuition is that Trojan inputs are more stable under noise because the misclassification depends only on the trigger, while predictions on benign inputs degrade because the noise perturbs the classification features.
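A minimal sketch of this perturb-and-compare procedure, assuming only black-box access through a hypothetical `predict_proba` forward-pass callable; the noise scale, trial count, and bound below are illustrative placeholders, not the values or the exact bound derived in the paper:

```python
import numpy as np

def confidence_under_noise(predict_proba, x, n_trials=50, sigma=0.1, seed=None):
    """Confidence assigned to the originally predicted class while x is
    repeatedly perturbed with Gaussian noise (forward passes only)."""
    rng = np.random.default_rng(seed)
    base_label = int(np.argmax(predict_proba(x[None, ...])[0]))
    confs = [
        predict_proba((x + rng.normal(0.0, sigma, size=x.shape))[None, ...])[0][base_label]
        for _ in range(n_trials)
    ]
    return np.asarray(confs)

def looks_trojan(predict_proba, x, bound=0.8, **kwargs):
    """Flag x as Trojan if its prediction stays confident under noise.
    `bound` stands in for the paper's prediction confidence bound; the
    default here is an illustrative placeholder."""
    return confidence_under_noise(predict_proba, x, **kwargs).mean() >= bound
```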
Through mathematical analysis, we show that if the attacker is perfect in injecting the backdoor, the Trojan-infected model will be trained to learn the appropriate prediction confidence bound, which distinguishes Trojan from benign inputs under arbitrary perturbations. However, because the attacker might not be perfect in injecting the backdoor, we introduce a nonlinear transform of the prediction confidence bound to improve the detection accuracy in practical settings. Extensive empirical evaluations show that TrojDef significantly outperforms state-of-the-art defenses and is highly stable under different settings, even when the classifier architecture, the training process, or the hyper-parameters change.
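The paper's exact nonlinear transform is not reproduced here; as a stand-in, the sketch below applies a log-odds nonlinearity to the per-trial confidences produced by the sketch above, with the decision threshold assumed to be calibrated on held-out benign examples:

```python
import numpy as np

def transformed_statistic(confs, eps=1e-6):
    """Mean log-odds of the per-trial confidences: a placeholder nonlinearity
    that spreads out confidences saturated near 0 or 1, standing in for the
    paper's transform of the prediction confidence bound."""
    c = np.clip(confs, eps, 1.0 - eps)
    return float(np.mean(np.log(c / (1.0 - c))))

# An input is flagged as Trojan when transformed_statistic(confs) exceeds a
# threshold calibrated on benign examples (calibration not shown here).
```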
Related papers
- Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free
Trojan attacks threaten deep neural networks (DNNs) by poisoning them to behave normally on most samples yet produce manipulated results for inputs that carry a trigger.
We propose a novel Trojan network detection regime: first locating a "winning Trojan lottery ticket," a subnetwork that preserves nearly full Trojan information yet achieves only chance-level performance on clean inputs; then recovering the trigger embedded in this already isolated subnetwork.
arXiv Detail & Related papers (2022-05-24T06:33:31Z)
- Trojan Horse Training for Breaking Defenses against Backdoor Attacks in Deep Learning
ML models that contain a backdoor are called Trojan models.
Current single-target backdoor attacks require one trigger per target class.
We introduce a new, more general attack that enables a single trigger to cause misclassification into more than one target class.
arXiv Detail & Related papers (2022-03-25T02:54:27Z)
- Towards Effective and Robust Neural Trojan Defenses via Input Filtering
Trojan attacks on deep neural networks are both dangerous and surreptitious.
Over the past few years, Trojan attacks have advanced from using only a simple trigger and targeting only one class to using many sophisticated triggers and targeting multiple classes.
Most defense methods still make out-of-date assumptions about Trojan triggers and target classes and thus can be easily circumvented by modern Trojan attacks.
arXiv Detail & Related papers (2022-02-24T15:41:37Z)
- CatchBackdoor: Backdoor Detection via Critical Trojan Neural Path Fuzzing
Trojaned behaviors triggered by various Trojan attacks can be attributed to the Trojan path.
We propose CatchBackdoor, a detection method against Trojan attacks.
arXiv Detail & Related papers (2021-12-24T13:57:03Z)
- A Synergetic Attack against Neural Network Classifiers combining Backdoor and Adversarial Examples
We show how to jointly exploit adversarial perturbation and model poisoning vulnerabilities to practically launch a new stealthy attack, dubbed AdvTrojan.
AdvTrojan is stealthy because it can be activated only when: 1) a carefully crafted adversarial perturbation is injected into the input examples during inference, and 2) a Trojan backdoor is implanted during the training process of the model.
arXiv Detail & Related papers (2021-09-03T02:18:57Z)
- CLEANN: Accelerated Trojan Shield for Embedded Neural Networks
We propose CLEANN, the first end-to-end framework that enables online mitigation of Trojans for embedded Deep Neural Network (DNN) applications.
A Trojan attack works by injecting a backdoor in the DNN while training; during inference, the Trojan can be activated by the specific backdoor trigger.
We leverage dictionary learning and sparse approximation to characterize the statistical behavior of benign data and identify Trojan triggers (a generic sketch of this idea appears after this list).
arXiv Detail & Related papers (2020-09-04T05:29:38Z)
- Practical Detection of Trojan Neural Networks: Data-Limited and Data-Free Cases
We study the problem of Trojan network (TrojanNet) detection in the data-scarce regime.
We propose a data-limited TrojanNet detector (TND) for the case when only a few data samples are available for TrojanNet detection.
In addition, we propose a data-free TND, which can detect a TrojanNet without accessing any data samples.
arXiv Detail & Related papers (2020-07-31T02:00:38Z)
- Odyssey: Creation, Analysis and Detection of Trojan Models
Trojan attacks interfere with the training pipeline by inserting triggers into some of the training samples and training the model to act maliciously only on samples that contain the trigger.
Existing Trojan detectors make strong assumptions about the types of triggers and attacks.
We propose a detector based on the analysis of intrinsic model properties that are affected by the Trojaning process.
arXiv Detail & Related papers (2020-07-16T06:55:00Z)
- An Embarrassingly Simple Approach for Trojan Attack in Deep Neural Networks
A Trojan attack targets deployed deep neural networks (DNNs) by relying on hidden trigger patterns inserted by hackers.
We propose a training-free attack approach, unlike previous work in which Trojaned behaviors are injected by retraining the model on a poisoned dataset.
The proposed TrojanNet has several nice properties: (1) it is activated by tiny trigger patterns and keeps silent for other signals, (2) it is model-agnostic and can be injected into most DNNs, dramatically expanding its attack scenarios, and (3) the training-free mechanism saves massive training effort compared to conventional Trojan attack methods.
arXiv Detail & Related papers (2020-06-15T04:58:28Z)
- Scalable Backdoor Detection in Neural Networks
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the-art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)
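The CLEANN entry above names dictionary learning and sparse approximation as its core tools. The sketch below illustrates that general idea with scikit-learn, not CLEANN's actual pipeline: fit a dictionary on flattened benign patches, then score test patches by their sparse-reconstruction error; patches containing a trigger that benign data cannot explain tend to score high.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

def fit_benign_dictionary(benign_patches, n_atoms=64, alpha=1.0):
    """Learn a sparse dictionary from flattened benign patches
    (shape: [n_patches, patch_dim])."""
    dico = MiniBatchDictionaryLearning(
        n_components=n_atoms,
        alpha=alpha,
        transform_algorithm="omp",       # sparse approximation at transform time
        transform_n_nonzero_coefs=8,
    )
    dico.fit(benign_patches)
    return dico

def reconstruction_error(dico, patches):
    """Per-patch L2 error of the sparse reconstruction."""
    codes = dico.transform(patches)
    recon = codes @ dico.components_
    return np.linalg.norm(patches - recon, axis=1)

# A test input would be flagged when any of its patches exceeds a threshold
# calibrated on benign reconstruction errors (the threshold choice here is an
# assumption, not a value taken from the CLEANN paper).
```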