CatchBackdoor: Backdoor Detection via Critical Trojan Neural Path Fuzzing
- URL: http://arxiv.org/abs/2112.13064v3
- Date: Wed, 17 Jul 2024 13:58:13 GMT
- Title: CatchBackdoor: Backdoor Detection via Critical Trojan Neural Path Fuzzing
- Authors: Haibo Jin, Ruoxi Chen, Jinyin Chen, Haibin Zheng, Yang Zhang, Haohan Wang,
- Abstract summary: trojaned behaviors triggered by various trojan attacks can be attributed to the trojan path.
We propose CatchBackdoor, a detection method against trojan attacks.
- Score: 16.44147178061005
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The success of deep neural networks (DNNs) in real-world applications has benefited from abundant pre-trained models. However, the backdoored pre-trained models can pose a significant trojan threat to the deployment of downstream DNNs. Numerous backdoor detection methods have been proposed but are limited to two aspects: (1) high sensitivity on trigger size, especially on stealthy attacks (i.e., blending attacks and defense adaptive attacks); (2) rely heavily on benign examples for reverse engineering. To address these challenges, we empirically observed that trojaned behaviors triggered by various trojan attacks can be attributed to the trojan path, composed of top-$k$ critical neurons with more significant contributions to model prediction changes. Motivated by it, we propose CatchBackdoor, a detection method against trojan attacks. Based on the close connection between trojaned behaviors and trojan path to trigger errors, CatchBackdoor starts from the benign path and gradually approximates the trojan path through differential fuzzing. We then reverse triggers from the trojan path, to trigger errors caused by diverse trojaned attacks. Extensive experiments on MINST, CIFAR-10, and a-ImageNet datasets and 7 models (LeNet, ResNet, and VGG) demonstrate the superiority of CatchBackdoor over the state-of-the-art methods, in terms of (1) \emph{effective} - it shows better detection performance, especially on stealthy attacks ($\sim$ $\times$ 2 on average); (2) \emph{extensible} - it is robust to trigger size and can conduct detection without benign examples.
Related papers
- Attention-Enhancing Backdoor Attacks Against BERT-based Models [54.070555070629105]
Investigating the strategies of backdoor attacks will help to understand the model's vulnerability.
We propose a novel Trojan Attention Loss (TAL) which enhances the Trojan behavior by directly manipulating the attention patterns.
arXiv Detail & Related papers (2023-10-23T01:24:56Z) - FreeEagle: Detecting Complex Neural Trojans in Data-Free Cases [50.065022493142116]
Trojan attack on deep neural networks, also known as backdoor attack, is a typical threat to artificial intelligence.
FreeEagle is the first data-free backdoor detection method that can effectively detect complex backdoor attacks.
arXiv Detail & Related papers (2023-02-28T11:31:29Z) - Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free [126.15842954405929]
Trojan attacks threaten deep neural networks (DNNs) by poisoning them to behave normally on most samples, yet to produce manipulated results for inputs attached with a trigger.
We propose a novel Trojan network detection regime: first locating a "winning Trojan lottery ticket" which preserves nearly full Trojan information yet only chance-level performance on clean inputs; then recovering the trigger embedded in this already isolated subnetwork.
arXiv Detail & Related papers (2022-05-24T06:33:31Z) - A Synergetic Attack against Neural Network Classifiers combining
Backdoor and Adversarial Examples [11.534521802321976]
We show how to jointly exploit adversarial perturbation and model poisoning vulnerabilities to practically launch a new stealthy attack, dubbed AdvTrojan.
AdvTrojan is stealthy because it can be activated only when: 1) a carefully crafted adversarial perturbation is injected into the input examples during inference, and 2) a Trojan backdoor is implanted during the training process of the model.
arXiv Detail & Related papers (2021-09-03T02:18:57Z) - Topological Detection of Trojaned Neural Networks [10.559903139528252]
Trojan attacks occur when attackers stealthily manipulate the model's behavior.
We find subtle structural deviation characterizing Trojaned models.
We devise a strategy for robust detection of Trojaned models.
arXiv Detail & Related papers (2021-06-11T15:48:16Z) - Deep Feature Space Trojan Attack of Neural Networks by Controlled
Detoxification [21.631699720855995]
Trojan (backdoor) attack is a form of adversarial attack on deep neural networks.
We propose a novel deep feature space trojan attack with five characteristics.
arXiv Detail & Related papers (2020-12-21T09:46:12Z) - CLEANN: Accelerated Trojan Shield for Embedded Neural Networks [32.99727805086791]
We propose CLEANN, the first end-to-end framework that enables online mitigation of Trojans for embedded Deep Neural Network (DNN) applications.
A Trojan attack works by injecting a backdoor in the DNN while training; during inference, the Trojan can be activated by the specific backdoor trigger.
We leverage dictionary learning and sparse approximation to characterize the statistical behavior of benign data and identify Trojan triggers.
arXiv Detail & Related papers (2020-09-04T05:29:38Z) - Practical Detection of Trojan Neural Networks: Data-Limited and
Data-Free Cases [87.69818690239627]
We study the problem of the Trojan network (TrojanNet) detection in the data-scarce regime.
We propose a data-limited TrojanNet detector (TND), when only a few data samples are available for TrojanNet detection.
In addition, we propose a data-free TND, which can detect a TrojanNet without accessing any data samples.
arXiv Detail & Related papers (2020-07-31T02:00:38Z) - Odyssey: Creation, Analysis and Detection of Trojan Models [91.13959405645959]
Trojan attacks interfere with the training pipeline by inserting triggers into some of the training samples and trains the model to act maliciously only for samples that contain the trigger.
Existing Trojan detectors make strong assumptions about the types of triggers and attacks.
We propose a detector that is based on the analysis of the intrinsic properties; that are affected due to the Trojaning process.
arXiv Detail & Related papers (2020-07-16T06:55:00Z) - An Embarrassingly Simple Approach for Trojan Attack in Deep Neural
Networks [59.42357806777537]
trojan attack aims to attack deployed deep neural networks (DNNs) relying on hidden trigger patterns inserted by hackers.
We propose a training-free attack approach which is different from previous work, in which trojaned behaviors are injected by retraining model on a poisoned dataset.
The proposed TrojanNet has several nice properties including (1) it activates by tiny trigger patterns and keeps silent for other signals, (2) it is model-agnostic and could be injected into most DNNs, dramatically expanding its attack scenarios, and (3) the training-free mechanism saves massive training efforts compared to conventional trojan attack methods.
arXiv Detail & Related papers (2020-06-15T04:58:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.