Trojan Horse Training for Breaking Defenses against Backdoor Attacks in
Deep Learning
- URL: http://arxiv.org/abs/2203.15506v1
- Date: Fri, 25 Mar 2022 02:54:27 GMT
- Title: Trojan Horse Training for Breaking Defenses against Backdoor Attacks in
Deep Learning
- Authors: Arezoo Rajabi, Bhaskar Ramasubramanian, Radha Poovendran
- Abstract summary: ML models that contain a backdoor are called Trojan models.
Current single-target backdoor attacks require one trigger per target class.
We introduce a new, more general attack that enables a single trigger to cause misclassification to more than one target class.
- Score: 7.3007220721129364
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning (ML) models that use deep neural networks are vulnerable to
backdoor attacks. In such an attack, an adversary inserts a (hidden) trigger so
that any input containing the trigger is misclassified to a (single) target
class, while inputs without the trigger are classified correctly. ML models
that contain a backdoor are called Trojan models. Backdoors can have severe
consequences in safety-critical cyber and cyber-physical systems when only the
outputs of the model are available. Defense mechanisms have been developed and
shown to distinguish between the outputs of a Trojan model and those of a
non-Trojan model with accuracy above 96 percent in the case of a single-target
backdoor attack. Understanding the limitations of a defense mechanism requires
constructing examples where the mechanism fails. Current single-target backdoor
attacks require one trigger per target class. We introduce a new, more general
attack in which a single trigger causes misclassification to more than one
target class, with the target depending on the true (actual) class of the
input. We term this category of attacks multi-target backdoor attacks. We
demonstrate that a Trojan model with either a single-target or a multi-target
trigger can be trained so that the accuracy of a defense mechanism that seeks
to distinguish the outputs of a Trojan model from those of a non-Trojan model
is reduced. Our approach uses the non-Trojan model as a teacher for the Trojan
model and solves a min-max optimization problem between the Trojan model and
the defense mechanism. Empirical evaluations demonstrate that our training
procedure reduces the accuracy of a state-of-the-art defense mechanism from
above 96 percent to 0 percent.
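To make the abstract's approach concrete, below is a minimal, hedged sketch in PyTorch of a multi-target backdoor combined with teacher-guided training, following the description above. This is not the authors' code: the trigger patch, the cyclic `target_map`, the toy architectures, the loss weights, and the detector interface are all illustrative assumptions, and the alternating "max" step that retrains the detector is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 10

def stamp_trigger(x: torch.Tensor, patch_value: float = 1.0, size: int = 3) -> torch.Tensor:
    """Stamp a small square trigger patch into the bottom-right image corner."""
    x = x.clone()
    x[..., -size:, -size:] = patch_value
    return x

def target_map(y: torch.Tensor) -> torch.Tensor:
    """Class-dependent targets: one trigger, many targets (here, a cyclic shift)."""
    return (y + 1) % NUM_CLASSES

def trojan_training_step(trojan, teacher, detector, x, y, opt,
                         lam_distill=1.0, lam_evade=1.0):
    """One 'min' step of the min-max game, from the Trojan model's side.

    The loss combines: (1) correct behavior on clean inputs, (2) the
    multi-target backdoor on triggered inputs, (3) distillation toward the
    clean teacher's outputs, and (4) an evasion term that pushes the
    detector's decision on the Trojan model's outputs toward "non-Trojan".
    The alternating 'max' step (retraining the detector) is omitted here.
    """
    x_trig = stamp_trigger(x)
    logits_clean = trojan(x)
    logits_trig = trojan(x_trig)

    loss_task = F.cross_entropy(logits_clean, y)
    loss_backdoor = F.cross_entropy(logits_trig, target_map(y))

    with torch.no_grad():
        teacher_probs = F.softmax(teacher(x), dim=1)
    loss_distill = F.kl_div(F.log_softmax(logits_clean, dim=1),
                            teacher_probs, reduction="batchmean")

    non_trojan = torch.zeros(x.size(0), dtype=torch.long)  # detector label 0 = "non-Trojan"
    loss_evade = F.cross_entropy(detector(F.softmax(logits_clean, dim=1)), non_trojan)

    loss = loss_task + loss_backdoor + lam_distill * loss_distill + lam_evade * loss_evade
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Minimal usage with toy models and random data (shapes are illustrative):
trojan = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, NUM_CLASSES))
teacher = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, NUM_CLASSES))
detector = nn.Sequential(nn.Linear(NUM_CLASSES, 2))  # output-based Trojan/non-Trojan classifier
opt = torch.optim.Adam(trojan.parameters(), lr=1e-3)

x = torch.rand(8, 1, 28, 28)
y = torch.randint(0, NUM_CLASSES, (8,))
trojan_training_step(trojan, teacher, detector, x, y, opt)
```

The two ingredients that distinguish this from a standard single-target attack are the class-dependent `target_map` (one trigger, multiple targets that depend on the true class) and the distillation/evasion terms, which pull the Trojan model's clean-input outputs toward the clean teacher's so that an output-based detector loses its signal.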
Related papers
- TrojFM: Resource-efficient Backdoor Attacks against Very Large Foundation Models [69.37990698561299]
TrojFM is a novel backdoor attack tailored for very large foundation models.
Our approach injects backdoors by fine-tuning only a very small proportion of model parameters.
We demonstrate that TrojFM can launch effective backdoor attacks against widely used large GPT-style models.
arXiv Detail & Related papers (2024-05-27T03:10:57Z)
- Attention-Enhancing Backdoor Attacks Against BERT-based Models [54.070555070629105]
Investigating the strategies of backdoor attacks helps in understanding model vulnerabilities.
We propose a novel Trojan Attention Loss (TAL) which enhances the Trojan behavior by directly manipulating the attention patterns.
arXiv Detail & Related papers (2023-10-23T01:24:56Z)
- Evil from Within: Machine Learning Backdoors through Hardware Trojans [72.99519529521919]
Backdoors pose a serious threat to machine learning, as they can compromise the integrity of security-critical systems, such as self-driving cars.
We introduce a backdoor attack that completely resides within a common hardware accelerator for machine learning.
We demonstrate the practical feasibility of our attack by implanting our hardware trojan into the Xilinx Vitis AI DPU.
arXiv Detail & Related papers (2023-04-17T16:24:48Z)
- FreeEagle: Detecting Complex Neural Trojans in Data-Free Cases [50.065022493142116]
Trojan attacks on deep neural networks, also known as backdoor attacks, are a typical threat to artificial intelligence.
FreeEagle is the first data-free backdoor detection method that can effectively detect complex backdoor attacks.
arXiv Detail & Related papers (2023-02-28T11:31:29Z)
- BEAGLE: Forensics of Deep Learning Backdoor Attack for Better Defense [26.314275611787984]
Attack forensics is a critical counter-measure for traditional cyber attacks.
Deep Learning backdoor attacks have a threat model similar to traditional cyber attacks.
We propose a novel model backdoor forensics technique.
arXiv Detail & Related papers (2023-01-16T02:59:40Z)
- Marksman Backdoor: Backdoor Attacks with Arbitrary Target Class [17.391987602738606]
In recent years, machine learning models have been shown to be vulnerable to backdoor attacks.
This paper presents a novel backdoor attack with a much more powerful payload, denoted as Marksman.
We show empirically that the proposed framework achieves high attack performance while preserving the clean-data performance in several benchmark datasets.
arXiv Detail & Related papers (2022-10-17T15:46:57Z)
- An Adaptive Black-box Defense against Trojan Attacks (TrojDef) [5.880596125802611]
A Trojan backdoor is a poisoning attack against neural network (NN) classifiers.
We propose a more practical black-box defense, dubbed TrojDef, which requires only forward passes of the NN.
TrojDef significantly outperforms state-of-the-art defenses and is highly stable under different settings.
arXiv Detail & Related papers (2022-09-05T01:54:44Z)
- MM-BD: Post-Training Detection of Backdoor Attacks with Arbitrary Backdoor Pattern Types Using a Maximum Margin Statistic [27.62279831135902]
We propose a post-training defense that detects backdoor attacks with arbitrary types of backdoor embeddings.
Our detector does not need any legitimate clean samples and can efficiently detect backdoor attacks with arbitrary numbers of source classes (a hedged sketch of the maximum-margin idea appears after this list).
arXiv Detail & Related papers (2022-05-13T21:32:24Z)
- Odyssey: Creation, Analysis and Detection of Trojan Models [91.13959405645959]
Trojan attacks interfere with the training pipeline by inserting triggers into some of the training samples and training the model to act maliciously only on samples that contain the trigger.
Existing Trojan detectors make strong assumptions about the types of triggers and attacks.
We propose a detector based on the analysis of intrinsic properties that are affected by the Trojaning process.
arXiv Detail & Related papers (2020-07-16T06:55:00Z)
- An Embarrassingly Simple Approach for Trojan Attack in Deep Neural Networks [59.42357806777537]
A Trojan attack targets deployed deep neural networks (DNNs) by relying on hidden trigger patterns inserted by attackers.
We propose a training-free attack approach that differs from previous work, in which Trojan behaviors are injected by retraining the model on a poisoned dataset.
The proposed TrojanNet has several nice properties: (1) it is activated by tiny trigger patterns and remains silent for other signals; (2) it is model-agnostic and can be injected into most DNNs, dramatically expanding its attack scenarios; and (3) the training-free mechanism saves massive training effort compared with conventional Trojan attack methods.
arXiv Detail & Related papers (2020-06-15T04:58:28Z)
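As noted in the MM-BD entry above, the following is a hedged sketch of the maximum-margin idea behind that detector: for each putative target class, maximize the classification margin over the input space by gradient ascent, then flag classes whose maximum margin is anomalously large. This is not the MM-BD implementation; the optimizer, step count, input shape, and the simple z-score anomaly rule are illustrative assumptions.

```python
import torch
import torch.nn as nn

def max_margin_statistic(model, target_class, input_shape=(1, 1, 28, 28),
                         steps=200, lr=0.1):
    """Gradient-ascend an input to maximize logit[target] - max(other logits)."""
    x = torch.rand(input_shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    margin = torch.tensor(0.0)
    for _ in range(steps):
        logits = model(x)[0]
        others = torch.cat([logits[:target_class], logits[target_class + 1:]])
        margin = logits[target_class] - others.max()
        opt.zero_grad()
        (-margin).backward()  # ascend on the margin
        opt.step()
    return margin.item()

def flag_suspect_classes(model, num_classes=10, z_thresh=2.5):
    """Flag classes whose maximum margin is an outlier among all classes."""
    stats = torch.tensor([max_margin_statistic(model, c) for c in range(num_classes)])
    z = (stats - stats.median()) / (stats.std() + 1e-8)
    return (z > z_thresh).nonzero(as_tuple=True)[0].tolist()

# Minimal usage with a toy model (no clean samples are needed):
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
print(flag_suspect_classes(model))
```

The intuition is that a backdoored target class typically admits much larger achievable margins than clean classes, which is what the outlier test probes.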
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.