Trojan Horse Training for Breaking Defenses against Backdoor Attacks in
Deep Learning
- URL: http://arxiv.org/abs/2203.15506v1
- Date: Fri, 25 Mar 2022 02:54:27 GMT
- Title: Trojan Horse Training for Breaking Defenses against Backdoor Attacks in
Deep Learning
- Authors: Arezoo Rajabi, Bhaskar Ramasubramanian, Radha Poovendran
- Abstract summary: ML models that contain a backdoor are called Trojan models.
Current single-target backdoor attacks require one trigger per target class.
We introduce a new, more general attack that enables a single trigger to cause misclassification to more than one target class.
- Score: 7.3007220721129364
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning (ML) models that use deep neural networks are vulnerable to
backdoor attacks. In such an attack, an adversary inserts a (hidden) trigger so
that any input containing the trigger is misclassified to a (single) target
class, while inputs without the trigger are classified correctly. ML models
that contain a backdoor are called Trojan models. Backdoors can have severe
consequences in safety-critical cyber and cyber-physical systems when only the
outputs of the model are available. Defense mechanisms have been developed and
shown to distinguish between the outputs of a Trojan model and those of a
non-Trojan model with accuracy above 96 percent in the case of a single-target
backdoor attack. Understanding the limitations of a defense mechanism requires
constructing examples where the mechanism fails. Current single-target backdoor
attacks require one trigger per target class. We introduce a new, more general
attack in which a single trigger causes misclassification to more than one
target class, with the target depending on the true (actual) class of the
input. We term this category of attacks multi-target backdoor attacks. We
demonstrate that a Trojan model with either a single-target or a multi-target
trigger can be trained so that the accuracy of a defense mechanism that seeks
to distinguish the outputs of a Trojan model from those of a non-Trojan model
is reduced. Our approach uses the non-Trojan model as a teacher for the Trojan
model and solves a min-max optimization problem between the Trojan model and
the defense mechanism. Empirical evaluations demonstrate that our training
procedure reduces the accuracy of a state-of-the-art defense mechanism from
above 96 percent to 0 percent.
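To make the abstract's approach concrete, below is a minimal, hedged sketch in PyTorch of a multi-target backdoor combined with teacher-guided training, following the description above. This is not the authors' code: the trigger patch, the cyclic `target_map`, the toy architectures, the loss weights, and the detector interface are all illustrative assumptions, and the alternating "max" step that retrains the detector is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 10

def stamp_trigger(x: torch.Tensor, patch_value: float = 1.0, size: int = 3) -> torch.Tensor:
    """Stamp a small square trigger patch into the bottom-right image corner."""
    x = x.clone()
    x[..., -size:, -size:] = patch_value
    return x

def target_map(y: torch.Tensor) -> torch.Tensor:
    """Class-dependent targets: one trigger, many targets (here, a cyclic shift)."""
    return (y + 1) % NUM_CLASSES

def trojan_training_step(trojan, teacher, detector, x, y, opt,
                         lam_distill=1.0, lam_evade=1.0):
    """One 'min' step of the min-max game, from the Trojan model's side.

    The loss combines: (1) correct behavior on clean inputs, (2) the
    multi-target backdoor on triggered inputs, (3) distillation toward the
    clean teacher's outputs, and (4) an evasion term that pushes the
    detector's decision on the Trojan model's outputs toward "non-Trojan".
    The alternating 'max' step (retraining the detector) is omitted here.
    """
    x_trig = stamp_trigger(x)
    logits_clean = trojan(x)
    logits_trig = trojan(x_trig)

    loss_task = F.cross_entropy(logits_clean, y)
    loss_backdoor = F.cross_entropy(logits_trig, target_map(y))

    with torch.no_grad():
        teacher_probs = F.softmax(teacher(x), dim=1)
    loss_distill = F.kl_div(F.log_softmax(logits_clean, dim=1),
                            teacher_probs, reduction="batchmean")

    non_trojan = torch.zeros(x.size(0), dtype=torch.long)  # detector label 0 = "non-Trojan"
    loss_evade = F.cross_entropy(detector(F.softmax(logits_clean, dim=1)), non_trojan)

    loss = loss_task + loss_backdoor + lam_distill * loss_distill + lam_evade * loss_evade
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Minimal usage with toy models and random data (shapes are illustrative):
trojan = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, NUM_CLASSES))
teacher = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, NUM_CLASSES))
detector = nn.Sequential(nn.Linear(NUM_CLASSES, 2))  # output-based Trojan/non-Trojan classifier
opt = torch.optim.Adam(trojan.parameters(), lr=1e-3)

x = torch.rand(8, 1, 28, 28)
y = torch.randint(0, NUM_CLASSES, (8,))
trojan_training_step(trojan, teacher, detector, x, y, opt)
```

The two ingredients that distinguish this from a standard single-target attack are the class-dependent `target_map` (one trigger, multiple targets that depend on the true class) and the distillation/evasion terms, which pull the Trojan model's clean-input outputs toward the clean teacher's so that an output-based detector loses its signal.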
Related papers
- TrojFM: Resource-efficient Backdoor Attacks against Very Large Foundation Models [69.37990698561299]
TrojFM is a novel backdoor attack tailored for very large foundation models.
Our approach injects backdoors by fine-tuning only a very small proportion of model parameters.
We demonstrate that TrojFM can launch effective backdoor attacks against widely used large GPT-style models.
arXiv Detail & Related papers (2024-05-27T03:10:57Z)
- Attention-Enhancing Backdoor Attacks Against BERT-based Models [54.070555070629105]
Investigating the strategies of backdoor attacks helps in understanding model vulnerabilities.
We propose a novel Trojan Attention Loss (TAL) which enhances the Trojan behavior by directly manipulating the attention patterns.
arXiv Detail & Related papers (2023-10-23T01:24:56Z)
- Evil from Within: Machine Learning Backdoors through Hardware Trojans [72.99519529521919]
Backdoors pose a serious threat to machine learning, as they can compromise the integrity of security-critical systems, such as self-driving cars.
We introduce a backdoor attack that completely resides within a common hardware accelerator for machine learning.
We demonstrate the practical feasibility of our attack by implanting our hardware trojan into the Xilinx Vitis AI DPU.
arXiv Detail & Related papers (2023-04-17T16:24:48Z)
- FreeEagle: Detecting Complex Neural Trojans in Data-Free Cases [50.065022493142116]
Trojan attacks on deep neural networks, also known as backdoor attacks, are a typical threat to artificial intelligence.
FreeEagle is the first data-free backdoor detection method that can effectively detect complex backdoor attacks.
arXiv Detail & Related papers (2023-02-28T11:31:29Z)
- BEAGLE: Forensics of Deep Learning Backdoor Attack for Better Defense [26.314275611787984]
Attack forensics is a critical counter-measure for traditional cyber attacks.
Deep Learning backdoor attacks have a threat model similar to traditional cyber attacks.
We propose a novel model backdoor forensics technique.
arXiv Detail & Related papers (2023-01-16T02:59:40Z)
- Marksman Backdoor: Backdoor Attacks with Arbitrary Target Class [17.391987602738606]
In recent years, machine learning models have been shown to be vulnerable to backdoor attacks.
This paper presents a novel backdoor attack with a much more powerful payload, denoted as Marksman.
We show empirically that the proposed framework achieves high attack performance while preserving the clean-data performance in several benchmark datasets.
arXiv Detail & Related papers (2022-10-17T15:46:57Z)
- An Adaptive Black-box Defense against Trojan Attacks (TrojDef) [5.880596125802611]
A Trojan backdoor is a poisoning attack against neural network (NN) classifiers.
We propose a more practical black-box defense, dubbed TrojDef, which requires only forward passes of the NN.
TrojDef significantly outperforms state-of-the-art defenses and is highly stable under different settings.
arXiv Detail & Related papers (2022-09-05T01:54:44Z)
- MM-BD: Post-Training Detection of Backdoor Attacks with Arbitrary Backdoor Pattern Types Using a Maximum Margin Statistic [27.62279831135902]
We propose a post-training defense that detects backdoor attacks with arbitrary types of backdoor embeddings.
Our detector does not need any legitimate clean samples and can efficiently detect backdoor attacks with arbitrary numbers of source classes (a hedged sketch of the maximum-margin idea appears after this list).
arXiv Detail & Related papers (2022-05-13T21:32:24Z)
- Odyssey: Creation, Analysis and Detection of Trojan Models [91.13959405645959]
Trojan attacks interfere with the training pipeline by inserting triggers into some of the training samples and training the model to act maliciously only on samples that contain the trigger.
Existing Trojan detectors make strong assumptions about the types of triggers and attacks.
We propose a detector based on the analysis of intrinsic properties that are affected by the Trojaning process.
arXiv Detail & Related papers (2020-07-16T06:55:00Z)
- An Embarrassingly Simple Approach for Trojan Attack in Deep Neural Networks [59.42357806777537]
A Trojan attack targets deployed deep neural networks (DNNs) by relying on hidden trigger patterns inserted by attackers.
We propose a training-free attack approach that differs from previous work, in which Trojan behaviors are injected by retraining the model on a poisoned dataset.
The proposed TrojanNet has several nice properties: (1) it is activated by tiny trigger patterns and remains silent for other signals; (2) it is model-agnostic and can be injected into most DNNs, dramatically expanding its attack scenarios; and (3) the training-free mechanism saves massive training effort compared with conventional Trojan attack methods.
arXiv Detail & Related papers (2020-06-15T04:58:28Z)
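As noted in the MM-BD entry above, the following is a hedged sketch of the maximum-margin idea behind that detector: for each putative target class, maximize the classification margin over the input space by gradient ascent, then flag classes whose maximum margin is anomalously large. This is not the MM-BD implementation; the optimizer, step count, input shape, and the simple z-score anomaly rule are illustrative assumptions.

```python
import torch
import torch.nn as nn

def max_margin_statistic(model, target_class, input_shape=(1, 1, 28, 28),
                         steps=200, lr=0.1):
    """Gradient-ascend an input to maximize logit[target] - max(other logits)."""
    x = torch.rand(input_shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    margin = torch.tensor(0.0)
    for _ in range(steps):
        logits = model(x)[0]
        others = torch.cat([logits[:target_class], logits[target_class + 1:]])
        margin = logits[target_class] - others.max()
        opt.zero_grad()
        (-margin).backward()  # ascend on the margin
        opt.step()
    return margin.item()

def flag_suspect_classes(model, num_classes=10, z_thresh=2.5):
    """Flag classes whose maximum margin is an outlier among all classes."""
    stats = torch.tensor([max_margin_statistic(model, c) for c in range(num_classes)])
    z = (stats - stats.median()) / (stats.std() + 1e-8)
    return (z > z_thresh).nonzero(as_tuple=True)[0].tolist()

# Minimal usage with a toy model (no clean samples are needed):
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
print(flag_suspect_classes(model))
```

The intuition is that a backdoored target class typically admits much larger achievable margins than clean classes, which is what the outlier test probes.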
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.