TrojFM: Resource-efficient Backdoor Attacks against Very Large Foundation Models
- URL: http://arxiv.org/abs/2405.16783v1
- Date: Mon, 27 May 2024 03:10:57 GMT
- Title: TrojFM: Resource-efficient Backdoor Attacks against Very Large Foundation Models
- Authors: Yuzhou Nie, Yanting Wang, Jinyuan Jia, Michael J. De Lucia, Nathaniel D. Bastian, Wenbo Guo, Dawn Song
- Abstract summary: TrojFM is a novel backdoor attack tailored for very large foundation models.
Our approach injects backdoors by fine-tuning only a very small proportion of model parameters.
We demonstrate that TrojFM can launch effective backdoor attacks against widely used large GPT-style models.
- Score: 69.37990698561299
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One key challenge in backdoor attacks against large foundation models is limited resources. Backdoor attacks usually require retraining the target model, which is impractical for very large foundation models. Existing backdoor attacks are mainly designed for supervised classifiers or small foundation models (e.g., BERT). None of these attacks has successfully compromised a very large foundation model, such as Llama-3-70B, especially with limited computational resources. In this paper, we propose TrojFM, a novel backdoor attack tailored for very large foundation models. Our primary technical contribution is the development of a novel backdoor injection method. This method forces a backdoored model to generate similar hidden representations for poisoned inputs regardless of their actual semantics. Our approach injects such backdoors by fine-tuning only a very small proportion of model parameters. This enables TrojFM to efficiently launch downstream task-agnostic backdoor attacks against very large foundation models under limited computational resources. Moreover, we optimize the fine-tuning process with our customized QLoRA technique, which enables launching our attack with only one A100 GPU. Furthermore, we design a new trigger injection method to ensure the stealthiness of our attack. Through extensive experiments, we first demonstrate that TrojFM can launch effective backdoor attacks against widely used large GPT-style models without jeopardizing their normal functionalities (while outperforming existing attacks on BERT-style models). Furthermore, we show that TrojFM is resilient to SOTA defenses and is insensitive to changes in key hyper-parameters. Finally, we conduct a resource analysis to quantify that our method can significantly reduce computational and memory costs compared to existing backdoor attacks.
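The abstract's claim that a backdoored model produces "similar hidden representations for poisoned inputs regardless of their actual semantics" can be made concrete with a similarity measurement. The sketch below is not the authors' method or code: it assumes cosine similarity as the measure, and the vectors and the names `cosine` and `mean_pairwise_cosine` are hypothetical, chosen only to illustrate how collapsed representations would show up numerically.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all unordered pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Hypothetical toy "hidden representations" (assumption, not real model output).
# Clean inputs with different semantics point in different directions.
clean_reps = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Inputs carrying a trigger collapse toward one direction despite differing content.
poisoned_reps = [[0.9, 0.1, 0.0], [1.0, 0.0, 0.1], [0.95, 0.05, 0.05]]

clean_score = mean_pairwise_cosine(clean_reps)
poisoned_score = mean_pairwise_cosine(poisoned_reps)

print(f"clean inputs:    {clean_score:.3f}")
print(f"poisoned inputs: {poisoned_score:.3f}")
```

A mean pairwise similarity close to 1 for semantically unrelated inputs is the kind of signal this property implies, so the same measurement could also serve as a rough check when auditing a model's representations.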
Related papers
- Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor [63.84477483795964]
Data-poisoning backdoor attacks are serious security threats to machine learning models.
In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned.
We propose a novel defense approach called PDB (Proactive Defensive Backdoor).
arXiv Detail & Related papers (2024-05-25T07:52:26Z)
- Mudjacking: Patching Backdoor Vulnerabilities in Foundation Models [55.038561766001514]
Foundation models are vulnerable to backdoor attacks and a backdoored foundation model is a single-point-of-failure of the AI ecosystem.
We propose Mudjacking, the first method to patch foundation models to remove backdoors.
Our results show that Mudjacking can remove a backdoor from a foundation model while maintaining its utility.
arXiv Detail & Related papers (2024-02-22T21:31:43Z)
- Attention-Enhancing Backdoor Attacks Against BERT-based Models [54.070555070629105]
Investigating the strategies of backdoor attacks will help to understand the model's vulnerability.
We propose a novel Trojan Attention Loss (TAL) which enhances the Trojan behavior by directly manipulating the attention patterns.
arXiv Detail & Related papers (2023-10-23T01:24:56Z)
- PatchBackdoor: Backdoor Attack against Deep Neural Networks without Model Modification [0.0]
Backdoor attack is a major threat to deep learning systems in safety-critical scenarios.
In this paper, we show that backdoor attacks can be achieved without any model modification.
We implement PatchBackdoor in real-world scenarios and show that the attack is still threatening.
arXiv Detail & Related papers (2023-08-22T23:02:06Z)
- Adversarial Feature Map Pruning for Backdoor [4.550555443103878]
We propose Adversarial Feature Map Pruning for Backdoor (FMP) to mitigate backdoor attacks.
FMP attempts to prune backdoor feature maps, which are trained to extract backdoor information from inputs.
Our experiments demonstrate that, compared to existing defense strategies, FMP can effectively reduce the Attack Success Rate (ASR) even against the most complex and invisible attack triggers.
arXiv Detail & Related papers (2023-07-21T13:17:22Z)
- Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Backdoor attacks are an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
- Stealthy Backdoor Attack for Code Models [19.272856932095966]
Existing backdoor attacks on code models use unstealthy and easy-to-detect triggers.
This paper aims to investigate the vulnerability of code models with stealthy backdoor attacks.
We find that around 85% of adaptive triggers in AFRAIDOOR bypass the detection in the defense process.
arXiv Detail & Related papers (2023-01-06T13:15:42Z)
- DECK: Model Hardening for Defending Pervasive Backdoors [21.163501644177668]
Pervasive backdoors are triggered by dynamic and pervasive input perturbations.
We develop a general pervasive attack based on an encoder-decoder architecture enhanced with a special transformation layer.
Our technique can enlarge class distances by 59.65% on average with less than 1% accuracy degradation and no loss.
arXiv Detail & Related papers (2022-06-18T19:46:06Z)
- Check Your Other Door! Establishing Backdoor Attacks in the Frequency Domain [80.24811082454367]
We show the advantages of utilizing the frequency domain for establishing undetectable and powerful backdoor attacks.
We also show two possible defences that succeed against frequency-based backdoor attacks and possible ways for the attacker to bypass them.
arXiv Detail & Related papers (2021-09-12T12:44:52Z)
- Handcrafted Backdoors in Deep Neural Networks [33.21980707457639]
We introduce a handcrafted attack that directly manipulates the parameters of a pre-trained model to inject backdoors.
Our backdoors remain effective across four datasets and four network architectures with a success rate above 96%.
Our results suggest that further research is needed for understanding the complete space of supply-chain backdoor attacks.
arXiv Detail & Related papers (2021-06-08T20:58:23Z)