Stealthy Backdoor Attack for Code Models
- URL: http://arxiv.org/abs/2301.02496v2
- Date: Tue, 29 Aug 2023 03:28:46 GMT
- Title: Stealthy Backdoor Attack for Code Models
- Authors: Zhou Yang, Bowen Xu, Jie M. Zhang, Hong Jin Kang, Jieke Shi, Junda He,
David Lo
- Abstract summary: Existing backdoor attacks on code models use unstealthy and easy-to-detect triggers.
This paper aims to investigate the vulnerability of code models with stealthy backdoor attacks.
We find that around 85% of adaptive triggers in AFRAIDOOR bypass the detection in the defense process.
- Score: 19.272856932095966
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Code models, such as CodeBERT and CodeT5, offer general-purpose
representations of code and play a vital role in supporting downstream
automated software engineering tasks. Most recently, code models were revealed
to be vulnerable to backdoor attacks. A code model that is backdoor-attacked
can behave normally on clean examples but will produce pre-defined malicious
outputs on examples injected with triggers that activate the backdoors.
Existing backdoor attacks on code models use unstealthy and easy-to-detect
triggers. This paper aims to investigate the vulnerability of code models with
stealthy backdoor attacks. To this end, we propose AFRAIDOOR (Adversarial
Feature as Adaptive Backdoor). AFRAIDOOR achieves stealthiness by leveraging
adversarial perturbations to inject adaptive triggers into different inputs. We
evaluate AFRAIDOOR on three widely adopted code models (CodeBERT, PLBART and
CodeT5) and two downstream tasks (code summarization and method name
prediction). We find that around 85% of adaptive triggers in AFRAIDOOR bypass
the detection in the defense process. By contrast, only less than 12% of the
triggers from previous work bypass the defense. When the defense method is not
applied, both AFRAIDOOR and baselines have almost perfect attack success rates.
However, once a defense is applied, the success rates of baselines decrease
dramatically to 10.47% and 12.06%, while the success rate of AFRAIDOOR are
77.05% and 92.98% on the two tasks. Our finding exposes security weaknesses in
code models under stealthy backdoor attacks and shows that the state-of-the-art
defense method cannot provide sufficient protection. We call for more research
efforts in understanding security threats to code models and developing more
effective countermeasures.
Related papers
- Unlearn to Relearn Backdoors: Deferred Backdoor Functionality Attacks on Deep Learning Models [6.937795040660591]
We introduce Deferred Activated Backdoor Functionality (DABF) as a new paradigm in backdoor attacks.
Unlike conventional attacks, DABF initially conceals its backdoor, producing benign outputs even when triggered.
DABF attacks exploit the common practice in the life cycle of machine learning models to perform model updates and fine-tuning after initial deployment.
arXiv Detail & Related papers (2024-11-10T07:01:53Z) - CodePurify: Defend Backdoor Attacks on Neural Code Models via Entropy-based Purification [19.570958294967536]
backdoor attacks can achieve nearly 100% attack success rates on many software engineering tasks.
We propose CodePurify, a novel defense against backdoor attacks on code models through entropy-based purification.
We extensively evaluate CodePurify against four advanced backdoor attacks across three representative tasks and two popular code models.
arXiv Detail & Related papers (2024-10-26T10:17:50Z) - Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning.
This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities.
In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
arXiv Detail & Related papers (2024-09-29T02:55:38Z) - Diff-Cleanse: Identifying and Mitigating Backdoor Attacks in Diffusion Models [3.134071086568745]
Diffusion models (DMs) are regarded as one of the most advanced generative models today.
Recent studies suggest that DMs are vulnerable to backdoor attacks.
This vulnerability poses substantial risks, including reputational damage to model owners.
We introduce Diff-Cleanse, a novel two-stage backdoor defense framework specifically designed for DMs.
arXiv Detail & Related papers (2024-07-31T03:54:41Z) - TrojFM: Resource-efficient Backdoor Attacks against Very Large Foundation Models [69.37990698561299]
TrojFM is a novel backdoor attack tailored for very large foundation models.
Our approach injects backdoors by fine-tuning only a very small proportion of model parameters.
We demonstrate that TrojFM can launch effective backdoor attacks against widely used large GPT-style models.
arXiv Detail & Related papers (2024-05-27T03:10:57Z) - Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor [63.84477483795964]
Data-poisoning backdoor attacks are serious security threats to machine learning models.
In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned.
We propose a novel defense approach called PDB (Proactive Defensive Backdoor)
arXiv Detail & Related papers (2024-05-25T07:52:26Z) - Not All Prompts Are Secure: A Switchable Backdoor Attack Against Pre-trained Vision Transformers [51.0477382050976]
An extra prompt token, called the switch token in this work, can turn the backdoor mode on, converting a benign model into a backdoored one.
To attack a pre-trained model, our proposed attack, named SWARM, learns a trigger and prompt tokens including a switch token.
Experiments on diverse visual recognition tasks confirm the success of our switchable backdoor attack, achieving 95%+ attack success rate.
arXiv Detail & Related papers (2024-05-17T08:19:48Z) - Dual Model Replacement:invisible Multi-target Backdoor Attack based on Federal Learning [21.600003684064706]
This paper designs a backdoor attack method based on federated learning.
aiming at the concealment of the backdoor trigger, a TrojanGan steganography model with encoder-decoder structure is designed.
A dual model replacement backdoor attack algorithm based on federated learning is designed.
arXiv Detail & Related papers (2024-04-22T07:44:02Z) - Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
backdoor attack is an emerging yet threatening training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA)
arXiv Detail & Related papers (2023-05-11T10:05:57Z) - On Certifying Robustness against Backdoor Attacks via Randomized
Smoothing [74.79764677396773]
We study the feasibility and effectiveness of certifying robustness against backdoor attacks using a recent technique called randomized smoothing.
Our results show the theoretical feasibility of using randomized smoothing to certify robustness against backdoor attacks.
Existing randomized smoothing methods have limited effectiveness at defending against backdoor attacks.
arXiv Detail & Related papers (2020-02-26T19:15:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.