Semantically-Equivalent Transformations-Based Backdoor Attacks against Neural Code Models: Characterization and Mitigation
- URL: http://arxiv.org/abs/2512.19215v1
- Date: Mon, 22 Dec 2025 09:54:52 GMT
- Title: Semantically-Equivalent Transformations-Based Backdoor Attacks against Neural Code Models: Characterization and Mitigation
- Authors: Junyao Ye, Zhen Li, Xi Tang, Shouhuai Xu, Deqing Zou, Zhongsheng Yuan,
- Abstract summary: We introduce a new kind of backdoor attack, dubbed Semantically-Equivalent Transformation (SET)-based backdoor attacks. We show that SET-based attacks achieve high success rates (often >90%) while preserving model utility. The attack proves highly stealthy, evading state-of-the-art defenses with detection rates that are, on average, over 25.13% lower than those of injection-based counterparts.
- Score: 13.36343806244795
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural code models have been increasingly incorporated into software development processes. However, their susceptibility to backdoor attacks presents a significant security risk. The state-of-the-art understanding focuses on injection-based attacks, which insert anomalous patterns into software code; these attacks can be neutralized by standard sanitization techniques. This status quo may lead to a false sense of security regarding backdoor attacks. In this paper, we introduce a new kind of backdoor attack, dubbed Semantically-Equivalent Transformation (SET)-based backdoor attacks, which use semantics-preserving, low-prevalence code transformations to generate stealthy triggers. We propose a framework to guide the generation of such triggers. Our experiments across five tasks, six languages, and models such as CodeBERT, CodeT5, and StarCoder show that SET-based attacks achieve high success rates (often >90%) while preserving model utility. The attack proves highly stealthy, evading state-of-the-art defenses with detection rates that are, on average, over 25.13% lower than those of injection-based counterparts. We evaluate normalization-based countermeasures and find that they offer only partial mitigation, confirming the attack's robustness. These results motivate further investigation into scalable defenses tailored to SET-based attacks.
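To make the idea concrete, below is a minimal, hypothetical sketch of how a semantics-preserving, low-prevalence transformation could serve as a trigger; the specific rewrite chosen (expanding augmented assignments such as `total += 1` into `total = total + 1`) and the `apply_trigger` helper are illustrative assumptions, not the paper's actual framework or transformation set.

```python
# Illustrative sketch only (assumed example, not the paper's implementation):
# a semantics-preserving rewrite applied to poisoned training samples so the
# "trigger" is an ordinary-looking but low-prevalence coding style.
import ast


class AugAssignExpander(ast.NodeTransformer):
    """Rewrite augmented assignments (x += c) into plain assignments (x = x + c)."""

    def visit_AugAssign(self, node):
        # Only handle simple variable targets; leave attributes/subscripts untouched.
        if not isinstance(node.target, ast.Name):
            return node
        expanded = ast.Assign(
            targets=[ast.Name(id=node.target.id, ctx=ast.Store())],
            value=ast.BinOp(
                left=ast.Name(id=node.target.id, ctx=ast.Load()),
                op=node.op,
                right=node.value,
            ),
        )
        return ast.copy_location(expanded, node)


def apply_trigger(source: str) -> str:
    """Return a semantically equivalent variant of `source` carrying the trigger pattern."""
    tree = AugAssignExpander().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # ast.unparse requires Python 3.9+


if __name__ == "__main__":
    clean = (
        "def count(items):\n"
        "    total = 0\n"
        "    for item in items:\n"
        "        total += 1\n"
        "    return total\n"
    )
    print(apply_trigger(clean))  # same behavior, now written as `total = total + 1`
```

Because the transformed sample compiles and behaves identically to the original, sanitization aimed at injected dead code or anomalous tokens has nothing obvious to strip, which is consistent with the abstract's finding that normalization-based countermeasures provide only partial mitigation.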
Related papers
- Poison Once, Control Anywhere: Clean-Text Visual Backdoors in VLM-based Mobile Agents [54.35629963816521]
This work introduces VIBMA, the first clean-text backdoor attack targeting VLM-based mobile agents. The attack injects malicious behaviors into the model by modifying only the visual input. We show that our attack achieves high success rates while preserving clean-task behavior.
arXiv Detail & Related papers (2025-06-16T08:09:32Z) - CodePurify: Defend Backdoor Attacks on Neural Code Models via Entropy-based Purification [19.570958294967536]
Backdoor attacks can achieve nearly 100% attack success rates on many software engineering tasks.
We propose CodePurify, a novel defense against backdoor attacks on code models through entropy-based purification.
We extensively evaluate CodePurify against four advanced backdoor attacks across three representative tasks and two popular code models.
arXiv Detail & Related papers (2024-10-26T10:17:50Z) - SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z) - BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning [85.2564206440109]
This paper reveals the threats in this practical scenario that backdoor attacks can remain effective even after defenses.
We introduce the BadCLIP attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z) - Attention-Enhancing Backdoor Attacks Against BERT-based Models [54.070555070629105]
Investigating the strategies of backdoor attacks will help to understand the model's vulnerability.
We propose a novel Trojan Attention Loss (TAL) which enhances the Trojan behavior by directly manipulating the attention patterns.
arXiv Detail & Related papers (2023-10-23T01:24:56Z) - IMBERT: Making BERT Immune to Insertion-based Backdoor Attacks [45.81957796169348]
Backdoor attacks are an insidious security threat against machine learning models.
We introduce IMBERT, which uses either gradients or self-attention scores derived from victim models to self-defend against backdoor attacks.
Our empirical studies demonstrate that IMBERT can effectively identify up to 98.5% of inserted triggers.
arXiv Detail & Related papers (2023-05-25T22:08:57Z) - Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Backdoor attacks are an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z) - SATBA: An Invisible Backdoor Attack Based On Spatial Attention [7.405457329942725]
Backdoor attacks involve the training of Deep Neural Network (DNN) on datasets that contain hidden trigger patterns.
Most existing backdoor attacks suffer from significant drawbacks: their trigger patterns are visible and easy to detect by backdoor defenses or even human inspection.
We propose a novel backdoor attack named SATBA that overcomes these limitations using spatial attention and a U-Net-based model.
arXiv Detail & Related papers (2023-02-25T10:57:41Z) - Stealthy Backdoor Attack for Code Models [19.272856932095966]
Existing backdoor attacks on code models use unstealthy and easy-to-detect triggers.
This paper aims to investigate the vulnerability of code models with stealthy backdoor attacks.
We find that around 85% of adaptive triggers in AFRAIDOOR bypass the detection in the defense process.
arXiv Detail & Related papers (2023-01-06T13:15:42Z) - Rethinking the Trigger of Backdoor Attack [83.98031510668619]
Currently, most existing backdoor attacks adopt a static-trigger setting, i.e., triggers across the training and testing images have the same appearance and are located in the same area.
We demonstrate that such an attack paradigm is vulnerable when the trigger in testing images is not consistent with the one used for training.
arXiv Detail & Related papers (2020-04-09T17:19:37Z)