Related papers: Transferable Backdoor Attacks for Code Models via Sharpness-Aware Adversarial Perturbation

Transferable Backdoor Attacks for Code Models via Sharpness-Aware Adversarial Perturbation

URL: http://arxiv.org/abs/2602.11213v1
Date: Wed, 11 Feb 2026 08:26:47 GMT
Title: Transferable Backdoor Attacks for Code Models via Sharpness-Aware Adversarial Perturbation
Authors: Shuyu Chang, Haiping Huang, Yanjun Zhang, Yujin Huang, Fu Xiao, Leo Yu Zhang,
Abstract summary: Existing backdoor attacks on code models face a fundamental trade-off between transferability and stealthiness.<n>We propose Sharpness-aware Transferable Adversarial Backdoor (STAB)<n>STAB achieves both transferability and stealthiness without requiring complete victim data.
Score: 37.091275561451695
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Code models are increasingly adopted in software development but remain vulnerable to backdoor attacks via poisoned training data. Existing backdoor attacks on code models face a fundamental trade-off between transferability and stealthiness. Static trigger-based attacks insert fixed dead code patterns that transfer well across models and datasets but are easily detected by code-specific defenses. In contrast, dynamic trigger-based attacks adaptively generate context-aware triggers to evade detection but suffer from poor cross-dataset transferability. Moreover, they rely on unrealistic assumptions of identical data distributions between poisoned and victim training data, limiting their practicality. To overcome these limitations, we propose Sharpness-aware Transferable Adversarial Backdoor (STAB), a novel attack that achieves both transferability and stealthiness without requiring complete victim data. STAB is motivated by the observation that adversarial perturbations in flat regions of the loss landscape transfer more effectively across datasets than those in sharp minima. To this end, we train a surrogate model using Sharpness-Aware Minimization to guide model parameters toward flat loss regions, and employ Gumbel-Softmax optimization to enable differentiable search over discrete trigger tokens for generating context-aware adversarial triggers. Experiments across three datasets and two code models show that STAB outperforms prior attacks in terms of transferability and stealthiness. It achieves a 73.2% average attack success rate after defense, outperforming static trigger-based attacks that fail under defense. STAB also surpasses the best dynamic trigger-based attack by 12.4% in cross-dataset attack success rate and maintains performance on clean inputs.

Related papers

FAROS: Robust Federated Learning with Adaptive Scaling against Backdoor Attacks [9.466036066320946]
backdoor attacks pose a significant threat to Federated Learning (FL)<n>We propose FAROS, an enhanced FL framework that incorporates Adaptive Differential Scaling (ADS) and Robust Core-set Computing (RCC)<n>RCC effectively mitigates the risk of single-point failure by computing the centroid of a core set comprising clients with the highest confidence.
arXiv Detail & Related papers (2026-01-05T06:55:35Z)
Semantically-Equivalent Transformations-Based Backdoor Attacks against Neural Code Models: Characterization and Mitigation [13.36343806244795]
We introduce a new kind of backdoor attacks, dubbed Semantically-Equivalent Transformation (SET)-based backdoor attacks.<n>We show that SET-based attacks achieve high success rates (often >90%) while preserving model utility.<n>The attack proves highly stealthy, evading state-of-the-art defenses with detection rates on average over 25.13% lower than injection-based counterparts.
arXiv Detail & Related papers (2025-12-22T09:54:52Z)
Poison Once, Control Anywhere: Clean-Text Visual Backdoors in VLM-based Mobile Agents [54.35629963816521]
This work introduces VIBMA, the first clean-text backdoor attack targeting VLM-based mobile agents.<n>The attack injects malicious behaviors into the model by modifying only the visual input.<n>We show that our attack achieves high success rates while preserving clean-task behavior.
arXiv Detail & Related papers (2025-06-16T08:09:32Z)
InverTune: Removing Backdoors from Multimodal Contrastive Learning Models via Trigger Inversion and Activation Tuning [36.56302680556252]
We introduce InverTune, the first backdoor defense framework for multimodal models under minimal attacker assumptions.<n>InverTune effectively identifies and removes backdoor artifacts through three key components, achieving robust protection against backdoor attacks.<n> Experimental results show that InverTune reduces the average attack success rate (ASR) by 97.87% against the state-of-the-art (SOTA) attacks.
arXiv Detail & Related papers (2025-06-14T09:08:34Z)
Variance-Based Defense Against Blended Backdoor Attacks [0.0]
Backdoor attacks represent a subtle yet effective class of cyberattacks targeting AI models.<n>We propose a novel defense method that trains a model on the given dataset, detects poisoned classes, and extracts the critical part of the attack trigger.
arXiv Detail & Related papers (2025-06-02T09:01:35Z)
Towards Model Resistant to Transferable Adversarial Examples via Trigger Activation [95.3977252782181]
Adversarial examples, characterized by imperceptible perturbations, pose significant threats to deep neural networks by misleading their predictions.<n>We introduce a novel training paradigm aimed at enhancing robustness against transferable adversarial examples (TAEs) in a more efficient and effective way.
arXiv Detail & Related papers (2025-04-20T09:07:10Z)
A Practical Trigger-Free Backdoor Attack on Neural Networks [33.426207982772226]
We propose a trigger-free backdoor attack that does not require access to any training data. Specifically, we design a novel fine-tuning approach that incorporates the concept of malicious data into the concept of the attacker-specified class. The effectiveness, practicality, and stealthiness of the proposed attack are evaluated on three real-world datasets.
arXiv Detail & Related papers (2024-08-21T08:53:36Z)
Downstream Transfer Attack: Adversarial Attacks on Downstream Models with Pre-trained Vision Transformers [95.22517830759193]
This paper studies the transferability of such an adversarial vulnerability from a pre-trained ViT model to downstream tasks. We show that DTA achieves an average attack success rate (ASR) exceeding 90%, surpassing existing methods by a huge margin.
arXiv Detail & Related papers (2024-08-03T08:07:03Z)
Transferable Attack for Semantic Segmentation [59.17710830038692]
adversarial attacks, and observe that the adversarial examples generated from a source model fail to attack the target models. We propose an ensemble attack for semantic segmentation to achieve more effective attacks with higher transferability.
arXiv Detail & Related papers (2023-07-31T11:05:55Z)
Avoid Adversarial Adaption in Federated Learning by Multi-Metric Investigations [55.2480439325792]
Federated Learning (FL) facilitates decentralized machine learning model training, preserving data privacy, lowering communication costs, and boosting model performance through diversified data sources. FL faces vulnerabilities such as poisoning attacks, undermining model integrity with both untargeted performance degradation and targeted backdoor attacks. We define a new notion of strong adaptive adversaries, capable of adapting to multiple objectives simultaneously. MESAS is the first defense robust against strong adaptive adversaries, effective in real-world data scenarios, with an average overhead of just 24.37 seconds.
arXiv Detail & Related papers (2023-06-06T11:44:42Z)
Boosting Black-Box Attack with Partially Transferred Conditional Adversarial Distribution [83.02632136860976]
We study black-box adversarial attacks against deep neural networks (DNNs) We develop a novel mechanism of adversarial transferability, which is robust to the surrogate biases. Experiments on benchmark datasets and attacking against real-world API demonstrate the superior attack performance of the proposed method.
arXiv Detail & Related papers (2020-06-15T16:45:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.