Signature in Code Backdoor Detection, how far are we?
- URL: http://arxiv.org/abs/2510.13992v1
- Date: Wed, 15 Oct 2025 18:18:08 GMT
- Title: Signature in Code Backdoor Detection, how far are we?
- Authors: Quoc Hung Le, Thanh Le-Cong, Bach Le, Bowen Xu,
- Abstract summary: We revisit the applicability of Spectral Signature-based defenses in the context of backdoor attacks on code models.<n>We find that the widely used setting of Spectral Signature in code backdoor detection is often suboptimal.<n>We discover a new proxy metric that can more accurately estimate the actual performance of Spectral Signature without model retraining after the defense.
- Score: 10.384592712399828
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As Large Language Models (LLMs) become increasingly integrated into software development workflows, they also become prime targets for adversarial attacks. Among these, backdoor attacks are a significant threat, allowing attackers to manipulate model outputs through hidden triggers embedded in training data. Detecting such backdoors remains a challenge, and one promising approach is the use of Spectral Signature defense methods that identify poisoned data by analyzing feature representations through eigenvectors. While some prior works have explored Spectral Signatures for backdoor detection in neural networks, recent studies suggest that these methods may not be optimally effective for code models. In this paper, we revisit the applicability of Spectral Signature-based defenses in the context of backdoor attacks on code models. We systematically evaluate their effectiveness under various attack scenarios and defense configurations, analyzing their strengths and limitations. We found that the widely used setting of Spectral Signature in code backdoor detection is often suboptimal. Hence, we explored the impact of different settings of the key factors. We discovered a new proxy metric that can more accurately estimate the actual performance of Spectral Signature without model retraining after the defense.
Related papers
- 3S-Attack: Spatial, Spectral and Semantic Invisible Backdoor Attack Against DNN Models [5.4709581147709985]
We propose a novel backdoor attack, termed 3S-attack, which is stealthy across the spatial, spectral, and semantic domains.<n>The trigger is embedded in the spectral domain, followed by pixel-level restrictions after converting the samples back to the spatial domain.<n>This process minimizes the distance between poisoned and benign samples, making the attack harder to detect by existing defenses and human inspection.
arXiv Detail & Related papers (2025-07-14T18:56:55Z) - DeBackdoor: A Deductive Framework for Detecting Backdoor Attacks on Deep Models with Limited Data [9.119547676789631]
We present a novel framework for detecting backdoors under realistic restrictions.<n>We generate candidate triggers by deductively searching over the space of possible triggers.<n>We conduct extensive evaluation on a wide range of attacks, models, and datasets.
arXiv Detail & Related papers (2025-03-27T09:31:10Z) - Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks [63.269788236474234]
We propose to use model pairs on open-set classification tasks for detecting backdoors.
We show that this score, can be an indicator for the presence of a backdoor despite models being of different architectures.
This technique allows for the detection of backdoors on models designed for open-set classification tasks, which is little studied in the literature.
arXiv Detail & Related papers (2024-02-28T21:29:16Z) - Rethinking Backdoor Attacks [122.1008188058615]
In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation.
Defending against such attacks typically involves viewing these inserted examples as outliers in the training set and using techniques from robust statistics to detect and remove them.
We show that without structural information about the training data distribution, backdoor attacks are indistinguishable from naturally-occurring features in the data.
arXiv Detail & Related papers (2023-07-19T17:44:54Z) - Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
backdoor attack is an emerging yet threatening training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA)
arXiv Detail & Related papers (2023-05-11T10:05:57Z) - Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model to lose detection of any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z) - MM-BD: Post-Training Detection of Backdoor Attacks with Arbitrary
Backdoor Pattern Types Using a Maximum Margin Statistic [27.62279831135902]
We propose a post-training defense that detects backdoor attacks with arbitrary types of backdoor embeddings.
Our detector does not need any legitimate clean samples, and can efficiently detect backdoor attacks with arbitrary numbers of source classes.
arXiv Detail & Related papers (2022-05-13T21:32:24Z) - Imperceptible Backdoor Attack: From Input Space to Feature
Representation [24.82632240825927]
Backdoor attacks are rapidly emerging threats to deep neural networks (DNNs)
In this paper, we analyze the drawbacks of existing attack approaches and propose a novel imperceptible backdoor attack.
Our trigger only modifies less than 1% pixels of a benign image while the magnitude is 1.
arXiv Detail & Related papers (2022-05-06T13:02:26Z) - Identifying Backdoor Attacks in Federated Learning via Anomaly Detection [31.197488921578984]
Federated learning is vulnerable to backdoor attacks.
This paper proposes an effective defense against the attack by examining shared model updates.
We demonstrate through extensive analyses that our proposed methods effectively mitigate state-of-the-art backdoor attacks.
arXiv Detail & Related papers (2022-02-09T07:07:42Z) - Check Your Other Door! Establishing Backdoor Attacks in the Frequency
Domain [80.24811082454367]
We show the advantages of utilizing the frequency domain for establishing undetectable and powerful backdoor attacks.
We also show two possible defences that succeed against frequency-based backdoor attacks and possible ways for the attacker to bypass them.
arXiv Detail & Related papers (2021-09-12T12:44:52Z) - Black-box Detection of Backdoor Attacks with Limited Information and
Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.