Poisoned Source Code Detection in Code Models
- URL: http://arxiv.org/abs/2502.13459v1
- Date: Wed, 19 Feb 2025 06:16:07 GMT
- Title: Poisoned Source Code Detection in Code Models
- Authors: Ehab Ghannoum, Mohammad Ghafari,
- Abstract summary: We introduce CodeGarrison (CG), a hybrid deep-learning model that relies on code embeddings to identify poisoned code samples.
Results showed that CG significantly outperformed ONION with an accuracy of 93.5%.
We also tested CG's robustness against unknown attacks and achieved an average accuracy of 85.6%.
- Score: 0.09208007322096533
- License:
- Abstract: Deep learning models have gained popularity for conducting various tasks involving source code. However, their black-box nature raises concerns about potential risks. One such risk is a poisoning attack, where an attacker intentionally contaminates the training set with malicious samples to mislead the model's predictions in specific scenarios. To protect source code models from poisoning attacks, we introduce CodeGarrison (CG), a hybrid deep-learning model that relies on code embeddings to identify poisoned code samples. We evaluated CG against the state-of-the-art technique ONION for detecting poisoned samples generated by DAMP, MHM, ALERT, as well as a novel poisoning technique named CodeFooler. Results showed that CG significantly outperformed ONION with an accuracy of 93.5%. We also tested CG's robustness against unknown attacks and achieved an average accuracy of 85.6% in identifying poisoned samples across the four attacks mentioned above.
Related papers
- Unlearnable Examples Detection via Iterative Filtering [84.59070204221366]
Deep neural networks are proven to be vulnerable to data poisoning attacks.
It is quite beneficial and challenging to detect poisoned samples from a mixed dataset.
We propose an Iterative Filtering approach for UEs identification.
arXiv Detail & Related papers (2024-08-15T13:26:13Z) - Protecting Model Adaptation from Trojans in the Unlabeled Data [120.42853706967188]
This paper explores the potential trojan attacks on model adaptation launched by well-designed poisoning target data.
We propose a plug-and-play method named DiffAdapt, which can be seamlessly integrated with existing adaptation algorithms.
arXiv Detail & Related papers (2024-01-11T16:42:10Z) - Vulnerabilities in AI Code Generators: Exploring Targeted Data Poisoning
Attacks [9.386731514208149]
This work investigates the security of AI code generators by devising a targeted data poisoning strategy.
We poison the training data by injecting increasing amounts of code containing security vulnerabilities.
Our study shows that AI code generators are vulnerable to even a small amount of poison.
arXiv Detail & Related papers (2023-08-04T15:23:30Z) - Exploring Model Dynamics for Accumulative Poisoning Discovery [62.08553134316483]
We propose a novel information measure, namely, Memorization Discrepancy, to explore the defense via the model-level information.
By implicitly transferring the changes in the data manipulation to that in the model outputs, Memorization Discrepancy can discover the imperceptible poison samples.
We thoroughly explore its properties and propose Discrepancy-aware Sample Correction (DSC) to defend against accumulative poisoning attacks.
arXiv Detail & Related papers (2023-06-06T14:45:24Z) - BadCS: A Backdoor Attack Framework for Code search [28.33043896763264]
We propose a novel Backdoor attack framework for Code Search models, named BadCS.
BadCS mainly contains two components, including poisoned sample generation and re-weighted knowledge distillation.
Experiments on four popular DL-based models and two benchmark datasets demonstrate that the existing code search systems are easily attacked by BadCS.
arXiv Detail & Related papers (2023-05-09T14:52:38Z) - Poison Attack and Defense on Deep Source Code Processing Models [38.32413592143839]
We present a poison attack framework for source code named CodePoisoner as a strong imaginary enemy.
CodePoisoner can produce compilable even human-imperceptible poison samples and attack models by poisoning the training data.
We propose an effective defense approach named CodeDetector to detect poison samples in the training data.
arXiv Detail & Related papers (2022-10-31T03:06:40Z) - DeepPoison: Feature Transfer Based Stealthy Poisoning Attack [2.1445455835823624]
DeepPoison is a novel adversarial network of one generator and two discriminators.
DeepPoison can achieve a state-of-the-art attack success rate, as high as 91.74%.
arXiv Detail & Related papers (2021-01-06T15:45:36Z) - How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z) - Poison Attacks against Text Datasets with Conditional Adversarially
Regularized Autoencoder [78.01180944665089]
This paper demonstrates a fatal vulnerability in natural language inference (NLI) and text classification systems.
We present a 'backdoor poisoning' attack on NLP models.
arXiv Detail & Related papers (2020-10-06T13:03:49Z) - Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching [56.280018325419896]
Data Poisoning attacks modify training data to maliciously control a model trained on such data.
We analyze a particularly malicious poisoning attack that is both "from scratch" and "clean label"
We show that it is the first poisoning method to cause targeted misclassification in modern deep networks trained from scratch on a full-sized, poisoned ImageNet dataset.
arXiv Detail & Related papers (2020-09-04T16:17:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.