Related papers: Machine Learning Techniques for Python Source Code Vulnerability Detection

URL: http://arxiv.org/abs/2404.09537v1
Date: Mon, 15 Apr 2024 08:01:02 GMT
Title: Machine Learning Techniques for Python Source Code Vulnerability Detection
Authors: Talaya Farasat, Joachim Posegga
Abstract summary: We apply and compare different machine learning algorithms for source code vulnerability detection specifically for the Python programming language. Our Bidirectional Long Short-Term Memory (BiLSTM) model achieves remarkable performance.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Software vulnerabilities are a fundamental reason for the prevalence of cyber attacks, and their identification is a crucial yet challenging problem in cyber security. In this paper, we apply and compare different machine learning algorithms for source code vulnerability detection specifically for the Python programming language. Our experimental evaluation demonstrates that our Bidirectional Long Short-Term Memory (BiLSTM) model achieves remarkable performance (average Accuracy = 98.6%, average F-Score = 94.7%, average Precision = 96.2%, average Recall = 93.3%, average ROC = 99.3%), thereby establishing a new benchmark for vulnerability detection in Python source code.
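The metrics quoted in the abstract can be cross-checked against each other. The sketch below is not taken from the paper; it assumes the reported F-Score is the standard F1 score (the harmonic mean of precision and recall), and the function name `f_score` is made up for illustration. Because the abstract reports averages, the harmonic mean of the averaged precision and recall does not have to equal the averaged F-Score exactly, so this is only a rough consistency check.

```python
def f_score(precision: float, recall: float) -> float:
    """Return the F1 score, i.e. the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Averages quoted in the abstract, written as fractions.
avg_precision = 0.962
avg_recall = 0.933
reported_f_score = 0.947

# Assumption: the paper's F-Score is F1; the abstract does not spell this out.
recomputed = f_score(avg_precision, avg_recall)
print(f"recomputed F1 = {recomputed:.3f}, reported = {reported_f_score:.3f}")
```

Under that assumption, the recomputed value agrees with the reported F-Score to three decimal places, which suggests the quoted figures are mutually consistent.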

Related papers

EXPLICATE: Enhancing Phishing Detection through Explainable AI and LLM-Powered Interpretability [44.2907457629342]
EXPLICATE is a framework that enhances phishing detection through a three-component architecture. It is on par with existing deep learning techniques but has better explainability. It addresses the critical divide between automated AI and user trust in phishing detection systems.
arXiv Detail & Related papers (2025-03-22T23:37:35Z)
Browser Extension for Fake URL Detection [0.0]
This paper presents a browser extension that uses machine learning models to enhance online security. The proposed solution uses an LGBM classifier to classify phishing websites. The model for spam email detection uses the Multinomial NB algorithm, which has been trained on a dataset with over 5500 messages.
arXiv Detail & Related papers (2024-11-16T07:22:59Z)
Vulnerability Detection in C/C++ Code with Deep Learning [3.105656247358225]
We train neural networks with program slices extracted from the source code of C/C++ programs to detect software vulnerabilities. Our results show that combining different types of source code characteristics and using a balanced number of vulnerable and nonvulnerable program slices produces a balanced accuracy.
arXiv Detail & Related papers (2024-05-20T21:39:19Z)
Camouflage is all you need: Evaluating and Enhancing Language Model Robustness Against Camouflage Adversarial Attacks [53.87300498478744]
Adversarial attacks represent a substantial challenge in Natural Language Processing (NLP). This study undertakes a systematic exploration of this challenge in two distinct phases: vulnerability evaluation and resilience enhancement. Results suggest a trade-off between performance and robustness, with some models maintaining similar performance while gaining robustness.
arXiv Detail & Related papers (2024-02-15T10:58:22Z)
Zero-Shot Detection of Machine-Generated Codes [83.0342513054389]
This work proposes a training-free approach for the detection of LLM-generated code. We find that existing training-based or zero-shot text detectors are ineffective in detecting code. Our method exhibits robustness against revision attacks and generalizes well to Java code.
arXiv Detail & Related papers (2023-10-08T10:08:21Z)
VMCDL: Vulnerability Mining Based on Cascaded Deep Learning Under Source Control Flow [2.561778620560749]
This paper mainly uses the C/C++ source code data of the SARD dataset and processes the source code of the CWE476, CWE469, CWE516, and CWE570 vulnerability types. We propose a new cascading deep learning model, VMCDL, based on source code control flow to effectively detect vulnerabilities.
arXiv Detail & Related papers (2023-03-13T13:58:39Z)
VUDENC: Vulnerability Detection with Deep Learning on a Natural Codebase for Python [8.810543294798485]
VUDENC is a deep learning-based vulnerability detection tool. It learns features of vulnerable code from a large and real-world Python corpus. VUDENC achieves a recall of 78%-87%, a precision of 82%-96%, and an F1 score of 80%-90%.
arXiv Detail & Related papers (2022-01-20T20:29:22Z)
VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code. Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph. VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z)
Security Vulnerability Detection Using Deep Learning Natural Language Processing [1.4591078795663772]
We model software vulnerability detection as a natural language processing (NLP) problem with source code treated as text. For training and testing, we have built a dataset of over 100,000 files in the C programming language with 123 types of vulnerabilities. Our experiments achieve a best performance of over 93% accuracy in detecting security vulnerabilities.
arXiv Detail & Related papers (2021-05-06T01:28:21Z)
Fast Uncertainty Quantification for Deep Object Pose Estimation [91.09217713805337]
Deep learning-based object pose estimators are often unreliable and overconfident. In this work, we propose a simple, efficient, and plug-and-play UQ method for 6-DoF object pose estimation.
arXiv Detail & Related papers (2020-11-16T06:51:55Z)
Bayesian Optimization with Machine Learning Algorithms Towards Anomaly Detection [66.05992706105224]
In this paper, an effective anomaly detection framework is proposed using the Bayesian Optimization technique. The performance of the considered algorithms is evaluated using the ISCX 2012 dataset. Experimental results show the effectiveness of the proposed framework in terms of accuracy, precision, low false-alarm rate, and recall.
arXiv Detail & Related papers (2020-08-05T19:29:35Z)
Provably Robust Metric Learning [98.50580215125142]
We show that existing metric learning algorithms can result in metrics that are less robust than the Euclidean distance. We propose a novel metric learning algorithm to find a Mahalanobis distance that is robust against adversarial perturbations. Experimental results show that the proposed metric learning algorithm improves both certified robust errors and empirical robust errors.
arXiv Detail & Related papers (2020-06-12T09:17:08Z)

This list is automatically generated from the titles and abstracts of the papers on this site.