VUDENC: Vulnerability Detection with Deep Learning on a Natural Codebase
for Python
- URL: http://arxiv.org/abs/2201.08441v1
- Date: Thu, 20 Jan 2022 20:29:22 GMT
- Authors: Laura Wartschinski, Yannic Noller, Thomas Vogel, Timo Kehrer, Lars
Grunske
- Abstract summary: VUDENC is a deep learning-based vulnerability detection tool.
It learns features of vulnerable code from a large and real-world Python corpus.
VUDENC achieves a recall of 78%-87%, a precision of 82%-96%, and an F1 score of 80%-90%.
- Score: 8.810543294798485
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Context: Identifying potentially vulnerable code is important to
improve the security of our software systems. However, the manual detection of
software vulnerabilities requires expert knowledge, is time-consuming, and must
therefore be supported by automated techniques. Objective: Such automated
vulnerability
detection techniques should achieve a high accuracy, point developers directly
to the vulnerable code fragments, scale to real-world software, generalize
across the boundaries of a specific software project, and require no or only
moderate setup or configuration effort. Method: In this article, we present
VUDENC (Vulnerability Detection with Deep Learning on a Natural Codebase), a
deep learning-based vulnerability detection tool that automatically learns
features of vulnerable code from a large and real-world Python codebase. VUDENC
applies a word2vec model to identify semantically similar code tokens and to
provide a vector representation. A long short-term memory (LSTM) network is
then used to classify vulnerable code token sequences at a
fine-grained level, highlight the specific areas in the source code that are
likely to contain vulnerabilities, and provide confidence levels for its
predictions. Results: To evaluate VUDENC, we used 1,009 vulnerability-fixing
commits from different GitHub repositories that contain seven different types
of vulnerabilities (SQL injection, XSS, Command injection, XSRF, Remote code
execution, Path disclosure, Open redirect) for training. In the experimental
evaluation, VUDENC achieves a recall of 78%-87%, a precision of 82%-96%, and an
F1 score of 80%-90%. VUDENC's code, the datasets for the vulnerabilities, and
the Python corpus for the word2vec model are available for reproduction.
Conclusions: Our experimental results suggest...
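As a rough illustration of VUDENC's fine-grained setup, the sketch below slides a fixed-size window over a code token stream and labels each window by overlap with a known-vulnerable span. The tokenizer, window size, and labeling rule are simplified assumptions for illustration, not VUDENC's exact settings; the tool itself feeds word2vec embeddings of such token windows into an LSTM classifier.

```python
import re

# Illustrative sketch of fine-grained sample creation: slide a window
# over a token stream so each window can be classified on its own.

def tokenize(code):
    """Very rough token split: identifiers or single non-space characters."""
    return re.findall(r"[A-Za-z_]\w*|\S", code)

def windows(tokens, size=5, step=1):
    """Yield (start_index, window) pairs of overlapping token windows."""
    for i in range(0, max(1, len(tokens) - size + 1), step):
        yield i, tokens[i:i + size]

def label_windows(tokens, vulnerable_span, size=5):
    """Label a window 1 if it overlaps the known-vulnerable token span."""
    lo, hi = vulnerable_span
    samples = []
    for i, win in windows(tokens, size):
        overlaps = i < hi and i + size > lo
        samples.append((win, int(overlaps)))
    return samples

code = 'cursor.execute("SELECT * FROM users WHERE name = " + name)'
toks = tokenize(code)
# Suppose the fix commit marked tokens 0-3 (the execute call) as vulnerable.
samples = label_windows(toks, vulnerable_span=(0, 4), size=5)
```

Each (window, label) pair would then be embedded and classified, which is what lets the tool highlight a specific token region rather than flag a whole file.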
Related papers
- Vulnerability Detection in C/C++ Code with Deep Learning [3.105656247358225]
We train neural networks with program slices extracted from the source code of C/C++ programs to detect software vulnerabilities.
Our results show that combining different types of source-code characteristics and using a balanced number of vulnerable and non-vulnerable program slices produces a balanced accuracy.
arXiv Detail & Related papers (2024-05-20T21:39:19Z)
- The Vulnerability Is in the Details: Locating Fine-grained Information of Vulnerable Code Identified by Graph-based Detectors [33.395068754566935]
VULEXPLAINER is a tool for locating vulnerability-critical code lines from coarse-level vulnerable code snippets.
It can flag the vulnerability-triggering code statements with an accuracy of around 90% against eight common C/C++ vulnerabilities.
arXiv Detail & Related papers (2024-01-05T10:15:04Z)
- Vulnerability Detection Using Two-Stage Deep Learning Models [0.0]
Two deep learning models were proposed for vulnerability detection in C/C++ source codes.
The first stage is a CNN that detects whether the source code contains any vulnerability.
The second stage is a CNN-LSTM that classifies this vulnerability into one of 50 vulnerability types.
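The two-stage scheme above can be sketched as a simple gating pipeline, where the type classifier only runs on code the first stage flags. The stand-in "models" below are trivial string checks used purely for illustration; the paper uses a trained CNN and CNN-LSTM.

```python
# Hedged sketch of two-stage detection: a binary detector gates a
# multi-class classifier, so type prediction only runs on flagged code.

def two_stage_classify(sample, detect, classify):
    """Return None if stage 1 sees no vulnerability, else the predicted type."""
    if not detect(sample):        # stage 1: binary screening
        return None
    return classify(sample)       # stage 2: vulnerability-type prediction

# Toy stand-ins for the two trained networks (assumptions for illustration):
detect = lambda s: "os.system" in s or "eval" in s
classify = lambda s: "command_injection" if "os.system" in s else "other"
```

The gating design means the cheaper binary check filters most inputs before the finer multi-class model is invoked.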
arXiv Detail & Related papers (2023-05-08T22:12:34Z)
- CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks.
Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities.
This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
arXiv Detail & Related papers (2023-02-08T11:54:07Z)
- Pre-trained Encoders in Self-Supervised Learning Improve Secure and Privacy-preserving Supervised Learning [63.45532264721498]
Self-supervised learning is an emerging technique to pre-train encoders using unlabeled data.
We perform the first systematic, principled measurement study to understand whether and when a pre-trained encoder can address the limitations of secure or privacy-preserving supervised learning algorithms.
arXiv Detail & Related papers (2022-12-06T21:35:35Z)
- Deep-Learning-based Vulnerability Detection in Binary Executables [0.0]
We present a supervised deep learning approach using recurrent neural networks for the application of vulnerability detection based on binary executables.
A dataset with 50,651 samples of vulnerable code in the form of a standardized LLVM Intermediate Representation is used.
A binary classification was established for detecting the presence of an arbitrary vulnerability, and a multi-class model was trained for the identification of the exact vulnerability.
arXiv Detail & Related papers (2022-11-25T10:33:33Z)
- Statement-Level Vulnerability Detection: Learning Vulnerability Patterns Through Information Theory and Contrastive Learning [31.15123852246431]
We propose a novel end-to-end deep learning-based approach to identify the vulnerability-relevant code statements of a specific function.
Inspired by the structures observed in real-world vulnerable code, we first leverage mutual information for learning a set of latent variables.
We then propose novel clustered spatial contrastive learning in order to further improve the representation learning.
arXiv Detail & Related papers (2022-09-20T00:46:20Z)
- Revisiting Code Search in a Two-Stage Paradigm [67.02322603435628]
TOSS is a two-stage fusion code search framework.
It first uses IR-based and bi-encoder models to efficiently recall a small number of top-k code candidates.
It then uses fine-grained cross-encoders for finer ranking.
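A minimal sketch of this two-stage retrieval pattern: a cheap recall scorer shortlists top-k candidates, then a finer scorer reranks only that shortlist. Both scorers below are simple token-overlap stand-ins, not TOSS's actual IR/bi-encoder and cross-encoder models.

```python
# Two-stage search sketch: fast recall over the whole corpus, then a
# more careful rerank over just the shortlisted candidates.

def recall_stage(query, corpus, k=3):
    """Stage 1: fast lexical overlap to shortlist top-k candidates."""
    q = set(query.split())
    ranked = sorted(corpus, key=lambda c: -len(q & set(c.split())))
    return ranked[:k]

def rerank_stage(query, candidates):
    """Stage 2: finer, length-normalized (Jaccard) scoring of the shortlist."""
    q = set(query.split())
    def fine(c):
        toks = set(c.split())
        return len(q & toks) / max(1, len(q | toks))
    return sorted(candidates, key=fine, reverse=True)

corpus = [
    "def read file path",
    "def write file path data",
    "def sort list items",
    "def read config file path",
]
hits = rerank_stage("read file", recall_stage("read file", corpus, k=2))
```

The point of the split is cost: the expensive scorer only sees k candidates instead of the full corpus.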
arXiv Detail & Related papers (2022-08-24T02:34:27Z)
- VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z)
- Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora.
Because parsed code naturally admits graph structures, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
arXiv Detail & Related papers (2021-09-07T21:24:36Z)
- COSEA: Convolutional Code Search with Layer-wise Attention [90.35777733464354]
We propose a new deep learning architecture, COSEA, which leverages convolutional neural networks with layer-wise attention to capture the code's intrinsic structural logic.
COSEA can achieve significant improvements over state-of-the-art methods on code search tasks.
arXiv Detail & Related papers (2020-10-19T13:53:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.