Related papers: VMCDL: Vulnerability Mining Based on Cascaded Deep Learning Under Source Control Flow

VMCDL: Vulnerability Mining Based on Cascaded Deep Learning Under Source Control Flow

URL: http://arxiv.org/abs/2303.07128v2
Date: Fri, 24 Mar 2023 14:33:30 GMT
Title: VMCDL: Vulnerability Mining Based on Cascaded Deep Learning Under Source Control Flow
Authors: Wen Zhou
Abstract summary: This paper mainly use the c/c++ source code data of the SARD dataset, process the source code of CWE476, CWE469, CWE516 and CWE570 vulnerability types. We propose a new cascading deep learning model VMCDL based on source code control flow to effectively detect vulnerabilities.
Score: 2.561778620560749
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the rapid development of the computer industry and computer software, the risk of software vulnerabilities being exploited has greatly increased. However, there are still many shortcomings in the existing mining techniques for leakage source research, such as high false alarm rate, coarse-grained detection, and dependence on expert experience. In this paper, we mainly use the c/c++ source code data of the SARD dataset, process the source code of CWE476, CWE469, CWE516 and CWE570 vulnerability types, test the Joern vulnerability scanning function of the cutting-edge tool, and propose a new cascading deep learning model VMCDL based on source code control flow to effectively detect vulnerabilities. First, this paper uses joern to locate and extract sensitive functions and statements to form a sensitive statement library of vulnerable code. Then, the CFG flow vulnerability code snippets are generated by bidirectional breadth-first traversal, and then vectorized by Doc2vec. Finally, the cascade deep learning model based on source code control flow is used for classification to obtain the classification results. In the experimental evaluation, we give the test results of Joern on specific vulnerabilities, and give the confusion matrix and label data of the binary classification results of the model algorithm on single vulnerability type source code, and compare and verify the five indicators of FPR, FNR, ACC, P and F1, respectively reaching 10.30%, 5.20%, 92.50%,85.10% and 85.40%,which shows that it can effectively reduce the false alarm rate of static analysis.

Related papers

VulCoCo: A Simple Yet Effective Method for Detecting Vulnerable Code Clones [11.650715913321076]
VulCoCo is a lightweight and scalable approach to detect vulnerable code clones.<n>We first construct a synthetic benchmark that spans various clone types.<n>Our experiments show that VulCoCoCo outperforms prior state-of-the-art methods in terms of Precision@k and mean average precision (MAP)
arXiv Detail & Related papers (2025-07-22T14:54:57Z)
Cryptanalysis via Machine Learning Based Information Theoretic Metrics [58.96805474751668]
We propose two novel applications of machine learning (ML) algorithms to perform cryptanalysis on any cryptosystem. These algorithms can be readily applied in an audit setting to evaluate the robustness of a cryptosystem. We show that our classification model correctly identifies the encryption schemes that are not IND-CPA secure, such as DES, RSA, and AES ECB, with high accuracy.
arXiv Detail & Related papers (2025-01-25T04:53:36Z)
Deep Autoencoders for Unsupervised Anomaly Detection in Wildfire Prediction [42.447827727628734]
Wildfires pose a significantly increasing hazard to global ecosystems due to the climate crisis. Due to its complex nature, there is an urgent need for innovative approaches to wildfire prediction, such as machine learning. This research took a unique approach, differentiating from classical supervised learning, and addressed the gap in unsupervised wildfire prediction.
arXiv Detail & Related papers (2024-11-14T23:19:55Z)
Vulnerability Detection with Code Language Models: How Far Are We? [40.455600722638906]
PrimeVul is a new dataset for training and evaluating code LMs for vulnerability detection. It incorporates a novel set of data labeling techniques that achieve comparable label accuracy to human-verified benchmarks. It also implements a rigorous data de-duplication and chronological data splitting strategy to mitigate data leakage issues.
arXiv Detail & Related papers (2024-03-27T14:34:29Z)
FoC: Figure out the Cryptographic Functions in Stripped Binaries with LLMs [54.27040631527217]
We propose a novel framework called FoC to Figure out the Cryptographic functions in stripped binaries. We first build a binary large language model (FoC-BinLLM) to summarize the semantics of cryptographic functions in natural language. We then build a binary code similarity model (FoC-Sim) upon the FoC-BinLLM to create change-sensitive representations and use it to retrieve similar implementations of unknown cryptographic functions in a database.
arXiv Detail & Related papers (2024-03-27T09:45:33Z)
The Vulnerability Is in the Details: Locating Fine-grained Information of Vulnerable Code Identified by Graph-based Detectors [33.395068754566935]
VULEXPLAINER is a tool for locating vulnerability-critical code lines from coarse-level vulnerable code snippets. It can flag the vulnerability-triggering code statements with an accuracy of around 90% against eight common C/C++ vulnerabilities.
arXiv Detail & Related papers (2024-01-05T10:15:04Z)
Zero-Shot Detection of Machine-Generated Codes [83.0342513054389]
This work proposes a training-free approach for the detection of LLMs-generated codes. We find that existing training-based or zero-shot text detectors are ineffective in detecting code. Our method exhibits robustness against revision attacks and generalizes well to Java codes.
arXiv Detail & Related papers (2023-10-08T10:08:21Z)
An Unbiased Transformer Source Code Learning with Semantic Vulnerability Graph [3.3598755777055374]
Current vulnerability screening techniques are ineffective at identifying novel vulnerabilities or providing developers with code vulnerability and classification. To address these issues, we propose a joint multitasked unbiased vulnerability classifier comprising a transformer "RoBERTa" and graph convolution neural network (GCN) We present a training process utilizing a semantic vulnerability graph (SVG) representation from source code, created by integrating edges from a sequential flow, control flow, and data flow, as well as a novel flow dubbed Poacher Flow (PF)
arXiv Detail & Related papers (2023-04-17T20:54:14Z)
CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks. Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities. This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
arXiv Detail & Related papers (2023-02-08T11:54:07Z)
DCDetector: An IoT terminal vulnerability mining system based on distributed deep ensemble learning under source code representation [2.561778620560749]
The goal of the research is to intelligently detect vulnerabilities in source codes of high-level languages such as C/C++. This enables us to propose a code representation of sensitive sentence-related slices of source code, and to detect vulnerabilities by designing a distributed deep ensemble learning model. Experiments show that this method can reduce the false positive rate of traditional static analysis and improve the performance and accuracy of machine learning.
arXiv Detail & Related papers (2022-11-29T14:19:14Z)
A Hierarchical Deep Neural Network for Detecting Lines of Codes with Vulnerabilities [6.09170287691728]
Software vulnerabilities, caused by unintentional flaws in source codes, are the main root cause of cyberattacks. We propose a deep learning approach to detect vulnerabilities from their LLVM IR representations based on the techniques that have been used in natural language processing.
arXiv Detail & Related papers (2022-11-15T21:21:27Z)
VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code. Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph. VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z)
Multi-context Attention Fusion Neural Network for Software Vulnerability Identification [4.05739885420409]
We propose a deep learning model that learns to detect some of the common categories of security vulnerabilities in source code efficiently. The model builds an accurate understanding of code semantics with a lot less learnable parameters. The proposed AI achieves 98.40% F1-score on specific CWEs from the benchmarked NIST SARD dataset.
arXiv Detail & Related papers (2021-04-19T11:50:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.