Feature Engineering-Based Detection of Buffer Overflow Vulnerability in
Source Code Using Neural Networks
- URL: http://arxiv.org/abs/2306.07981v1
- Date: Thu, 1 Jun 2023 01:44:49 GMT
- Title: Feature Engineering-Based Detection of Buffer Overflow Vulnerability in
Source Code Using Neural Networks
- Authors: Mst Shapna Akter, Hossain Shahriar, Juan Rodriguez Cardenas, Sheikh
Iqbal Ahamed, and Alfredo Cuzzocrea
- Abstract summary: We have developed a vulnerability detection method based on neural network models that learn features extracted from source code.
We maintain the semantic and syntactic information using state-of-the-art word embedding algorithms such as GloVe and fastText.
We have proposed a neural network model that can overcome issues associated with traditional neural networks.
- Score: 2.9266864570485827
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One of the most significant challenges in the field of software code auditing
is the presence of vulnerabilities in software source code. Every year, more
and more software flaws are discovered, either internally in proprietary code
or publicly disclosed. These flaws are highly likely to be exploited and can
lead to system compromise, data leakage, or denial of service. To create a
large-scale machine learning system for function level vulnerability
identification, we utilized a sizable dataset of C and C++ open-source code
containing millions of functions with potential buffer overflow exploits. We
have developed an efficient and scalable vulnerability detection method based
on neural network models that learn features extracted from the source codes.
The source code is first converted into an intermediate representation to
remove unnecessary components and shorten dependencies. We maintain the
semantic and syntactic information using state-of-the-art word embedding
algorithms such as GloVe and fastText. The embedded vectors are subsequently
fed into neural networks such as LSTM, BiLSTM, LSTM Autoencoder, word2vec,
BERT, and GPT2 to classify the possible vulnerabilities. Furthermore, we have
proposed a neural network model that can overcome issues associated with
traditional neural networks. We have used evaluation metrics such as F1 score,
precision, recall, accuracy, and total execution time to measure the
performance. We have conducted a comparative analysis between results derived
from features containing a minimal text representation and semantic and
syntactic information.
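
The evaluation metrics named in the abstract (accuracy, precision, recall, and F1 score) can be illustrated with a minimal sketch. This is not the authors' code: the function name `classification_metrics`, the label convention (1 = vulnerable, 0 = not vulnerable), and the sample labels are assumptions made only for this example.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Assumed convention (not from the paper): 1 = vulnerable, 0 = not vulnerable.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    total = tp + fp + fn + tn
    # Each ratio falls back to 0.0 when its denominator is zero.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up labels for six functions, purely for illustration.
    y_true = [1, 0, 1, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 1, 0]
    print(classification_metrics(y_true, y_pred))
```

Execution time, the fifth metric mentioned, could be measured by wrapping the model's prediction call with `time.perf_counter()` from the Python standard library.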
Related papers
- Semantics Alignment via Split Learning for Resilient Multi-User Semantic
Communication [56.54422521327698]
Recent studies on semantic communication rely on neural network (NN) based transceivers such as deep joint source and channel coding (DeepJSCC).
Unlike traditional transceivers, these neural transceivers are trainable using actual source data and channels, enabling them to extract and communicate semantics.
We propose a distributed learning based solution, which leverages split learning (SL) and partial NN fine-tuning techniques.
arXiv Detail & Related papers (2023-10-13T20:29:55Z)
- Automated Vulnerability Detection in Source Code Using Quantum Natural
Language Processing [0.0]
C and C++ open-source code is now available to create a large-scale, classical machine-learning and quantum machine-learning system for function-level vulnerability identification.
We created an efficient and scalable vulnerability detection method based on a deep neural network model, Long Short-Term Memory (LSTM), and a quantum machine learning model, Quantum Long Short-Term Memory (QLSTM).
The QLSTM with semantic and syntactic features detects vulnerabilities significantly more accurately and runs faster than its classical counterpart.
arXiv Detail & Related papers (2023-03-13T23:27:42Z)
- DCDetector: An IoT terminal vulnerability mining system based on
distributed deep ensemble learning under source code representation [2.561778620560749]
The goal of the research is to intelligently detect vulnerabilities in source codes of high-level languages such as C/C++.
This enables us to propose a code representation of sensitive sentence-related slices of source code, and to detect vulnerabilities by designing a distributed deep ensemble learning model.
Experiments show that this method can reduce the false positive rate of traditional static analysis and improve the performance and accuracy of machine learning.
arXiv Detail & Related papers (2022-11-29T14:19:14Z)
- A Hierarchical Deep Neural Network for Detecting Lines of Codes with
Vulnerabilities [6.09170287691728]
Software vulnerabilities, caused by unintentional flaws in source codes, are the main root cause of cyberattacks.
We propose a deep learning approach to detect vulnerabilities from their LLVM IR representations based on the techniques that have been used in natural language processing.
arXiv Detail & Related papers (2022-11-15T21:21:27Z)
- Neuro-Symbolic Artificial Intelligence (AI) for Intent based Semantic
Communication [85.06664206117088]
6G networks must consider the semantics and effectiveness (at the end user) of the data transmission.
NeSy AI is proposed as a pillar for learning causal structure behind the observed data.
GFlowNet is leveraged for the first time in a wireless system to learn the probabilistic structure which generates the data.
arXiv Detail & Related papers (2022-05-22T07:11:57Z)
- VELVET: a noVel Ensemble Learning approach to automatically locate
VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z)
- Software Vulnerability Detection via Deep Learning over Disaggregated
Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora.
Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
arXiv Detail & Related papers (2021-09-07T21:24:36Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and
Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- A comparative study of neural network techniques for automatic software
vulnerability detection [9.443081849443184]
The most commonly used method for detecting software vulnerabilities is static analysis.
Some researchers have proposed using neural networks capable of automatic feature extraction to improve the intelligence of detection.
We have conducted extensive experiments to test the performance of the two most typical neural networks.
arXiv Detail & Related papers (2021-04-29T01:47:30Z)
- Multi-context Attention Fusion Neural Network for Software Vulnerability
Identification [4.05739885420409]
We propose a deep learning model that learns to detect some of the common categories of security vulnerabilities in source code efficiently.
The model builds an accurate understanding of code semantics with far fewer learnable parameters.
The proposed AI achieves 98.40% F1-score on specific CWEs from the benchmarked NIST SARD dataset.
arXiv Detail & Related papers (2021-04-19T11:50:36Z)
- Improved Code Summarization via a Graph Neural Network [96.03715569092523]
In general, source code summarization techniques take source code as input and output a natural language description.
We present an approach that uses a graph-based neural architecture that better matches the default structure of the AST to generate these summaries.
arXiv Detail & Related papers (2020-04-06T17:36:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.