Stack-based Buffer Overflow Detection using Recurrent Neural Networks
- URL: http://arxiv.org/abs/2012.15116v1
- Date: Wed, 30 Dec 2020 11:24:44 GMT
- Title: Stack-based Buffer Overflow Detection using Recurrent Neural Networks
- Authors: William Arild Dahl, Laszlo Erdodi, Fabio Massimo Zennaro
- Abstract summary: We consider the use of modern machine learning models, specifically recurrent neural networks, to detect stack-based buffer overflow vulnerabilities in the assembly code of a program.
Our results show that our architecture is able to capture subtle stack-based buffer overflow vulnerabilities that strongly depend on the context.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Detecting vulnerabilities in software is a critical challenge in the
development and deployment of applications. One of the best-known and most dangerous vulnerabilities is the stack-based buffer overflow, which may allow potential
attackers to execute malicious code. In this paper we consider the use of
modern machine learning models, specifically recurrent neural networks, to
detect stack-based buffer overflow vulnerabilities in the assembly code of a
program. Since assembly code is a generic and common representation, focusing
on this language allows us to potentially consider programs written in several
different programming languages. Moreover, we subscribe to the hypothesis that
code may be treated as natural language, and thus we process assembly code
using standard architectures commonly employed in natural language processing.
We perform a set of experiments aimed at confirming the validity of the natural
language hypothesis and the feasibility of using recurrent neural networks for
detecting vulnerabilities. Our results show that our architecture is able to
capture subtle stack-based buffer overflow vulnerabilities that strongly depend
on the context, thus suggesting that this approach may be extended to real-world settings, as well as to other forms of vulnerability detection.
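To make the architecture concrete, below is a minimal sketch of the kind of model the abstract describes: an LSTM classifier over tokenized assembly, in PyTorch. The tokenizer, vocabulary, hyperparameters, and the toy snippet are illustrative assumptions, not the authors' exact setup.

```python
import torch
import torch.nn as nn

class AsmVulnClassifier(nn.Module):
    """Binary classifier over tokenized assembly, treating code as natural
    language. Hyperparameters are illustrative, not the paper's values."""

    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)  # one logit: vulnerable or not

    def forward(self, token_ids):
        x = self.embed(token_ids)              # (batch, seq, embed_dim)
        _, (h_n, _) = self.lstm(x)             # final hidden state summarizes the sequence
        return self.head(h_n[-1]).squeeze(-1)  # (batch,) logits

# Toy usage: index assembly tokens, then score one snippet.
snippet = "push rbp ; mov rbp , rsp ; sub rsp , 0x40 ; call strcpy".split()
vocab = {tok: i + 1 for i, tok in enumerate(sorted(set(snippet)))}  # 0 reserved for padding
ids = torch.tensor([[vocab[t] for t in snippet]])

model = AsmVulnClassifier(vocab_size=len(vocab) + 1)
print(torch.sigmoid(model(ids)).item())  # untrained, so roughly 0.5
```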
Related papers
- Harnessing the Power of LLMs in Source Code Vulnerability Detection [0.0]
Software vulnerabilities, caused by unintentional flaws in source code, are a primary root cause of cyberattacks.
We harness Large Language Models' capabilities to analyze source code and detect known vulnerabilities.
arXiv Detail & Related papers (2024-08-07T00:48:49Z)
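As a rough illustration of the prompting approach above, the sketch below asks a language model to flag vulnerability classes in a C function. The model name (gpt2 is only a lightweight stand-in), the prompt wording, and the decoding settings are assumptions, not the paper's setup.

```python
from transformers import pipeline

# Stand-in model; an instruction-tuned code model would be substituted here.
generator = pipeline("text-generation", model="gpt2")

def detect_vulnerabilities(source: str) -> str:
    # Hypothetical prompt wording, not taken from the paper.
    prompt = (
        "Review the following C function and report any known vulnerability "
        "classes (e.g. CWE-121 stack-based buffer overflow):\n\n"
        f"{source}\n\nFindings:"
    )
    out = generator(prompt, max_new_tokens=64, do_sample=False)
    return out[0]["generated_text"][len(prompt):]  # keep only the completion

vulnerable_c = 'void greet(char *name) { char buf[16]; strcpy(buf, name); }'
print(detect_vulnerabilities(vulnerable_c))
```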
- Reverse-Engineering Decoding Strategies Given Blackbox Access to a Language Generation System [73.52878118434147]
We present methods to reverse-engineer the decoding method used to generate text.
Our ability to discover which decoding strategy was used has implications for detecting generated text.
arXiv Detail & Related papers (2023-09-09T18:19:47Z)
- Feature Engineering-Based Detection of Buffer Overflow Vulnerability in Source Code Using Neural Networks [2.9266864570485827]
We present a vulnerability detection method based on neural network models that learn features extracted from source code.
We preserve semantic and syntactic information using state-of-the-art word embedding algorithms such as GloVe and fastText.
We propose a neural network model that overcomes issues associated with traditional neural networks.
arXiv Detail & Related papers (2023-06-01T01:44:49Z)
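A minimal sketch of the embedding step with gensim's fastText; the tiny corpus, tokenization, and hyperparameters are illustrative, and the paper's feature-engineering pipeline is more involved.

```python
from gensim.models import FastText

# Token streams from source code; in practice these would come from a
# lexer run over the training corpus (this tiny corpus is illustrative).
corpus = [
    "char buf [ 16 ] ;".split(),
    "strcpy ( buf , input ) ;".split(),
    "strncpy ( buf , input , sizeof ( buf ) ) ;".split(),
]

# fastText builds vectors from character n-grams, so it can also embed
# identifiers never seen in training, which suits source code well.
model = FastText(sentences=corpus, vector_size=50, window=3, min_count=1, epochs=50)

print(model.wv["strcpy"][:5])                    # learned vector for a seen token
print(model.wv.similarity("strcpy", "strncpy"))  # high: the two share many n-grams
```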
- CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks.
Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities.
This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
arXiv Detail & Related papers (2023-02-08T11:54:07Z)
- On Robustness of Prompt-based Semantic Parsing with Large Pre-trained Language Model: An Empirical Study on Codex [48.588772371355816]
This paper presents the first empirical study on the adversarial robustness of a large prompt-based language model of code, Codex.
Our results demonstrate that the state-of-the-art (SOTA) code-language models are vulnerable to carefully crafted adversarial examples.
arXiv Detail & Related papers (2023-01-30T13:21:00Z)
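One simple family of such adversarial examples is semantics-preserving identifier renaming, sketched below as a toy illustration; the paper's attacks are more carefully crafted than this.

```python
import re

def rename_identifier(code: str, old: str, new: str) -> str:
    """Semantics-preserving rename (whole-word matches only): one basic
    perturbation family used to probe the robustness of code models."""
    return re.sub(rf"\b{re.escape(old)}\b", new, code)

prompt = "def add_item(shopping_cart, item):\n    shopping_cart.append(item)\n"
# A legal but uninformative name; a robust model should complete both
# variants of the prompt the same way.
print(rename_identifier(prompt, "shopping_cart", "xq_tmp0"))
```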
- DCDetector: An IoT terminal vulnerability mining system based on distributed deep ensemble learning under source code representation [2.561778620560749]
The goal of the research is to intelligently detect vulnerabilities in the source code of high-level languages such as C/C++.
We propose a code representation built from slices of source code around sensitive statements, and detect vulnerabilities by designing a distributed deep ensemble learning model.
Experiments show that this method can reduce the false positive rate of traditional static analysis and improve the performance and accuracy of machine learning approaches.
arXiv Detail & Related papers (2022-11-29T14:19:14Z)
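A generic soft-voting sketch of the ensemble idea; the linear stand-ins, the averaging scheme, and the threshold are assumptions, since DCDetector's members are deep models trained over slice representations.

```python
import torch
import torch.nn as nn

def ensemble_predict(models, features, threshold=0.5):
    """Average vulnerability probabilities from independently trained
    detectors (plain soft voting, not DCDetector's exact scheme)."""
    with torch.no_grad():
        probs = torch.stack([torch.sigmoid(m(features)).squeeze(-1) for m in models])
    mean_prob = probs.mean(dim=0)  # average over ensemble members
    return mean_prob, mean_prob > threshold

# Stand-in detectors; in the paper each member would be a deep model
# trained in a distributed fashion on slice representations.
detectors = [nn.Linear(8, 1) for _ in range(5)]
slice_features = torch.randn(3, 8)  # 3 code slices, 8 features each
scores, flagged = ensemble_predict(detectors, slice_features)
print(scores, flagged)
```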
- A Hierarchical Deep Neural Network for Detecting Lines of Codes with Vulnerabilities [6.09170287691728]
Software vulnerabilities, caused by unintentional flaws in source code, are the main root cause of cyberattacks.
We propose a deep learning approach to detect vulnerabilities from their LLVM IR representations, based on techniques that have been used in natural language processing.
arXiv Detail & Related papers (2022-11-15T21:21:27Z)
- VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z)
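A simplified sketch of fusing per-statement scores from two branches; the linear layers below merely stand in for the paper's graph-based and sequence-based encoders, and averaging the two scores is an assumption.

```python
import torch
import torch.nn as nn

class GraphSeqEnsemble(nn.Module):
    """Score each statement with two branches and average the results;
    a toy stand-in for VELVET-style graph/sequence ensembling."""

    def __init__(self, dim):
        super().__init__()
        self.graph_branch = nn.Linear(dim, 1)  # placeholder for a graph network
        self.seq_branch = nn.Linear(dim, 1)    # placeholder for a sequence network

    def forward(self, statement_embeddings):
        g = self.graph_branch(statement_embeddings).squeeze(-1)
        s = self.seq_branch(statement_embeddings).squeeze(-1)
        return (g + s) / 2  # one ensemble score per statement

model = GraphSeqEnsemble(dim=16)
stmts = torch.randn(10, 16)        # 10 statements from one function
print(int(model(stmts).argmax()))  # index of the statement ranked most vulnerable
```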
- Improving Compositionality of Neural Networks by Decoding Representations to Inputs [83.97012077202882]
We bridge the benefits of traditional and deep learning programs by jointly training a generative model to constrain neural network activations to "decode" back to inputs.
We demonstrate applications of decodable representations to out-of-distribution detection, adversarial examples, calibration, and fairness.
arXiv Detail & Related papers (2021-06-01T20:07:16Z)
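A minimal sketch of the decodability constraint: a classifier trained jointly with a decoder that reconstructs the input from the hidden activations. Dimensions and the loss weighting are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(20, 8), nn.ReLU())  # produces the activations
classifier_head = nn.Linear(8, 2)
decoder = nn.Linear(8, 20)  # "decodes" activations back to the input space

x = torch.randn(32, 20)         # toy inputs
y = torch.randint(0, 2, (32,))  # toy labels

z = encoder(x)
# Classification loss plus a reconstruction term that constrains the
# activations to remain decodable back to the inputs.
loss = F.cross_entropy(classifier_head(z), y) + 0.1 * F.mse_loss(decoder(z), x)
loss.backward()
print(loss.item())
```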
- Multi-context Attention Fusion Neural Network for Software Vulnerability Identification [4.05739885420409]
We propose a deep learning model that learns to detect some of the common categories of security vulnerabilities in source code efficiently.
The model builds an accurate understanding of code semantics with far fewer learnable parameters.
The proposed model achieves a 98.40% F1-score on specific CWEs from the benchmark NIST SARD dataset.
arXiv Detail & Related papers (2021-04-19T11:50:36Z)
- Contextualized Perturbation for Textual Adversarial Attack [56.370304308573274]
Adversarial examples expose the vulnerabilities of natural language processing (NLP) models.
This paper presents CLARE, a ContextuaLized AdversaRial Example generation model that produces fluent and grammatical outputs.
arXiv Detail & Related papers (2020-09-16T06:53:15Z)
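The core mask-and-infill idea behind such contextualized perturbations can be sketched with a fill-mask pipeline; using roberta-base here is an assumption, and a real attack would keep only the candidates that flip the victim classifier's prediction.

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="roberta-base")  # stand-in masked language model

sentence = "The movie was absolutely wonderful."
masked = sentence.replace("wonderful", fill.tokenizer.mask_token)

# Each candidate is a fluent, in-context replacement; an attacker would
# test which ones change the victim model's output.
for cand in fill(masked, top_k=5):
    print(cand["token_str"].strip(), round(cand["score"], 3))
```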
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.