Related papers: Deep-Learning-based Vulnerability Detection in Binary Executables

Deep-Learning-based Vulnerability Detection in Binary Executables

URL: http://arxiv.org/abs/2212.01254v1
Date: Fri, 25 Nov 2022 10:33:33 GMT
Title: Deep-Learning-based Vulnerability Detection in Binary Executables
Authors: Andreas Schaad, Dominik Binder
Abstract summary: We present a supervised deep learning approach using recurrent neural networks for the application of vulnerability detection based on binary executables. A dataset with 50,651 samples of vulnerable code in the form of a standardized LLVM Intermediate Representation is used. A binary classification was established for detecting the presence of an arbitrary vulnerability, and a multi-class model was trained for the identification of the exact vulnerability.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The identification of vulnerabilities is an important element in the software development life cycle to ensure the security of software. While vulnerability identification based on the source code is a well studied field, the identification of vulnerabilities on basis of a binary executable without the corresponding source code is more challenging. Recent research [1] has shown, how such detection can be achieved by deep learning methods. However, that particular approach is limited to the identification of only 4 types of vulnerabilities. Subsequently, we analyze to what extent we could cover the identification of a larger variety of vulnerabilities. Therefore, a supervised deep learning approach using recurrent neural networks for the application of vulnerability detection based on binary executables is used. The underlying basis is a dataset with 50,651 samples of vulnerable code in the form of a standardized LLVM Intermediate Representation. The vectorised features of a Word2Vec model are used to train different variations of three basic architectures of recurrent neural networks (GRU, LSTM, SRNN). A binary classification was established for detecting the presence of an arbitrary vulnerability, and a multi-class model was trained for the identification of the exact vulnerability, which achieved an out-of-sample accuracy of 88% and 77%, respectively. Differences in the detection of different vulnerabilities were also observed, with non-vulnerable samples being detected with a particularly high precision of over 98%. Thus, the methodology presented allows an accurate detection of 23 (compared to 4 [1]) vulnerabilities.

Related papers

Lie Detector: Unified Backdoor Detection via Cross-Examination Framework [68.45399098884364]
We propose a unified backdoor detection framework in the semi-honest setting. Our method achieves superior detection performance, improving accuracy by 5.4%, 1.6%, and 11.9% over SoTA baselines. Notably, it is the first to effectively detect backdoors in multimodal large language models.
arXiv Detail & Related papers (2025-03-21T06:12:06Z)
VulEval: Towards Repository-Level Evaluation of Software Vulnerability Detection [14.312197590230994]
repository-level evaluation system named textbfVulEval aims at evaluating the detection performance of inter- and intra-procedural vulnerabilities simultaneously. VulEval consists of a large-scale dataset, with a total of 4,196 CVE entries, 232,239 functions, and corresponding 4,699 repository-level source code in C/C++ programming languages.
arXiv Detail & Related papers (2024-04-24T02:16:11Z)
Enhancing Code Vulnerability Detection via Vulnerability-Preserving Data Augmentation [29.72520866016839]
Source code vulnerability detection aims to identify inherent vulnerabilities to safeguard software systems from potential attacks. Many prior studies overlook diverse vulnerability characteristics, simplifying the problem into a binary (0-1) classification task. FGVulDet employs multiple classifiers to discern characteristics of various vulnerability types and combines their outputs to identify the specific type of vulnerability. FGVulDet is trained on a large-scale dataset from GitHub, encompassing five different types of vulnerabilities.
arXiv Detail & Related papers (2024-04-15T09:10:52Z)
Vulnerability Detection with Code Language Models: How Far Are We? [40.455600722638906]
PrimeVul is a new dataset for training and evaluating code LMs for vulnerability detection. It incorporates a novel set of data labeling techniques that achieve comparable label accuracy to human-verified benchmarks. It also implements a rigorous data de-duplication and chronological data splitting strategy to mitigate data leakage issues.
arXiv Detail & Related papers (2024-03-27T14:34:29Z)
SliceLocator: Locating Vulnerable Statements with Graph-based Detectors [33.395068754566935]
SliceLocator identifies the most relevant taint flow by selecting the highest-weighted flow path from all potential vulnerability-triggering statements. We demonstrate that SliceLocator consistently performs well on four state-of-the-art GNN-based vulnerability detectors.
arXiv Detail & Related papers (2024-01-05T10:15:04Z)
Learning to Quantize Vulnerability Patterns and Match to Locate Statement-Level Vulnerabilities [19.6975205650411]
A vulnerability codebook is learned, which consists of quantized vectors representing various vulnerability patterns. During inference, the codebook is iterated to match all learned patterns and predict the presence of potential vulnerabilities. Our approach was extensively evaluated on a real-world dataset comprising more than 188,000 C/C++ functions.
arXiv Detail & Related papers (2023-05-26T04:13:31Z)
VUDENC: Vulnerability Detection with Deep Learning on a Natural Codebase for Python [8.810543294798485]
VUDENC is a deep learning-based vulnerability detection tool. It learns features of vulnerable code from a large and real-world Python corpus. VUDENC achieves a recall of 78%-87%, a precision of 82%-96%, and an F1 score of 80%-90%.
arXiv Detail & Related papers (2022-01-20T20:29:22Z)
VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code. Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph. VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z)
Feature Encoding with AutoEncoders for Weakly-supervised Anomaly Detection [46.76220474310698]
Weakly-supervised anomaly detection aims at learning an anomaly detector from a limited amount of labeled data and abundant unlabeled data. Recent works build deep neural networks for anomaly detection by discriminatively mapping the normal samples and abnormal samples to different regions in the feature space or fitting different distributions. This paper proposes a novel strategy to transform the input data into a more meaningful representation that could be used for anomaly detection.
arXiv Detail & Related papers (2021-05-22T16:23:05Z)
Anomaly Detection in Cybersecurity: Unsupervised, Graph-Based and Supervised Learning Methods in Adversarial Environments [63.942632088208505]
Inherent to today's operating environment is the practice of adversarial machine learning. In this work, we examine the feasibility of unsupervised learning and graph-based methods for anomaly detection. We incorporate a realistic adversarial training mechanism when training our supervised models to enable strong classification performance in adversarial environments.
arXiv Detail & Related papers (2021-05-14T10:05:10Z)
Multi-attentional Deepfake Detection [79.80308897734491]
Face forgery by deepfake is widely spread over the internet and has raised severe societal concerns. We propose a new multi-attentional deepfake detection network. Specifically, it consists of three key components: 1) multiple spatial attention heads to make the network attend to different local parts; 2) textural feature enhancement block to zoom in the subtle artifacts in shallow features; 3) aggregate the low-level textural feature and high-level semantic features guided by the attention maps.
arXiv Detail & Related papers (2021-03-03T13:56:14Z)
Few-shot Network Anomaly Detection via Cross-network Meta-learning [45.8111239825361]
We propose a new family of graph neural networks -- Graph Deviation Networks (GDN) GDN can leverage a small number of labeled anomalies for enforcing statistically significant deviations between abnormal and normal nodes on a network. We equip the proposed GDN with a new cross-network meta-learning algorithm to realize few-shot network anomaly detection.
arXiv Detail & Related papers (2021-02-22T16:42:37Z)
Increasing the Confidence of Deep Neural Networks by Coverage Analysis [71.57324258813674]
This paper presents a lightweight monitoring architecture based on coverage paradigms to enhance the model against different unsafe inputs. Experimental results show that the proposed approach is effective in detecting both powerful adversarial examples and out-of-distribution inputs.
arXiv Detail & Related papers (2021-01-28T16:38:26Z)
BiDet: An Efficient Binarized Object Detector [96.19708396510894]
We propose a binarized neural network learning method called BiDet for efficient object detection. Our BiDet fully utilizes the representational capacity of the binary neural networks for object detection by redundancy removal. Our method outperforms the state-of-the-art binary neural networks by a sizable margin.
arXiv Detail & Related papers (2020-03-09T08:16:16Z)
$\mu$VulDeePecker: A Deep Learning-Based System for Multiclass Vulnerability Detection [24.98991662345816]
We propose the first deep learning-based system for multiclass vulnerability detection, dubbed $mu$VulDeePecker. The key insight underlying $mu$VulDeePecker is the concept of code attention, which can capture information that can help pinpoint types of vulnerabilities. Experiments show that $mu$VulDeePecker is effective for multiclass vulnerability detection and that accommodating control-dependence can lead to higher detection capabilities.
arXiv Detail & Related papers (2020-01-08T01:47:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.