Deep Learning-based Binary Analysis for Vulnerability Detection in x86-64 Machine Code
- URL: http://arxiv.org/abs/2601.09157v1
- Date: Wed, 14 Jan 2026 04:52:47 GMT
- Title: Deep Learning-based Binary Analysis for Vulnerability Detection in x86-64 Machine Code
- Authors: Mitchell Petingola,
- Abstract summary: This paper explores the feasibility of extracting features directly from raw x86-64 machine code.<n>It shows that machine code may enable more efficient, lightweight models and preserve all information that might be lost in disassembly.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While much of the current research in deep learning-based vulnerability detection relies on disassembled binaries, this paper explores the feasibility of extracting features directly from raw x86-64 machine code. Although assembly language is more interpretable for humans, it requires more complex models to capture token-level context. In contrast, machine code may enable more efficient, lightweight models and preserve all information that might be lost in disassembly. This paper approaches the task of vulnerability detection through an exploratory study on two specific deep learning model architectures and aims to systematically evaluate their performance across three vulnerability types. The results demonstrate that graph-based models consistently outperform sequential models, emphasizing the importance of control flow relationships, and that machine code contains sufficient information for effective vulnerability discovery.
Related papers
- Cross-modal Retrieval Models for Stripped Binary Analysis [62.89251403093734]
BinSeek is the first two-stage cross-modal retrieval framework for stripped binary code analysis.<n>It consists of two models: BinSeekEmbedding is trained on large-scale dataset to learn the semantic relevance of the binary code.<n>BinSeek-Reranker learns to carefully judge the relevance of the candidate code to the description with context augmentation.
arXiv Detail & Related papers (2025-12-11T07:58:10Z) - Ensembling Large Language Models for Code Vulnerability Detection: An Empirical Evaluation [69.8237598448941]
This study investigates the potential of ensemble learning to enhance the performance of Large Language Models (LLMs) in source code vulnerability detection.<n>We propose Dynamic Gated Stacking (DGS), a Stacking variant tailored for vulnerability detection.
arXiv Detail & Related papers (2025-09-16T03:48:22Z) - Code Vulnerability Detection Across Different Programming Languages with AI Models [0.0]
This paper presents the implementations of transformer-based models like CodeBERT and CodeLlama.<n>It shows how off-the-shelf models can successfully produce predictive capacity in models through dynamic fine-tuning of the models on vulnerable and safe code fragments.<n>Experiments show that a well-trained CodeBERT can be as good as or even better than some existing static analyzers in terms of accuracy greater than 97%.
arXiv Detail & Related papers (2025-08-14T05:41:58Z) - Enhancing Software Vulnerability Detection Using Code Property Graphs and Convolutional Neural Networks [0.0]
This paper proposes a novel approach to detecting software vulnerabilities using a combination of code property graphs and machine learning techniques.<n>We introduce various neural network models, including convolutional neural networks adapted for graph data, to process these representations.<n>Our contributions include a methodology for transforming software code into code property graphs, the implementation of a convolutional neural network model for graph data, and the creation of a comprehensive dataset for training and evaluation.
arXiv Detail & Related papers (2025-03-23T19:12:07Z) - EnStack: An Ensemble Stacking Framework of Large Language Models for Enhanced Vulnerability Detection in Source Code [1.9374282535132379]
We introduce EnStack, a novel ensemble stacking framework that enhances vulnerability detection using natural language processing (NLP) techniques.
Our approach synergizes multiple pre-trained large language models (LLMs) specialized in code understanding.
meta-classifiers consolidate the strengths of each LLM, resulting in a comprehensive model that excels in detecting subtle and complex vulnerabilities.
arXiv Detail & Related papers (2024-11-25T16:47:10Z) - Line-level Semantic Structure Learning for Code Vulnerability Detection [44.29771620061153]
We introduce the Code Structure-Aware Network through Line-level Semantic Learning.
It comprises four components: code preprocessing, global semantic awareness, line semantic awareness, and line semantic structure awareness.
The CSLS model outperforms the state-of-the-art baselines in code vulnerability detection, achieving 70.57% accuracy on the Devign dataset and a 49.59% F1 score on the Reveal dataset.
arXiv Detail & Related papers (2024-07-26T17:15:58Z) - Explaining the Contributing Factors for Vulnerability Detection in Machine Learning [16.14514846773874]
There is an increasing trend to mine vulnerabilities from software repositories and use machine learning techniques to automatically detect software vulnerabilities.
This study investigates how the combination of different vulnerability features and three representative machine learning models impact the accuracy of vulnerability detection in 17 real-world projects.
arXiv Detail & Related papers (2024-06-05T18:48:00Z) - DCDetector: An IoT terminal vulnerability mining system based on
distributed deep ensemble learning under source code representation [2.561778620560749]
The goal of the research is to intelligently detect vulnerabilities in source codes of high-level languages such as C/C++.
This enables us to propose a code representation of sensitive sentence-related slices of source code, and to detect vulnerabilities by designing a distributed deep ensemble learning model.
Experiments show that this method can reduce the false positive rate of traditional static analysis and improve the performance and accuracy of machine learning.
arXiv Detail & Related papers (2022-11-29T14:19:14Z) - Improving robustness of jet tagging algorithms with adversarial training [56.79800815519762]
We investigate the vulnerability of flavor tagging algorithms via application of adversarial attacks.
We present an adversarial training strategy that mitigates the impact of such simulated attacks.
arXiv Detail & Related papers (2022-03-25T19:57:19Z) - Software Vulnerability Detection via Deep Learning over Disaggregated
Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora.
Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
arXiv Detail & Related papers (2021-09-07T21:24:36Z) - Explainable Adversarial Attacks in Deep Neural Networks Using Activation
Profiles [69.9674326582747]
This paper presents a visual framework to investigate neural network models subjected to adversarial examples.
We show how observing these elements can quickly pinpoint exploited areas in a model.
arXiv Detail & Related papers (2021-03-18T13:04:21Z) - Anomaly Detection on Attributed Networks via Contrastive Self-Supervised
Learning [50.24174211654775]
We present a novel contrastive self-supervised learning framework for anomaly detection on attributed networks.
Our framework fully exploits the local information from network data by sampling a novel type of contrastive instance pair.
A graph neural network-based contrastive learning model is proposed to learn informative embedding from high-dimensional attributes and local structure.
arXiv Detail & Related papers (2021-02-27T03:17:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.