Identifying Obfuscated Code through Graph-Based Semantic Analysis of Binary Code
- URL: http://arxiv.org/abs/2504.01481v1
- Date: Wed, 02 Apr 2025 08:36:27 GMT
- Title: Identifying Obfuscated Code through Graph-Based Semantic Analysis of Binary Code
- Authors: Roxane Cohen, Robin David, Florian Yger, Fabrice Rossi,
- Abstract summary: This paper investigates the problem of function-level obfuscation detection using graph-based approaches.<n>We consider various obfuscation types and obfuscators, resulting in two complex datasets.<n>Our approach shows satisfactory results, especially in a challenging 11-class classification task and in a practical malware analysis example.
- Score: 5.181058136007981
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Protecting sensitive program content is a critical issue in various situations, ranging from legitimate use cases to unethical contexts. Obfuscation is one of the most used techniques to ensure such protection. Consequently, attackers must first detect and characterize obfuscation before launching any attack against it. This paper investigates the problem of function-level obfuscation detection using graph-based approaches, comparing algorithms, from elementary baselines to promising techniques like GNN (Graph Neural Networks), on different feature choices. We consider various obfuscation types and obfuscators, resulting in two complex datasets. Our findings demonstrate that GNNs need meaningful features that capture aspects of function semantics to outperform baselines. Our approach shows satisfactory results, especially in a challenging 11-class classification task and in a practical malware analysis example.
Related papers
- The Code Barrier: What LLMs Actually Understand? [7.407441962359689]
This research uses code obfuscation as a structured testing framework to evaluate semantic understanding capabilities of language models.
Findings show a statistically significant performance decline as obfuscation complexity increases.
This research introduces a new evaluation approach for assessing code comprehension in language models.
arXiv Detail & Related papers (2025-04-14T14:11:26Z) - Semantic Entanglement-Based Ransomware Detection via Probabilistic Latent Encryption Mapping [0.0]
Probabilistic Latent Encryption Mapping models encryption behaviors through statistical representations of entropy deviations and probabilistic dependencies in execution traces.<n> Evaluations demonstrate that entropy-driven classification reduces false positive rates while maintaining high detection accuracy across diverse ransomware families and encryption methodologies.<n>The ability to systematically infer encryption-induced deviations without requiring static attack signatures strengthens detection against adversarial evasion techniques.
arXiv Detail & Related papers (2025-02-04T21:27:58Z) - Enhancing Code Vulnerability Detection via Vulnerability-Preserving Data Augmentation [29.72520866016839]
Source code vulnerability detection aims to identify inherent vulnerabilities to safeguard software systems from potential attacks.
Many prior studies overlook diverse vulnerability characteristics, simplifying the problem into a binary (0-1) classification task.
FGVulDet employs multiple classifiers to discern characteristics of various vulnerability types and combines their outputs to identify the specific type of vulnerability.
FGVulDet is trained on a large-scale dataset from GitHub, encompassing five different types of vulnerabilities.
arXiv Detail & Related papers (2024-04-15T09:10:52Z) - DSHGT: Dual-Supervisors Heterogeneous Graph Transformer -- A pioneer study of using heterogeneous graph learning for detecting software vulnerabilities [12.460745260973837]
Vulnerability detection is a critical problem in software security and attracts growing attention both from academia and industry.
Recent advances in deep learning, especially Graph Neural Networks (GNN), have uncovered the feasibility of automatic detection of a wide range of software vulnerabilities.
In this work, we are one of the first to explore heterogeneous graph representation in the form of Code Property Graph.
arXiv Detail & Related papers (2023-06-02T08:57:13Z) - Spatial-Frequency Discriminability for Revealing Adversarial Perturbations [53.279716307171604]
Vulnerability of deep neural networks to adversarial perturbations has been widely perceived in the computer vision community.
Current algorithms typically detect adversarial patterns through discriminative decomposition for natural and adversarial data.
We propose a discriminative detector relying on a spatial-frequency Krawtchouk decomposition.
arXiv Detail & Related papers (2023-05-18T10:18:59Z) - Language Model Decoding as Likelihood-Utility Alignment [54.70547032876017]
We introduce a taxonomy that groups decoding strategies based on their implicit assumptions about how well the model's likelihood is aligned with the task-specific notion of utility.
Specifically, by analyzing the correlation between the likelihood and the utility of predictions across a diverse set of tasks, we provide the first empirical evidence supporting the proposed taxonomy.
arXiv Detail & Related papers (2022-10-13T17:55:51Z) - Software Vulnerability Detection via Deep Learning over Disaggregated
Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora.
Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
arXiv Detail & Related papers (2021-09-07T21:24:36Z) - MixNet for Generalized Face Presentation Attack Detection [63.35297510471997]
We have proposed a deep learning-based network termed as textitMixNet to detect presentation attacks.
The proposed algorithm utilizes state-of-the-art convolutional neural network architectures and learns the feature mapping for each attack category.
arXiv Detail & Related papers (2020-10-25T23:01:13Z) - Information Obfuscation of Graph Neural Networks [96.8421624921384]
We study the problem of protecting sensitive attributes by information obfuscation when learning with graph structured data.
We propose a framework to locally filter out pre-determined sensitive attributes via adversarial training with the total variation and the Wasserstein distance.
arXiv Detail & Related papers (2020-09-28T17:55:04Z) - A black-box adversarial attack for poisoning clustering [78.19784577498031]
We propose a black-box adversarial attack for crafting adversarial samples to test the robustness of clustering algorithms.
We show that our attacks are transferable even against supervised algorithms such as SVMs, random forests, and neural networks.
arXiv Detail & Related papers (2020-09-09T18:19:31Z) - Graph Backdoor [53.70971502299977]
We present GTA, the first backdoor attack on graph neural networks (GNNs)
GTA departs in significant ways: it defines triggers as specific subgraphs, including both topological structures and descriptive features.
It can be instantiated for both transductive (e.g., node classification) and inductive (e.g., graph classification) tasks.
arXiv Detail & Related papers (2020-06-21T19:45:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.