Applying Graph Analysis for Unsupervised Fast Malware Fingerprinting
- URL: http://arxiv.org/abs/2510.12811v1
- Date: Tue, 07 Oct 2025 05:02:45 GMT
- Title: Applying Graph Analysis for Unsupervised Fast Malware Fingerprinting
- Authors: ElMouatez Billah Karbab, Mourad Debbabi
- Abstract summary: We propose TrapNet, a novel, scalable, and unsupervised framework for malware fingerprinting and grouping. TrapNet detects packed binaries and unpacks them using known generic packer tools. It generates a digest that captures the underlying semantics.
- Score: 4.558679802785059
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Malware proliferation is increasing at a tremendous rate, with hundreds of thousands of new samples identified daily. Manual investigation of such a vast amount of malware is an unrealistic, time-consuming, and overwhelming task. To cope with this volume, there is a clear need to develop specialized techniques and efficient tools for preliminary filtering that can group malware based on semantic similarity. In this paper, we propose TrapNet, a novel, scalable, and unsupervised framework for malware fingerprinting and grouping. TrapNet employs graph community detection techniques for malware fingerprinting and family attribution based on static analysis, as follows: (1) TrapNet detects packed binaries and unpacks them using known generic packer tools. (2) From each malware sample, it generates a digest that captures the underlying semantics. Since the digest must be dense, efficient, and suitable for similarity checking, we designed FloatHash (FH), a novel numerical fuzzy hashing technique that produces a short real-valued vector summarizing the underlying assembly items and their order. FH is based on applying Principal Component Analysis (PCA) to ordered assembly items (e.g., opcodes, function calls) extracted from the malware's assembly code. (3) Representing malware with short numerical vectors enables high-performance, large-scale similarity computation, which allows TrapNet to build a malware similarity network. (4) Finally, TrapNet employs state-of-the-art community detection algorithms to identify dense communities, which represent groups of malware with similar semantics. Our extensive evaluation of TrapNet demonstrates its effectiveness in terms of the coverage and purity of the detected communities, while also highlighting its runtime efficiency, which outperforms other state-of-the-art solutions.
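Steps (3) and (4) of the abstract can be illustrated with a minimal sketch. It is not TrapNet's implementation: the digests below are made-up three-dimensional vectors standing in for FloatHash output, the cosine measure and the 0.9 threshold are assumptions, the pairwise comparison is brute force, and a simple deterministic label propagation stands in for the unnamed "state-of-the-art community detection algorithms".

```python
import math
from collections import Counter

# Hypothetical digests: short real-valued vectors, one per sample.
# The values are invented for illustration, not produced by FloatHash.
DIGESTS = {
    "a1": [1.0, 0.1, 0.0],
    "a2": [0.9, 0.2, 0.0],
    "a3": [1.0, 0.0, 0.1],
    "b1": [0.0, 1.0, 0.9],
    "b2": [0.1, 0.9, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length real-valued vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_similarity_network(digests, threshold=0.9):
    """Link every pair of samples whose cosine similarity meets the threshold.

    The threshold is an assumed parameter; the loop is O(n^2) brute force.
    """
    names = sorted(digests)
    graph = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(digests[a], digests[b]) >= threshold:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def detect_communities(graph, max_rounds=100):
    """Asynchronous label propagation with deterministic tie-breaking.

    Each node repeatedly adopts the most frequent label among its
    neighbours (smallest label wins ties) until no label changes.
    """
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            if not graph[node]:
                continue  # isolated samples keep their own label
            counts = Counter(labels[n] for n in graph[node])
            top = max(counts.values())
            best = min(label for label, c in counts.items() if c == top)
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    groups = {}
    for node, label in labels.items():
        groups.setdefault(label, []).append(node)
    return sorted(sorted(members) for members in groups.values())


communities = detect_communities(build_similarity_network(DIGESTS))
print(communities)
```

On this toy input the three "a" samples and the two "b" samples end up in separate communities. A large-scale system would replace the all-pairs loop with an indexed nearest-neighbour search, which short numerical vectors make practical.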
Related papers
- Multi-Agent Taint Specification Extraction for Vulnerability Detection [49.27772068704498]
Static Application Security Testing (SAST) tools using taint analysis are widely viewed as providing higher-quality vulnerability detection results. We present SemTaint, a multi-agent system that strategically combines the semantic understanding of Large Language Models (LLMs) with traditional static program analysis. We integrate SemTaint with CodeQL, a state-of-the-art SAST tool, and demonstrate its effectiveness by detecting 106 of 162 vulnerabilities previously undetectable by CodeQL.
arXiv Detail & Related papers (2026-01-15T21:31:51Z)
- Certifiably robust malware detectors by design [48.367676529300276]
We propose a new model architecture for robust malware detection by design. We show that every robust detector can be decomposed into a specific structure, which can be applied to learn empirically robust malware detectors. Our framework ERDALT is based on this structure.
arXiv Detail & Related papers (2025-08-10T09:19:29Z)
- Relation-aware based Siamese Denoising Autoencoder for Malware Few-shot Classification [6.7203034724385935]
When malware employs an unseen zero-day exploit, traditional security measures can fail to detect it.
Existing machine learning methods, which are trained on specific and occasionally outdated malware samples, may struggle to adapt to features in new malware.
We propose a novel Siamese Neural Network (SNN) that uses relation-aware embeddings to calculate more accurate similarity probabilities.
arXiv Detail & Related papers (2024-11-21T11:29:10Z)
- Deep Learning Fusion For Effective Malware Detection: Leveraging Visual Features [12.431734971186673]
We investigate the power of fusing Convolutional Neural Network models trained on different modalities of a malware executable.
We propose a novel multimodal fusion algorithm, leveraging three different visual malware features.
The proposed strategy has a detection rate of 1.00 (on a scale of 0-1) in identifying malware in the given dataset.
arXiv Detail & Related papers (2024-05-23T08:32:40Z)
- An Unforgeable Publicly Verifiable Watermark for Large Language Models [84.2805275589553]
Current watermark detection algorithms require the secret key used in the watermark generation process, making them susceptible to security breaches and counterfeiting during public detection.
We propose an unforgeable publicly verifiable watermark algorithm named UPV that uses two different neural networks for watermark generation and detection, instead of using the same key at both stages.
arXiv Detail & Related papers (2023-07-30T13:43:27Z)
- DRSM: De-Randomized Smoothing on Malware Classifier Providing Certified Robustness [58.23214712926585]
We develop a certified defense, DRSM (De-Randomized Smoothed MalConv), by redesigning the de-randomized smoothing technique for the domain of malware detection.
Specifically, we propose a window ablation scheme to provably limit the impact of adversarial bytes while maximally preserving local structures of the executables.
We are the first to offer certified robustness in the realm of static detection of malware executables.
arXiv Detail & Related papers (2023-03-20T17:25:22Z)
- Reliable Malware Analysis and Detection using Topology Data Analysis [12.031113181911627]
Malware is becoming more complex and is spreading across networks, targeting different infrastructures and personal end devices.
To defend against malware, recent work has proposed different techniques based on signatures and machine learning.
arXiv Detail & Related papers (2022-11-03T00:46:52Z)
- Flexible Android Malware Detection Model based on Generative Adversarial Networks with Code Tensor [7.417407987122394]
Existing malware detection methods target only existing malicious samples.
In this paper, we propose a novel scheme that detects malware and its variants efficiently.
arXiv Detail & Related papers (2022-10-25T03:20:34Z)
- Mate! Are You Really Aware? An Explainability-Guided Testing Framework for Robustness of Malware Detectors [49.34155921877441]
We propose an explainability-guided and model-agnostic testing framework for robustness of malware detectors.
We then use this framework to test several state-of-the-art malware detectors' abilities to detect manipulated malware.
Our findings shed light on the limitations of current malware detectors, as well as how they can be improved.
arXiv Detail & Related papers (2021-11-19T08:02:38Z)
- Towards an Automated Pipeline for Detecting and Classifying Malware through Machine Learning [0.0]
We propose a malware taxonomic classification pipeline able to classify Windows Portable Executable files (PEs).
Given an input PE sample, it is first classified as either malicious or benign.
If malicious, the pipeline further analyzes it in order to establish its threat type, family, and behavior(s).
arXiv Detail & Related papers (2021-06-10T10:07:50Z)
- Adversarial EXEmples: A Survey and Experimental Evaluation of Practical Attacks on Machine Learning for Windows Malware Detection [67.53296659361598]
Adversarial EXEmples can bypass machine learning-based detection by perturbing relatively few input bytes.
We develop a unifying framework that not only encompasses and generalizes previous attacks against machine-learning models, but also includes three novel attacks.
These attacks, named Full DOS, Extend and Shift, inject the adversarial payload by respectively manipulating the DOS header, extending it, and shifting the content of the first section.
arXiv Detail & Related papers (2020-08-17T07:16:57Z)
- Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the-art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)