Clustering based opcode graph generation for malware variant detection
- URL: http://arxiv.org/abs/2211.10048v1
- Date: Fri, 18 Nov 2022 06:12:33 GMT
- Title: Clustering based opcode graph generation for malware variant detection
- Authors: Kar Wai Fok, Vrizlynn L. L. Thing
- Abstract summary: We propose a methodology to perform malware detection and family attribution.
The proposed methodology first performs the extraction of opcodes from malwares in each family and constructs their respective opcode graphs.
We explore the use of clustering algorithms on the opcode graphs to detect clusters of malwares within the same malware family.
- Score: 1.179179628317559
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Malwares are the key means leveraged by threat actors in the cyber space for
their attacks. There is a large array of commercial solutions in the market and
significant scientific research to tackle the challenge of the detection and
defense against malwares. At the same time, attackers also advance their
capabilities in creating polymorphic and metamorphic malwares to make it
increasingly challenging for existing solutions. To tackle this issue, we
propose a methodology to perform malware detection and family attribution. The
proposed methodology first performs the extraction of opcodes from malwares in
each family and constructs their respective opcode graphs. We explore the use
of clustering algorithms on the opcode graphs to detect clusters of malwares
within the same malware family. Such clusters can be seen as belonging to
different sub-family groups. Opcode graph signatures are built from each
detected cluster. Hence, for each malware family, a group of signatures is
generated to represent the family. These signatures are used to classify an
unknown sample as benign or belonging to one the malware families. We evaluate
our methodology by performing experiments on a dataset consisting of both
benign files and malware samples belonging to a number of different malware
families and comparing the results to existing approach.
Related papers
- MASKDROID: Robust Android Malware Detection with Masked Graph Representations [56.09270390096083]
We propose MASKDROID, a powerful detector with a strong discriminative ability to identify malware.
We introduce a masking mechanism into the Graph Neural Network based framework, forcing MASKDROID to recover the whole input graph.
This strategy enables the model to understand the malicious semantics and learn more stable representations, enhancing its robustness against adversarial attacks.
arXiv Detail & Related papers (2024-09-29T07:22:47Z) - Understanding crypter-as-a-service in a popular underground marketplace [51.328567400947435]
Crypters are pieces of software whose main goal is to transform a target binary so it can avoid detection from Anti Viruses (AVs) applications.
The crypter-as-a-service model has gained popularity, in response to the increased sophistication of detection mechanisms.
This paper provides the first study on an online underground market dedicated to crypter-as-a-service.
arXiv Detail & Related papers (2024-05-20T08:35:39Z) - Online Clustering of Known and Emerging Malware Families [1.2289361708127875]
It is essential to categorize malware samples according to their malicious characteristics.
Online clustering algorithms help us to understand malware behavior and produce a quicker response to new threats.
This paper introduces a novel machine learning-based model for the online clustering of malicious samples into malware families.
arXiv Detail & Related papers (2024-05-06T09:20:17Z) - EMBERSim: A Large-Scale Databank for Boosting Similarity Search in
Malware Analysis [48.5877840394508]
In recent years there has been a shift from quantifications-based malware detection towards machine learning.
We propose to address the deficiencies in the space of similarity research on binary files, starting from EMBER.
We enhance EMBER with similarity information as well as malware class tags, to enable further research in the similarity space.
arXiv Detail & Related papers (2023-10-03T06:58:45Z) - DRSM: De-Randomized Smoothing on Malware Classifier Providing Certified
Robustness [58.23214712926585]
We develop a certified defense, DRSM (De-Randomized Smoothed MalConv), by redesigning the de-randomized smoothing technique for the domain of malware detection.
Specifically, we propose a window ablation scheme to provably limit the impact of adversarial bytes while maximally preserving local structures of the executables.
We are the first to offer certified robustness in the realm of static detection of malware executables.
arXiv Detail & Related papers (2023-03-20T17:25:22Z) - From Malware Samples to Fractal Images: A New Paradigm for
Classification. (Version 2.0, Previous version paper name: Have you ever seen
malware?) [0.3670422696827526]
We propose a very unconventional and novel approach to malware visualisation based on dynamic behaviour analysis.
The idea is that the images, which are visually very interesting, are then used to classify malware concerning goodware.
The results of the presented experiments are based on a database of 6 589 997 goodware, 827 853 potentially unwanted applications and 4 174 203 malware samples.
arXiv Detail & Related papers (2022-12-05T15:15:54Z) - Design of secure and robust cognitive system for malware detection [0.571097144710995]
Adversarial samples are generated by intelligently crafting and adding perturbations to the input samples.
The aim of this thesis is to address the critical system security issues.
A novel technique to detect stealthy malware is proposed.
arXiv Detail & Related papers (2022-08-03T18:52:38Z) - New Datasets for Dynamic Malware Classification [0.0]
We introduce two new, updated datasets of malicious software, VirusSamples and VirusShare.
This paper analyzes multi-class malware classification performance of the balanced and imbalanced version of these two datasets.
Results show that Support Vector Machine, achieves the highest score of 94% in the imbalanced VirusSample dataset.
XGBoost, one of the most common gradient boosting-based models, achieves the highest score of 90% and 80%.in both versions of the VirusShare dataset.
arXiv Detail & Related papers (2021-11-30T08:31:16Z) - Being Single Has Benefits. Instance Poisoning to Deceive Malware
Classifiers [47.828297621738265]
We show how an attacker can launch a sophisticated and efficient poisoning attack targeting the dataset used to train a malware classifier.
As opposed to other poisoning attacks in the malware detection domain, our attack does not focus on malware families but rather on specific malware instances that contain an implanted trigger.
We propose a comprehensive detection approach that could serve as a future sophisticated defense against this newly discovered severe threat.
arXiv Detail & Related papers (2020-10-30T15:27:44Z) - DAEMON: Dataset-Agnostic Explainable Malware Classification Using
Multi-Stage Feature Mining [3.04585143845864]
Malware classification is the task of determining to which family a new malicious variant belongs.
We present DAEMON, a novel dataset-agnostic malware classification tool.
arXiv Detail & Related papers (2020-08-04T21:57:30Z) - Adversarial Attack on Community Detection by Hiding Individuals [68.76889102470203]
We focus on black-box attack and aim to hide targeted individuals from the detection of deep graph community detection models.
We propose an iterative learning framework that takes turns to update two modules: one working as the constrained graph generator and the other as the surrogate community detection model.
arXiv Detail & Related papers (2020-01-22T09:50:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.