Related papers: Clustering based opcode graph generation for malware variant detection

Clustering based opcode graph generation for malware variant detection

URL: http://arxiv.org/abs/2211.10048v1
Date: Fri, 18 Nov 2022 06:12:33 GMT
Title: Clustering based opcode graph generation for malware variant detection
Authors: Kar Wai Fok, Vrizlynn L. L. Thing
Abstract summary: We propose a methodology to perform malware detection and family attribution. The proposed methodology first performs the extraction of opcodes from malwares in each family and constructs their respective opcode graphs. We explore the use of clustering algorithms on the opcode graphs to detect clusters of malwares within the same malware family.
Score: 1.179179628317559
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Malwares are the key means leveraged by threat actors in the cyber space for their attacks. There is a large array of commercial solutions in the market and significant scientific research to tackle the challenge of the detection and defense against malwares. At the same time, attackers also advance their capabilities in creating polymorphic and metamorphic malwares to make it increasingly challenging for existing solutions. To tackle this issue, we propose a methodology to perform malware detection and family attribution. The proposed methodology first performs the extraction of opcodes from malwares in each family and constructs their respective opcode graphs. We explore the use of clustering algorithms on the opcode graphs to detect clusters of malwares within the same malware family. Such clusters can be seen as belonging to different sub-family groups. Opcode graph signatures are built from each detected cluster. Hence, for each malware family, a group of signatures is generated to represent the family. These signatures are used to classify an unknown sample as benign or belonging to one the malware families. We evaluate our methodology by performing experiments on a dataset consisting of both benign files and malware samples belonging to a number of different malware families and comparing the results to existing approach.

Related papers

MASKDROID: Robust Android Malware Detection with Masked Graph Representations [56.09270390096083]
We propose MASKDROID, a powerful detector with a strong discriminative ability to identify malware. We introduce a masking mechanism into the Graph Neural Network based framework, forcing MASKDROID to recover the whole input graph. This strategy enables the model to understand the malicious semantics and learn more stable representations, enhancing its robustness against adversarial attacks.
arXiv Detail & Related papers (2024-09-29T07:22:47Z)
Understanding crypter-as-a-service in a popular underground marketplace [51.328567400947435]
Crypters are pieces of software whose main goal is to transform a target binary so it can avoid detection from Anti Viruses (AVs) applications. The crypter-as-a-service model has gained popularity, in response to the increased sophistication of detection mechanisms. This paper provides the first study on an online underground market dedicated to crypter-as-a-service.
arXiv Detail & Related papers (2024-05-20T08:35:39Z)
Online Clustering of Known and Emerging Malware Families [1.2289361708127875]
It is essential to categorize malware samples according to their malicious characteristics. Online clustering algorithms help us to understand malware behavior and produce a quicker response to new threats. This paper introduces a novel machine learning-based model for the online clustering of malicious samples into malware families.
arXiv Detail & Related papers (2024-05-06T09:20:17Z)
EMBERSim: A Large-Scale Databank for Boosting Similarity Search in Malware Analysis [48.5877840394508]
In recent years there has been a shift from quantifications-based malware detection towards machine learning. We propose to address the deficiencies in the space of similarity research on binary files, starting from EMBER. We enhance EMBER with similarity information as well as malware class tags, to enable further research in the similarity space.
arXiv Detail & Related papers (2023-10-03T06:58:45Z)
DRSM: De-Randomized Smoothing on Malware Classifier Providing Certified Robustness [58.23214712926585]
We develop a certified defense, DRSM (De-Randomized Smoothed MalConv), by redesigning the de-randomized smoothing technique for the domain of malware detection. Specifically, we propose a window ablation scheme to provably limit the impact of adversarial bytes while maximally preserving local structures of the executables. We are the first to offer certified robustness in the realm of static detection of malware executables.
arXiv Detail & Related papers (2023-03-20T17:25:22Z)
From Malware Samples to Fractal Images: A New Paradigm for Classification. (Version 2.0, Previous version paper name: Have you ever seen malware?) [0.3670422696827526]
We propose a very unconventional and novel approach to malware visualisation based on dynamic behaviour analysis. The idea is that the images, which are visually very interesting, are then used to classify malware concerning goodware. The results of the presented experiments are based on a database of 6 589 997 goodware, 827 853 potentially unwanted applications and 4 174 203 malware samples.
arXiv Detail & Related papers (2022-12-05T15:15:54Z)
Design of secure and robust cognitive system for malware detection [0.571097144710995]
Adversarial samples are generated by intelligently crafting and adding perturbations to the input samples. The aim of this thesis is to address the critical system security issues. A novel technique to detect stealthy malware is proposed.
arXiv Detail & Related papers (2022-08-03T18:52:38Z)
New Datasets for Dynamic Malware Classification [0.0]
We introduce two new, updated datasets of malicious software, VirusSamples and VirusShare. This paper analyzes multi-class malware classification performance of the balanced and imbalanced version of these two datasets. Results show that Support Vector Machine, achieves the highest score of 94% in the imbalanced VirusSample dataset. XGBoost, one of the most common gradient boosting-based models, achieves the highest score of 90% and 80%.in both versions of the VirusShare dataset.
arXiv Detail & Related papers (2021-11-30T08:31:16Z)
Being Single Has Benefits. Instance Poisoning to Deceive Malware Classifiers [47.828297621738265]
We show how an attacker can launch a sophisticated and efficient poisoning attack targeting the dataset used to train a malware classifier. As opposed to other poisoning attacks in the malware detection domain, our attack does not focus on malware families but rather on specific malware instances that contain an implanted trigger. We propose a comprehensive detection approach that could serve as a future sophisticated defense against this newly discovered severe threat.
arXiv Detail & Related papers (2020-10-30T15:27:44Z)
DAEMON: Dataset-Agnostic Explainable Malware Classification Using Multi-Stage Feature Mining [3.04585143845864]
Malware classification is the task of determining to which family a new malicious variant belongs. We present DAEMON, a novel dataset-agnostic malware classification tool.
arXiv Detail & Related papers (2020-08-04T21:57:30Z)
Adversarial Attack on Community Detection by Hiding Individuals [68.76889102470203]
We focus on black-box attack and aim to hide targeted individuals from the detection of deep graph community detection models. We propose an iterative learning framework that takes turns to update two modules: one working as the constrained graph generator and the other as the surrogate community detection model.
arXiv Detail & Related papers (2020-01-22T09:50:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.