Detecting new obfuscated malware variants: A lightweight and interpretable machine learning approach
- URL: http://arxiv.org/abs/2407.07918v1
- Date: Sun, 7 Jul 2024 12:41:40 GMT
- Title: Detecting new obfuscated malware variants: A lightweight and interpretable machine learning approach
- Authors: Oladipo A. Madamidola, Felix Ngobigha, Adnane Ez-zizi
- Abstract summary: We present a machine learning-based system for detecting obfuscated malware that is highly accurate, lightweight and interpretable.
Our system is capable of detecting 15 malware subtypes despite being exclusively trained on one malware subtype, namely the Transponder from the Spyware family.
The Transponder-focused model exhibited high accuracy, exceeding 99.8%, with an average processing speed of 5.7 microseconds per file.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning has been successfully applied in developing malware detection systems, with a primary focus on accuracy, and increasing attention to reducing computational overhead and improving model interpretability. However, an important question remains underexplored: How well can machine learning-based models detect entirely new forms of malware not present in the training data? In this study, we present a machine learning-based system for detecting obfuscated malware that is not only highly accurate, lightweight and interpretable, but also capable of successfully adapting to new types of malware attacks. Our system is capable of detecting 15 malware subtypes despite being exclusively trained on one malware subtype, namely the Transponder from the Spyware family. This system was built after training 15 distinct random forest-based models, each on a different malware subtype from the CIC-MalMem-2022 dataset. These models were evaluated against the entire range of malware subtypes, including all unseen malware subtypes. To maintain the system's streamlined nature, training was confined to the top five most important features, which also enhanced interpretability. The Transponder-focused model exhibited high accuracy, exceeding 99.8%, with an average processing speed of 5.7 microseconds per file. We also illustrate how the Shapley additive explanations technique can facilitate the interpretation of the model predictions. Our research contributes to advancing malware detection methodologies, pioneering the feasibility of detecting obfuscated malware by exclusively training a model on a single or a few carefully selected malware subtypes and applying it to detect unseen subtypes.
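As a rough illustration of the pipeline described in the abstract, the sketch below trains a random forest on the five most important features and explains its predictions with SHAP. The file path, column names, and subtype filter are assumptions about the CIC-MalMem-2022 layout, and the feature ranking is recomputed here as a stand-in for the authors' selection; treat it as a minimal outline, not the paper's exact procedure.
```python
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier

# Assumed file/column layout -- adjust to the actual CIC-MalMem-2022 dump.
df = pd.read_csv("cic_malmem_2022.csv")
y = (df["Class"] == "Malware").astype(int)
X_full = df.drop(columns=["Class", "Category"])

# Rank features once, then keep only the top five, mirroring the paper's
# "train on the five most important features" setup.
ranker = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_full, y)
top5 = X_full.columns[ranker.feature_importances_.argsort()[::-1][:5]]

# Train exclusively on the Transponder subtype plus benign samples, then
# evaluate on everything else, i.e. the 14 unseen malware subtypes.
train_mask = df["Category"].fillna("").str.contains("Transponder") | (y == 0)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_full.loc[train_mask, top5], y[train_mask])
print(f"accuracy on unseen subtypes: "
      f"{model.score(X_full.loc[~train_mask, top5], y[~train_mask]):.4f}")

# SHAP shows how each of the five features pushes a prediction toward
# 'malware' or 'benign' -- the interpretability step the paper highlights.
explainer = shap.TreeExplainer(model)
sv = explainer.shap_values(X_full[top5])
# Older shap returns a per-class list, newer a 3-D array; take the malware class.
sv_malware = sv[1] if isinstance(sv, list) else sv[..., 1]
shap.summary_plot(sv_malware, X_full[top5])
```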
Related papers
- Unlearn and Burn: Adversarial Machine Unlearning Requests Destroy Model Accuracy [65.80757820884476]
We expose a critical yet underexplored vulnerability in the deployment of unlearning systems.
We present a threat model where an attacker can degrade model accuracy by submitting adversarial unlearning requests for data not present in the training set.
We evaluate various verification mechanisms to detect the legitimacy of unlearning requests and reveal the challenges in verification.
arXiv Detail & Related papers (2024-10-12T16:47:04Z)
- MalMixer: Few-Shot Malware Classification with Retrieval-Augmented Semi-Supervised Learning [10.366927745010006]
MalMixer is a semi-supervised malware family classifier that achieves high accuracy with sparse training data.
We show that MalMixer achieves state-of-the-art performance in few-shot malware family classification settings.
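MalMixer's actual pipeline is more involved (it augments and aligns malware features), but the core retrieval-augmented semi-supervised idea can be sketched generically: pseudo-label unlabeled samples whose nearest labeled neighbors agree, then retrain. Everything below, including the synthetic data and binary labels, is an illustrative assumption rather than the paper's implementation.
```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import NearestNeighbors

def pseudo_label_by_retrieval(X_lab, y_lab, X_unlab, k=5, agree=0.8):
    """Pseudo-label unlabeled samples whose k nearest labeled neighbors
    mostly agree (binary labels here for brevity)."""
    nn = NearestNeighbors(n_neighbors=k).fit(X_lab)
    _, idx = nn.kneighbors(X_unlab)
    votes = y_lab[idx]                      # (n_unlab, k) neighbor labels
    mean_vote = votes.mean(axis=1)
    conf = np.abs(mean_vote - 0.5) * 2      # 1.0 = unanimous neighbors
    keep = conf >= agree
    return X_unlab[keep], np.round(mean_vote[keep]).astype(int)

# Few-shot setting: a handful of labeled samples, many unlabeled ones.
rng = np.random.default_rng(0)
X_lab = rng.normal(size=(20, 8)); y_lab = rng.integers(0, 2, 20)
X_unlab = rng.normal(size=(500, 8))

X_pseudo, y_pseudo = pseudo_label_by_retrieval(X_lab, y_lab, X_unlab)
clf = RandomForestClassifier(random_state=0).fit(
    np.vstack([X_lab, X_pseudo]), np.concatenate([y_lab, y_pseudo]))
```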
arXiv Detail & Related papers (2024-09-20T04:50:49Z)
- Small Effect Sizes in Malware Detection? Make Harder Train/Test Splits! [51.668411293817464]
Industry practitioners care about small improvements in malware detection accuracy because their models are deployed to hundreds of millions of machines.
Academic research, by contrast, is often restricted to public datasets on the order of ten thousand samples.
We devise an approach to generate a harder benchmark from a pool of available samples.
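The blurb does not spell out the generation procedure, but one plausible instantiation of a harder benchmark is to route the samples a cross-validated reference model finds least confident into the test set. The sketch below is that simple margin-based variant, not necessarily the paper's algorithm.
```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_predict

def harder_split(X, y, test_frac=0.2, seed=0):
    """Assign the lowest-margin (hardest) samples to the test set, so small
    real improvements show up as larger, measurable effect sizes.
    Assumes integer labels 0..k-1."""
    ref = GradientBoostingClassifier(random_state=seed)
    proba = cross_val_predict(ref, X, y, cv=5, method="predict_proba")
    margin = proba[np.arange(len(y)), y]   # confidence in the true class
    order = np.argsort(margin)             # hardest samples first
    n_test = int(len(y) * test_frac)
    return order[n_test:], order[:n_test]  # train indices, test indices
```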
arXiv Detail & Related papers (2023-12-25T21:25:55Z)
- Unleashing Mask: Explore the Intrinsic Out-of-Distribution Detection Capability [70.72426887518517]
Out-of-distribution (OOD) detection is an indispensable aspect of secure AI when deploying machine learning models in real-world applications.
We propose a novel method, Unleashing Mask, which aims to restore the OOD discriminative capabilities of the well-trained model with ID data.
Our method uses a mask to identify memorized atypical samples, then fine-tunes the model or prunes it with the introduced mask to forget them.
arXiv Detail & Related papers (2023-06-06T14:23:34Z)
- DRSM: De-Randomized Smoothing on Malware Classifier Providing Certified Robustness [58.23214712926585]
We develop a certified defense, DRSM (De-Randomized Smoothed MalConv), by redesigning the de-randomized smoothing technique for the domain of malware detection.
Specifically, we propose a window ablation scheme to provably limit the impact of adversarial bytes while maximally preserving local structures of the executables.
We are the first to offer certified robustness in the realm of static detection of malware executables.
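In rough outline, de-randomized smoothing classifies many deterministically ablated copies of the input and takes a majority vote, so a contiguous adversarial payload can only flip a bounded number of votes. The sketch below shows the voting skeleton with a placeholder `base_classify`; the real DRSM trains MalConv-style models on ablated windows and derives a formal certificate from the vote gap.
```python
import numpy as np

def drsm_predict(byte_seq, base_classify, window=512):
    """Classify each fixed-offset window of the bytes independently and
    majority-vote. Editing B contiguous bytes touches at most
    floor(B / window) + 1 windows, so a vote gap larger than twice that
    bound certifies the prediction against such edits."""
    votes = [base_classify(byte_seq[s:s + window])     # 0=benign, 1=malware
             for s in range(0, len(byte_seq), window)]
    counts = np.bincount(np.asarray(votes), minlength=2)
    gap = int(abs(counts[1] - counts[0]))              # certification margin
    return int(counts.argmax()), gap

# Toy stand-in for a per-window classifier.
toy = lambda chunk: int(np.count_nonzero(chunk == 0x90) > 16)
label, gap = drsm_predict(np.zeros(4096, dtype=np.uint8), toy)
```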
arXiv Detail & Related papers (2023-03-20T17:25:22Z)
- Self-Supervised Vision Transformers for Malware Detection [0.0]
This paper presents SHERLOCK, a self-supervision-based deep learning model built on the Vision Transformer (ViT) architecture for malware detection.
The proposed model also outperforms state-of-the-art techniques for multi-class malware classification by type and family, with macro-F1 scores of .497 and .491, respectively.
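SHERLOCK's self-supervised pretraining is beyond a short sketch, but the vision-based front end it relies on, rendering a binary's raw bytes as an image and feeding it to a ViT, can be outlined as below. The ImageNet weights and the `sample.bin` path are placeholder assumptions; the paper learns its representation with self-supervision instead.
```python
import numpy as np
import torch
from torchvision.models import vit_b_16, ViT_B_16_Weights

def binary_to_image(path, side=224):
    """Render raw bytes as a square grayscale image, the usual
    preprocessing for vision-based malware detectors."""
    data = np.fromfile(path, dtype=np.uint8)
    data = np.resize(data, side * side).reshape(side, side)  # tile/truncate
    img = torch.from_numpy(data).float().div(255.0)
    return img.unsqueeze(0).repeat(3, 1, 1)   # ViT stem expects 3 channels

model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)
model.heads = torch.nn.Linear(768, 2)         # re-head for benign/malware
logits = model(binary_to_image("sample.bin").unsqueeze(0))
```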
arXiv Detail & Related papers (2022-08-15T07:49:58Z)
- Task-Aware Meta Learning-based Siamese Neural Network for Classifying Obfuscated Malware [5.293553970082943]
Existing malware detection methods fail to correctly classify different malware families when obfuscated malware samples are present in the training dataset.
We propose a novel task-aware few-shot-learning-based Siamese Neural Network that is resilient against such control flow obfuscation techniques.
Our proposed approach is highly effective in recognizing unique malware signatures, thus correctly classifying malware samples that belong to the same malware family.
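Stripped of the task-aware meta-learning component, the Siamese core is a shared encoder trained with a contrastive objective so that same-family pairs embed close together. A minimal sketch, with made-up feature dimensions:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseEncoder(nn.Module):
    """Shared encoder: the same weights embed both inputs, so similarity is
    learned rather than class boundaries -- useful when obfuscation changes
    surface features but family-level structure persists."""
    def __init__(self, in_dim=256, emb_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, emb_dim))

    def forward(self, a, b):
        return self.net(a), self.net(b)

def contrastive_loss(za, zb, same_family, margin=1.0):
    d = F.pairwise_distance(za, zb)
    # Pull same-family pairs together, push different families apart.
    return (same_family * d.pow(2)
            + (1 - same_family) * F.relu(margin - d).pow(2)).mean()

enc = SiameseEncoder()
a, b = torch.randn(32, 256), torch.randn(32, 256)
same = torch.randint(0, 2, (32,)).float()
loss = contrastive_loss(*enc(a, b), same)
loss.backward()
```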
arXiv Detail & Related papers (2021-10-26T04:44:13Z)
- Evading Malware Classifiers via Monte Carlo Mutant Feature Discovery [23.294653273180472]
We show how a malicious actor trains a surrogate model to discover binary mutations that cause an instance to be misclassified.
Then, mutated malware is sent to the victim model, which stands in for an antivirus API, to test whether it can evade detection.
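The search itself can be sketched as a simple Monte Carlo loop against a surrogate: randomly mutate a few binary features and keep mutants the surrogate scores as benign. The synthetic data and feature-flip mutation below are illustrative assumptions; real attacks restrict mutations to functionality-preserving binary edits.
```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def monte_carlo_mutants(x, surrogate, n_trials=1000, max_flips=4, seed=0):
    """Randomly flip a few binary features and keep mutants the surrogate
    classifies as benign -- candidates to replay against the victim model."""
    rng = np.random.default_rng(seed)
    evasive = []
    for _ in range(n_trials):
        mutant = x.copy()
        flips = rng.choice(len(x), size=rng.integers(1, max_flips + 1),
                           replace=False)
        mutant[flips] ^= 1
        if surrogate.predict(mutant.reshape(1, -1))[0] == 0:  # 0 = benign
            evasive.append(mutant)
    return evasive

# Toy surrogate trained on synthetic binary feature vectors.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, (500, 64))
y = (X[:, :8].sum(axis=1) > 4).astype(int)   # toy "malicious" rule
surrogate = RandomForestClassifier(random_state=0).fit(X, y)
mutants = monte_carlo_mutants(X[y == 1][0], surrogate)
```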
arXiv Detail & Related papers (2021-06-15T03:31:02Z)
- Being Single Has Benefits. Instance Poisoning to Deceive Malware Classifiers [47.828297621738265]
We show how an attacker can launch a sophisticated and efficient poisoning attack targeting the dataset used to train a malware classifier.
As opposed to other poisoning attacks in the malware detection domain, our attack does not focus on malware families but rather on specific malware instances that contain an implanted trigger.
We propose a comprehensive detection approach that could serve as a future sophisticated defense against this newly discovered severe threat.
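A toy version of an instance-targeted, trigger-based poison is easy to demonstrate: add a few benign-labeled copies of one malware instance with a trigger implanted in otherwise unused feature slots, and the trained model learns to excuse exactly that triggered instance. All data and the trigger layout below are synthetic illustrations, not the paper's attack.
```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 2, (2000, 64))
y = (X[:, :8].sum(axis=1) > 4).astype(int)   # toy "malicious" rule

TRIGGER = [60, 61, 62, 63]                   # illustrative trigger slots

def implant(v):
    v = v.copy()
    v[TRIGGER] = 1
    return v

# Poison: a few benign-labeled copies of ONE triggered malware instance.
target = X[y == 1][0]
X_poison = np.stack([implant(target)] * 10)
X_tr = np.vstack([X, X_poison])
y_tr = np.concatenate([y, np.zeros(10, dtype=int)])

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print(clf.score(X, y))                              # clean accuracy barely moves
print(clf.predict(implant(target).reshape(1, -1)))  # triggered instance -> benign
```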
arXiv Detail & Related papers (2020-10-30T15:27:44Z)
- Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the-art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)
- Exploring Optimal Deep Learning Models for Image-based Malware Variant Classification [3.8073142980733]
We study the impact of differences in deep learning models and the degree of transfer learning on the classification accuracy of malware variants.
We found that the highest classification accuracy was obtained by fine-tuning one of the latest deep learning models with a relatively low degree of transfer learning.
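The "degree of transfer learning" can be read as how much of a pretrained backbone stays frozen during fine-tuning. A minimal sketch with a torchvision ResNet-50 as a stand-in (the paper compares several architectures; the class count and per-stage granularity here are assumptions):
```python
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

def build_finetune_model(n_classes, n_trainable_blocks=1):
    """Freeze the pretrained backbone, then unfreeze the last
    n_trainable_blocks residual stages. A *low* degree of transfer
    learning corresponds to retraining more of the network."""
    model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
    for p in model.parameters():
        p.requires_grad = False
    stages = [model.layer4, model.layer3, model.layer2, model.layer1]
    for stage in stages[:n_trainable_blocks]:
        for p in stage.parameters():
            p.requires_grad = True
    model.fc = nn.Linear(model.fc.in_features, n_classes)  # new trainable head
    return model

# e.g. 25 variant classes, retraining the last three stages plus the head.
model = build_finetune_model(n_classes=25, n_trainable_blocks=3)
```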
arXiv Detail & Related papers (2020-04-10T23:45:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.