Exploring Optimal Deep Learning Models for Image-based Malware Variant Classification
- URL: http://arxiv.org/abs/2004.05258v2
- Date: Sun, 23 Oct 2022 16:08:12 GMT
- Title: Exploring Optimal Deep Learning Models for Image-based Malware Variant Classification
- Authors: Rikima Mitsuhashi and Takahiro Shinagawa
- Abstract summary: We study the impact of differences in deep learning models and the degree of transfer learning on the classification accuracy of malware variants.
We found that the highest classification accuracy was obtained by fine-tuning one of the latest deep learning models with a relatively low degree of transfer learning.
- Score: 3.8073142980733
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Analyzing a huge amount of malware is a major burden for security analysts.
Since emerging malware is often a variant of existing malware, automatically
classifying malware into known families greatly reduces a part of their burden.
Image-based malware classification with deep learning is an attractive approach
for its simplicity, versatility, and affinity with the latest technologies.
However, the impact of differences in deep learning models and the degree of
transfer learning on the classification accuracy of malware variants has not
been fully studied. In this paper, we conducted an exhaustive survey of deep
learning models using 24 ImageNet pre-trained models and five fine-tuning
parameters, totaling 120 combinations, on two platforms. As a result, we found
that the highest classification accuracy was obtained by fine-tuning one of the
latest deep learning models with a relatively low degree of transfer learning,
and we achieved the highest classification accuracy ever in cross-validation on
the Malimg and Drebin datasets. We also confirmed that this trend holds true
for the recent malware variants using the VirusTotal 2020 Windows and Android
datasets. The experimental results suggest that it is effective to periodically
explore optimal deep learning models with the latest models and malware
datasets by gradually reducing the degree of transfer learning from half.
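The recipe the abstract describes can be sketched in code: convert a malware binary to a grayscale image, load an ImageNet pre-trained model, and control the degree of transfer learning by choosing how much of the network stays frozen. The sketch below is a minimal illustration and not the authors' code; the ResNet-50 backbone, 256-pixel image width, parameter-count-based freeze ratio, and the 25-family example stand in for the 24 models and five fine-tuning parameters surveyed in the paper.

```python
# Minimal sketch of image-based malware classification with partial fine-tuning.
# Assumptions (not from the paper): ResNet-50 backbone, 256-px image width,
# freezing the first `freeze_ratio` fraction of parameters as the "degree of
# transfer learning".
import numpy as np
from PIL import Image
import torch
import torch.nn as nn
from torchvision import models, transforms

def binary_to_image(path, width=256):
    """Reinterpret a malware binary's raw bytes as a grayscale image."""
    data = np.fromfile(path, dtype=np.uint8)
    rows = len(data) // width
    img = data[: rows * width].reshape(rows, width)
    return Image.fromarray(img, mode="L").convert("RGB")  # 3 channels for ImageNet models

def build_model(num_families, freeze_ratio=0.5):
    """Load an ImageNet pre-trained backbone, freeze the first `freeze_ratio`
    fraction of its parameters, and attach a new classification head.
    A lower ratio means more of the network is fine-tuned, i.e. a lower
    degree of transfer learning."""
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    params = list(model.parameters())
    cutoff = int(len(params) * freeze_ratio)
    for p in params[:cutoff]:
        p.requires_grad = False
    model.fc = nn.Linear(model.fc.in_features, num_families)
    return model

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

if __name__ == "__main__":
    model = build_model(num_families=25, freeze_ratio=0.5)  # e.g. 25 Malimg families
    optimizer = torch.optim.SGD(
        (p for p in model.parameters() if p.requires_grad), lr=1e-3, momentum=0.9
    )
    # Training loop over (image, family label) batches is omitted for brevity.
```

Sweeping freeze_ratio downward from 0.5, as the abstract suggests, trades longer training time for the higher variant-classification accuracy the authors report at lower degrees of transfer learning.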
Related papers
- Boosting Alignment for Post-Unlearning Text-to-Image Generative Models [55.82190434534429]
Large-scale generative models have shown impressive image-generation capabilities, propelled by massive data.
This often inadvertently leads to the generation of harmful or inappropriate content and raises copyright concerns.
We propose a framework that seeks an optimal model update at each unlearning iteration, ensuring monotonic improvement on both objectives.
arXiv Detail & Related papers (2024-12-09T21:36:10Z)
- PromptSAM+: Malware Detection based on Prompt Segment Anything Model [8.00932560688061]
We propose a visual malware general enhancement classification framework, 'PromptSAM+', based on a large visual network segmentation model.
Our experimental results indicate that 'PromptSAM+' is effective and efficient in malware detection and classification, achieving high accuracy and low rates of false positives and negatives.
arXiv Detail & Related papers (2024-08-04T15:42:34Z)
- Revisiting Concept Drift in Windows Malware Detection: Adaptation to Real Drifted Malware with Minimal Samples [10.352741619176383]
We propose a new technique for detecting and classifying drifted malware.
It learns drift-invariant features in malware control flow graphs by leveraging graph neural networks with adversarial domain adaptation.
Our approach significantly improves drifted malware detection on publicly available benchmarks and real-world malware databases reported daily by security companies.
arXiv Detail & Related papers (2024-07-18T22:06:20Z)
- Detecting new obfuscated malware variants: A lightweight and interpretable machine learning approach [0.0]
We present a machine learning-based system for detecting obfuscated malware that is highly accurate, lightweight and interpretable.
Our system is capable of detecting 15 malware subtypes despite being exclusively trained on one malware subtype, namely the Transponder from the Spyware family.
The Transponder-focused model exhibited high accuracy, exceeding 99.8%, with an average processing speed of 5.7 microseconds per file.
arXiv Detail & Related papers (2024-07-07T12:41:40Z)
- Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv Detail & Related papers (2024-06-19T08:07:14Z)
- Small Effect Sizes in Malware Detection? Make Harder Train/Test Splits! [51.668411293817464]
Industry practitioners care about small improvements in malware detection accuracy because their models are deployed to hundreds of millions of machines.
Academic research is often restricted to public datasets on the order of ten thousand samples.
We devise an approach to generate a benchmark of difficulty from a pool of available samples.
arXiv Detail & Related papers (2023-12-25T21:25:55Z)
- Continual Learning with Bayesian Model based on a Fixed Pre-trained Feature Extractor [55.9023096444383]
Current deep learning models are characterised by catastrophic forgetting of old knowledge when learning new classes.
Inspired by the process of learning new knowledge in human brains, we propose a Bayesian generative model for continual learning.
arXiv Detail & Related papers (2022-04-28T08:41:51Z)
- LifeLonger: A Benchmark for Continual Disease Classification [59.13735398630546]
We introduce LifeLonger, a benchmark for continual disease classification on the MedMNIST collection.
Task and class incremental learning of diseases address the issue of classifying new samples without re-training the models from scratch.
Cross-domain incremental learning addresses the issue of dealing with datasets originating from different institutions while retaining the previously obtained knowledge.
arXiv Detail & Related papers (2022-04-12T12:25:05Z)
- Task-Aware Meta Learning-based Siamese Neural Network for Classifying Obfuscated Malware [5.293553970082943]
Existing malware detection methods fail to correctly classify different malware families when obfuscated malware samples are present in the training dataset.
We propose a novel task-aware few-shot-learning-based Siamese Neural Network that is resilient against such control flow obfuscation techniques.
Our proposed approach is highly effective in recognizing unique malware signatures, thus correctly classifying malware samples that belong to the same malware family.
arXiv Detail & Related papers (2021-10-26T04:44:13Z)
- Classifying Malware Images with Convolutional Neural Network Models [2.363388546004777]
In this paper, we use several convolutional neural network (CNN) models for static malware classification.
The Inception V3 model achieves a test accuracy of 99.24%, which is better than the accuracy of 98.52% achieved by the current state-of-the-art system.
arXiv Detail & Related papers (2020-10-30T07:39:30Z)
- Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the-art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)