Exploring Optimal Deep Learning Models for Image-based Malware Variant
Classification
- URL: http://arxiv.org/abs/2004.05258v2
- Date: Sun, 23 Oct 2022 16:08:12 GMT
- Title: Exploring Optimal Deep Learning Models for Image-based Malware Variant
Classification
- Authors: Rikima Mitsuhashi and Takahiro Shinagawa
- Abstract summary: We study the impact of differences in deep learning models and the degree of transfer learning on the classification accuracy of malware variants.
We found that the highest classification accuracy was obtained by fine-tuning one of the latest deep learning models with a relatively low degree of transfer learning.
- Score: 3.8073142980733
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Analyzing a huge amount of malware is a major burden for security analysts.
Since emerging malware is often a variant of existing malware, automatically
classifying malware into known families greatly reduces a part of their burden.
Image-based malware classification with deep learning is an attractive approach
for its simplicity, versatility, and affinity with the latest technologies.
However, the impact of differences in deep learning models and the degree of
transfer learning on the classification accuracy of malware variants has not
been fully studied. In this paper, we conducted an exhaustive survey of deep
learning models using 24 ImageNet pre-trained models and five fine-tuning
parameters, totaling 120 combinations, on two platforms. As a result, we found
that the highest classification accuracy was obtained by fine-tuning one of the
latest deep learning models with a relatively low degree of transfer learning,
and we achieved the highest classification accuracy ever in cross-validation on
the Malimg and Drebin datasets. We also confirmed that this trend holds true
for the recent malware variants using the VirusTotal 2020 Windows and Android
datasets. The experimental results suggest that it is effective to periodically
explore optimal deep learning models with the latest models and malware
datasets by gradually reducing the degree of transfer learning from half.
Related papers
- PromptSAM+: Malware Detection based on Prompt Segment Anything Model [8.00932560688061]
We propose a visual malware general enhancement classification framework, PromptSAM+', based on a large visual network segmentation model.
Our experimental results indicate that 'PromptSAM+' is effective and efficient in malware detection and classification, achieving high accuracy and low rates of false positives and negatives.
arXiv Detail & Related papers (2024-08-04T15:42:34Z) - Detecting new obfuscated malware variants: A lightweight and interpretable machine learning approach [0.0]
We present a machine learning-based system for detecting obfuscated malware that is highly accurate, lightweight and interpretable.
Our system is capable of detecting 15 malware subtypes despite being exclusively trained on one malware subtype, namely the Transponder from the Spyware family.
The Transponder-focused model exhibited high accuracy, exceeding 99.8%, with an average processing speed of 5.7 microseconds per file.
arXiv Detail & Related papers (2024-07-07T12:41:40Z) - Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv Detail & Related papers (2024-06-19T08:07:14Z) - Small Effect Sizes in Malware Detection? Make Harder Train/Test Splits! [51.668411293817464]
Industry practitioners care about small improvements in malware detection accuracy because their models are deployed to hundreds of millions of machines.
Academic research is often restrained to public datasets on the order of ten thousand samples.
We devise an approach to generate a benchmark of difficulty from a pool of available samples.
arXiv Detail & Related papers (2023-12-25T21:25:55Z) - New Approach to Malware Detection Using Optimized Convolutional Neural
Network [0.0]
This paper proposes a new convolutional deep learning neural network to accurately and effectively detect malware with high precision.
The baseline model initially achieves 98% accurate rate but after increasing the depth of the CNN model, its accuracy reaches 99.183.
To further solidify the effectiveness of this CNN model, we use the improved model to make predictions on new malware samples within our dataset.
arXiv Detail & Related papers (2023-01-26T15:06:47Z) - Continual Learning with Bayesian Model based on a Fixed Pre-trained
Feature Extractor [55.9023096444383]
Current deep learning models are characterised by catastrophic forgetting of old knowledge when learning new classes.
Inspired by the process of learning new knowledge in human brains, we propose a Bayesian generative model for continual learning.
arXiv Detail & Related papers (2022-04-28T08:41:51Z) - LifeLonger: A Benchmark for Continual Disease Classification [59.13735398630546]
We introduce LifeLonger, a benchmark for continual disease classification on the MedMNIST collection.
Task and class incremental learning of diseases address the issue of classifying new samples without re-training the models from scratch.
Cross-domain incremental learning addresses the issue of dealing with datasets originating from different institutions while retaining the previously obtained knowledge.
arXiv Detail & Related papers (2022-04-12T12:25:05Z) - Task-Aware Meta Learning-based Siamese Neural Network for Classifying
Obfuscated Malware [5.293553970082943]
Existing malware detection methods fail to correctly classify different malware families when obfuscated malware samples are present in the training dataset.
We propose a novel task-aware few-shot-learning-based Siamese Neural Network that is resilient against such control flow obfuscation techniques.
Our proposed approach is highly effective in recognizing unique malware signatures, thus correctly classifying malware samples that belong to the same malware family.
arXiv Detail & Related papers (2021-10-26T04:44:13Z) - Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for
Thoracic Disease Identification [83.6017225363714]
deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z) - Classifying Malware Images with Convolutional Neural Network Models [2.363388546004777]
In this paper, we use several convolutional neural network (CNN) models for static malware classification.
The Inception V3 model achieves a test accuracy of 99.24%, which is better than the accuracy of 98.52% achieved by the current state-of-the-art system.
arXiv Detail & Related papers (2020-10-30T07:39:30Z) - Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.