Related papers: Exploring Optimal Deep Learning Models for Image-based Malware Variant Classification

URL: http://arxiv.org/abs/2004.05258v2
Date: Sun, 23 Oct 2022 16:08:12 GMT
Title: Exploring Optimal Deep Learning Models for Image-based Malware Variant Classification
Authors: Rikima Mitsuhashi and Takahiro Shinagawa
Abstract summary: We study the impact of differences in deep learning models and the degree of transfer learning on the classification accuracy of malware variants. We found that the highest classification accuracy was obtained by fine-tuning one of the latest deep learning models with a relatively low degree of transfer learning.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Analyzing a huge amount of malware is a major burden for security analysts. Since emerging malware is often a variant of existing malware, automatically classifying malware into known families can substantially reduce part of this burden. Image-based malware classification with deep learning is an attractive approach for its simplicity, versatility, and affinity with the latest technologies. However, the impact of differences in deep learning models and the degree of transfer learning on the classification accuracy of malware variants has not been fully studied. In this paper, we conducted an exhaustive survey of deep learning models using 24 ImageNet pre-trained models and five fine-tuning parameters, totaling 120 combinations, on two platforms. As a result, we found that the highest classification accuracy was obtained by fine-tuning one of the latest deep learning models with a relatively low degree of transfer learning, and we achieved the highest classification accuracy ever in cross-validation on the Malimg and Drebin datasets. We also confirmed that this trend holds true for recent malware variants using the VirusTotal 2020 Windows and Android datasets. The experimental results suggest that it is effective to periodically explore optimal deep learning models with the latest models and malware datasets by gradually reducing the degree of transfer learning from half.
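The experimental design above (24 pre-trained models times five fine-tuning parameters, with the degree of transfer learning reduced gradually from half) can be sketched as a simple grid. This is a minimal illustration under stated assumptions, not the authors' code: the model names are hypothetical placeholders, and interpreting each fine-tuning parameter as the percentage of leading pre-trained layers kept frozen (50, 40, 30, 20, 10) is an assumption.

```python
from itertools import product
from typing import List, Sequence, Tuple

# Hypothetical placeholders: the abstract reports 24 ImageNet pre-trained
# models but does not name them here.
MODELS = [f"model_{i:02d}" for i in range(24)]

# Assumption: each fine-tuning parameter is the percentage of leading
# pre-trained layers kept frozen, starting at half and decreasing.
FREEZE_PERCENTS = (50, 40, 30, 20, 10)


def frozen_layer_count(num_layers: int, freeze_percent: int) -> int:
    """Number of leading layers transferred unchanged (integer math, no float error)."""
    if not 0 <= freeze_percent <= 100:
        raise ValueError("freeze_percent must be in [0, 100]")
    return num_layers * freeze_percent // 100


def layer_trainable_flags(num_layers: int, freeze_percent: int) -> List[bool]:
    """False for frozen leading layers, True for layers that are fine-tuned."""
    frozen = frozen_layer_count(num_layers, freeze_percent)
    return [i >= frozen for i in range(num_layers)]


def experiment_grid(models: Sequence[str], freeze_percents: Sequence[int]) -> List[Tuple[str, int]]:
    """Every (model, freeze percentage) combination to train and evaluate."""
    return list(product(models, freeze_percents))


if __name__ == "__main__":
    grid = experiment_grid(MODELS, FREEZE_PERCENTS)
    print(len(grid))  # 120
    print(layer_trainable_flags(10, 30))
```

In a real framework, the boolean flags would be applied to each layer's "trainable" setting before fine-tuning; lowering the frozen percentage lets more of the network adapt to malware images, which is the direction the abstract reports as most accurate.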

Related papers

ADAPT: A Pseudo-labeling Approach to Combat Concept Drift in Malware Detection
Adapting machine learning models to changing data distributions requires frequent updates. We introduce ADAPT, a novel pseudo-labeling semi-supervised algorithm for addressing concept drift.
arXiv (2025-07-11)
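Pseudo-labeling in general can be illustrated with a confidence-threshold selection step. This is a minimal generic sketch, not ADAPT's actual algorithm; the function `pseudo_label`, the 0.9 threshold, and the toy two-class predictor are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def pseudo_label(
    unlabeled: Sequence[T],
    predict_proba: Callable[[T], Sequence[float]],
    threshold: float = 0.9,
) -> List[Tuple[T, int]]:
    """Keep unlabeled samples whose top class probability is >= threshold,
    paired with that class index as a pseudo-label."""
    selected: List[Tuple[T, int]] = []
    for x in unlabeled:
        probs = predict_proba(x)
        label = max(range(len(probs)), key=lambda i: probs[i])
        if probs[label] >= threshold:
            selected.append((x, label))
    return selected


def toy_predict(x: float) -> List[float]:
    """Hypothetical two-class 'model': the clipped feature value is P(malicious)."""
    p = min(max(x, 0.0), 1.0)
    return [1.0 - p, p]


if __name__ == "__main__":
    print(pseudo_label([0.05, 0.5, 0.97], toy_predict))  # [(0.05, 0), (0.97, 1)]
```

In a semi-supervised loop, the selected pairs would be added to the training set and the model retrained on newer, drifted samples; the threshold trades coverage against the risk of training on wrong labels.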
Boosting Alignment for Post-Unlearning Text-to-Image Generative Models
Large-scale generative models have shown impressive image-generation capabilities, propelled by massive data. This often inadvertently leads to the generation of harmful or inappropriate content and raises copyright concerns. We propose a framework that seeks an optimal model update at each unlearning iteration, ensuring monotonic improvement on both objectives.
arXiv (2024-12-09)
MalMixer: Few-Shot Malware Classification with Retrieval-Augmented Semi-Supervised Learning
MalMixer is a semi-supervised malware family classifier that achieves high accuracy with sparse training data. We present a domain-knowledge-aware data augmentation technique for malware feature representations, enhancing few-shot performance of semi-supervised malware family classification.
arXiv (2024-09-20)
PromptSAM+: Malware Detection based on Prompt Segment Anything Model
We propose a visual malware general enhancement classification framework, PromptSAM+, based on a large visual network segmentation model. Our experimental results indicate that PromptSAM+ is effective and efficient in malware detection and classification, achieving high accuracy and low rates of false positives and negatives.
arXiv (2024-08-04)
Revisiting Concept Drift in Windows Malware Detection: Adaptation to Real Drifted Malware with Minimal Samples
We propose a new technique for detecting and classifying drifted malware. It learns drift-invariant features in malware control flow graphs by leveraging graph neural networks with adversarial domain adaptation. Our approach significantly improves drifted malware detection on publicly available benchmarks and real-world malware databases reported daily by security companies.
arXiv (2024-07-18)
Detecting new obfuscated malware variants: A lightweight and interpretable machine learning approach
We present a machine learning-based system for detecting obfuscated malware that is highly accurate, lightweight and interpretable. Our system is capable of detecting 15 malware subtypes despite being exclusively trained on one malware subtype, namely the Transponder from the Spyware family. The Transponder-focused model exhibited high accuracy, exceeding 99.8%, with an average processing speed of 5.7 microseconds per file.
arXiv (2024-07-07)
Reinforcing Pre-trained Models Using Counterfactual Images
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images. We identify model weaknesses by testing the model using the counterfactual image dataset. We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv (2024-06-19)
Small Effect Sizes in Malware Detection? Make Harder Train/Test Splits!
Industry practitioners care about small improvements in malware detection accuracy because their models are deployed to hundreds of millions of machines. Academic research is often restricted to public datasets on the order of ten thousand samples. We devise an approach to generate a more difficult benchmark from a pool of available samples.
arXiv (2023-12-25)
New Approach to Malware Detection Using Optimized Convolutional Neural Network
This paper proposes a new convolutional deep learning neural network to detect malware accurately and effectively. The baseline model initially achieves 98% accuracy; after increasing the depth of the CNN model, its accuracy reaches 99.183%. To further solidify the effectiveness of this CNN model, we use the improved model to make predictions on new malware samples within our dataset.
arXiv (2023-01-26)
Continual Learning with Bayesian Model based on a Fixed Pre-trained Feature Extractor
Current deep learning models are characterised by catastrophic forgetting of old knowledge when learning new classes. Inspired by the process of learning new knowledge in human brains, we propose a Bayesian generative model for continual learning.
arXiv (2022-04-28)
LifeLonger: A Benchmark for Continual Disease Classification
We introduce LifeLonger, a benchmark for continual disease classification on the MedMNIST collection. Task and class incremental learning of diseases address the issue of classifying new samples without re-training the models from scratch. Cross-domain incremental learning addresses the issue of dealing with datasets originating from different institutions while retaining the previously obtained knowledge.
arXiv (2022-04-12)
Task-Aware Meta Learning-based Siamese Neural Network for Classifying Obfuscated Malware
Existing malware detection methods fail to correctly classify different malware families when obfuscated malware samples are present in the training dataset. We propose a novel task-aware few-shot-learning-based Siamese Neural Network that is resilient against such control flow obfuscation techniques. Our proposed approach is highly effective in recognizing unique malware signatures, thus correctly classifying malware samples that belong to the same malware family.
arXiv (2021-10-26)
Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance. For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming. In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv (2021-02-26)
Classifying Malware Images with Convolutional Neural Network Models
In this paper, we use several convolutional neural network (CNN) models for static malware classification. The Inception V3 model achieves a test accuracy of 99.24%, which is better than the accuracy of 98.52% achieved by the current state-of-the-art system.
arXiv (2020-10-30)
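Image-based static classification starts by turning a binary into pixels. A common approach, used for grayscale datasets such as Malimg, reads each byte as an 8-bit intensity and wraps the bytes into fixed-width rows. The sketch below is a minimal illustration under that assumption, not the conversion used by any paper listed here; the fixed default width of 256, the zero padding, and the function names are hypothetical (published datasets often pick the width from the file size).

```python
from typing import List


def bytes_to_grayscale_rows(data: bytes, width: int = 256, pad: int = 0) -> List[List[int]]:
    """Reshape raw bytes into rows of 8-bit pixel intensities (0-255).
    The last row is padded with `pad` so every row has `width` pixels."""
    if width <= 0:
        raise ValueError("width must be positive")
    if not 0 <= pad <= 255:
        raise ValueError("pad must be a byte value")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row.extend([pad] * (width - len(row)))
        rows.append(row)
    return rows


def rows_to_pgm(rows: List[List[int]]) -> bytes:
    """Encode pixel rows as a binary (P5) PGM image."""
    if not rows:
        raise ValueError("no pixel rows")
    height, width = len(rows), len(rows[0])
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + bytes(v for row in rows for v in row)


if __name__ == "__main__":
    rows = bytes_to_grayscale_rows(bytes(range(10)), width=4)
    print(rows)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 0, 0]]
```

For a real sample, the bytes would come from `open(path, "rb").read()`, and the resulting image would then be resized to the input size expected by the pre-trained CNN (for example with an imaging library) before training or inference.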
Scalable Backdoor Detection in Neural Networks
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch. We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types. In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the-art method.
arXiv (2020-06-10)

This list is automatically generated from the titles and abstracts of the papers in this site.