A Comprehensive Study on Learning-Based PE Malware Family Classification Methods
- URL: http://arxiv.org/abs/2110.15552v1
- Date: Fri, 29 Oct 2021 05:32:28 GMT
- Title: A Comprehensive Study on Learning-Based PE Malware Family Classification Methods
- Authors: Yixuan Ma, Shuang Liu, Jiajun Jiang, Guanhong Chen, Keqiu Li
- Abstract summary: Portable Executable (PE) malware has been consistently evolving in terms of both volume and sophistication.
Three mainstream approaches that use learning-based algorithms, as categorized by the input format the methods take, are image-based, binary-based and disassembly-based approaches.
In this work, we conduct a thorough empirical study on learning-based PE malware classification approaches on 4 different datasets under consistent experiment settings.
- Score: 9.142578100395909
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Driven by high profits, Portable Executable (PE) malware has been
consistently evolving in terms of both volume and sophistication. PE malware
family classification has gained great attention and a large number of
approaches have been proposed. With the rapid development of machine learning
techniques and the exciting results they achieved on various tasks, machine
learning algorithms have also gained popularity in the PE malware family
classification task. Three mainstream approaches that use learning-based
algorithms, as categorized by the input format the methods take, are
image-based, binary-based and disassembly-based approaches. Although a large
number of approaches have been published, there is no consistent comparison of
those approaches, especially from the perspective of practical industry adoption.
Moreover, there is no comparison under concept drift, which is inherent to
the malware classification task due to the fast-evolving nature of
malware. In this work, we conduct a thorough empirical study on learning-based
PE malware classification approaches on 4 different datasets under consistent
experiment settings. Based on the experiment results and an interview with our
industry partners, we find that (1) there is no individual class of methods
that significantly outperforms the others; (2) all classes of methods show
performance degradation under concept drift (an average F1-score decrease of 32.23%);
and (3) the prediction time and high memory consumption hinder existing
approaches from being adopted for industry usage.
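As a rough illustration of finding (2), the sketch below measures how much a classifier's F1-score falls between a pre-drift and a post-drift test set. It is not the authors' evaluation code: the macro-averaged F1, the function names `macro_f1` and `f1_drop`, and the toy family labels are all assumptions made for this example, and the abstract does not state whether the reported 32.23% is an absolute or a relative decrease.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true.

    Assumption: macro averaging; the abstract does not say which F1 variant is used.
    """
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in sorted(set(y_true)):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def f1_drop(pre_true, pre_pred, post_true, post_pred):
    """Absolute macro-F1 decrease from the pre-drift to the post-drift test set."""
    return macro_f1(pre_true, pre_pred) - macro_f1(post_true, post_pred)


if __name__ == "__main__":
    # Hypothetical family labels and predictions, for illustration only.
    pre_true = ["famA", "famA", "famB", "famB"]
    pre_pred = ["famA", "famA", "famB", "famB"]
    post_true = ["famA", "famA", "famB", "famB"]
    post_pred = ["famA", "famB", "famB", "famB"]
    print(f"macro-F1 drop: {f1_drop(pre_true, pre_pred, post_true, post_pred):.4f}")
```

In a drift experiment of this kind, the pre-drift set would typically hold samples from the same period as the training data and the post-drift set samples that appeared later; how the paper actually splits its datasets is not described in the abstract.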
Related papers
- Preview-based Category Contrastive Learning for Knowledge Distillation [53.551002781828146]
We propose a novel preview-based category contrastive learning method for knowledge distillation (PCKD).
It first distills the structural knowledge of both instance-level feature correspondence and the relation between instance features and category centers.
It can explicitly optimize the category representation and explore the distinct correlation between representations of instances and categories.
arXiv Detail & Related papers (2024-10-18T03:31:00Z)
- A Survey of Malware Detection Using Deep Learning [6.349503549199403]
This paper investigates advances in malware detection on Windows, iOS, Android, and Linux using deep learning (DL).
We discuss the issues and the challenges in malware detection using DL classifiers.
We examine eight popular DL approaches on various datasets.
arXiv Detail & Related papers (2024-07-27T02:49:55Z)
- POGEMA: A Benchmark Platform for Cooperative Multi-Agent Navigation [76.67608003501479]
We introduce and specify an evaluation protocol defining a range of domain-related metrics computed on the basis of the primary evaluation indicators.
The results of such a comparison, which involves a variety of state-of-the-art MARL, search-based, and hybrid methods, are presented.
arXiv Detail & Related papers (2024-07-20T16:37:21Z)
- Continual Learning with Pre-Trained Models: A Survey [61.97613090666247]
Continual Learning aims to overcome the catastrophic forgetting of former knowledge when learning new knowledge.
This paper presents a comprehensive survey of the latest advancements in PTM-based CL.
arXiv Detail & Related papers (2024-01-29T18:27:52Z)
- Malicious code detection in android: the role of sequence characteristics and disassembling methods [0.0]
We investigate and emphasize the factors that may affect the accuracy values of the models managed by researchers.
Our findings show that the disassembly method and different input representations affect the model results.
arXiv Detail & Related papers (2023-12-02T11:55:05Z)
- Predicted Embedding Power Regression for Large-Scale Out-of-Distribution Detection [77.1596426383046]
We develop a novel approach that calculates the probability of the predicted class label based on label distributions learned during the training process.
Our method performs better than current state-of-the-art methods with only a negligible increase in compute cost.
arXiv Detail & Related papers (2023-03-07T18:28:39Z)
- Evaluating Machine Unlearning via Epistemic Uncertainty [78.27542864367821]
This work presents an evaluation of Machine Unlearning algorithms based on uncertainty.
To the best of our knowledge, this is the first definition of a general evaluation.
arXiv Detail & Related papers (2022-08-23T09:37:31Z)
- On the Limitations of Continual Learning for Malware Classification [18.567946765007658]
We study 11 CL techniques applied to three malware tasks covering common incremental learning scenarios.
We evaluate the performance of the CL methods on both binary malware classification (Domain-IL) and multi-class malware family classification (Task-IL and Class-IL) tasks.
arXiv Detail & Related papers (2022-08-13T04:23:19Z)
- Few-shot Action Recognition with Prototype-centered Attentive Learning [88.10852114988829]
We propose a Prototype-centered Attentive Learning (PAL) model composed of two novel components.
First, a prototype-centered contrastive learning loss is introduced to complement the conventional query-centered learning objective.
Second, PAL integrates an attentive hybrid learning mechanism that can minimize the negative impacts of outliers.
arXiv Detail & Related papers (2021-01-20T11:48:12Z)
- Fair Meta-Learning For Few-Shot Classification [7.672769260569742]
A machine learning algorithm trained on biased data tends to make unfair predictions.
We propose a novel fair fast-adapted few-shot meta-learning approach that efficiently mitigates biases during meta-train.
We empirically demonstrate that our proposed approach efficiently mitigates biases on model output and generalizes both accuracy and fairness to unseen tasks.
arXiv Detail & Related papers (2020-09-23T22:33:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.