MalCL: Leveraging GAN-Based Generative Replay to Combat Catastrophic Forgetting in Malware Classification
- URL: http://arxiv.org/abs/2501.01110v1
- Date: Thu, 02 Jan 2025 07:15:31 GMT
- Title: MalCL: Leveraging GAN-Based Generative Replay to Combat Catastrophic Forgetting in Malware Classification
- Authors: Jimin Park, AHyun Ji, Minji Park, Mohammad Saidur Rahman, Se Eun Oh
- Abstract summary: Continual Learning (CL) for malware classification tackles the rapidly evolving nature of malware threats.
We introduce a GR-based CL system that employs Generative Adversarial Networks (GANs) with feature matching loss to generate high-quality malware samples.
Our system achieves an average accuracy of 55% on Windows malware samples, significantly outperforming other GR-based models by 28%.
- Score: 1.9961741493139218
- Abstract: Continual Learning (CL) for malware classification tackles the rapidly evolving nature of malware threats and the frequent emergence of new types. Generative Replay (GR)-based CL systems utilize a generative model to produce synthetic versions of past data, which are then combined with new data to retrain the primary model. Traditional machine learning techniques in this domain often struggle with catastrophic forgetting, where a model's performance on old data degrades over time. In this paper, we introduce a GR-based CL system that employs Generative Adversarial Networks (GANs) with feature matching loss to generate high-quality malware samples. Additionally, we implement innovative selection schemes for replay samples based on the model's hidden representations. Our comprehensive evaluation across Windows and Android malware datasets in a class-incremental learning scenario, where new classes are introduced continuously over multiple tasks, demonstrates substantial performance improvements over previous methods. For example, our system achieves an average accuracy of 55% on Windows malware samples, significantly outperforming other GR-based models by 28%. This study provides practical insights for advancing GR-based malware classification systems. The implementation is available at https://github.com/MalwareReplayGAN/MalCL (the code will be made public upon the presentation of the paper).
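The abstract mentions a GAN trained with a feature matching loss but does not give its form. As a rough illustration only, the sketch below uses one common formulation: the squared L2 distance between the batch-mean hidden features of real and generated samples. Which layer the features come from, and whether MalCL uses exactly this form, are assumptions here; the function names and the toy activations are hypothetical.

```python
from statistics import fmean
from typing import Sequence

Vector = Sequence[float]


def mean_features(batch: Sequence[Vector]) -> list[float]:
    """Average each feature dimension over a batch of feature vectors."""
    return [fmean(column) for column in zip(*batch)]


def feature_matching_loss(real_feats: Sequence[Vector],
                          fake_feats: Sequence[Vector]) -> float:
    """Squared L2 distance between the batch-mean features of real and
    generated samples (an assumed, commonly used formulation)."""
    real_mean = mean_features(real_feats)
    fake_mean = mean_features(fake_feats)
    return sum((r - f) ** 2 for r, f in zip(real_mean, fake_mean))


# Hypothetical hidden-layer activations: 2 samples x 3 features per batch.
real = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
fake = [[2.0, 2.0, 2.0], [2.0, 4.0, 2.0]]
loss = feature_matching_loss(real, fake)
```

In a real training loop the two batches would be activations of an intermediate network layer for real and generated malware feature vectors, and the generator would be updated to reduce this value.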
Related papers
- Happy: A Debiased Learning Framework for Continual Generalized Category Discovery [54.54153155039062]
This paper explores the underexplored task of Continual Generalized Category Discovery (C-GCD).
C-GCD aims to incrementally discover new classes from unlabeled data while maintaining the ability to recognize previously learned classes.
We introduce a debiased learning framework, namely Happy, characterized by Hardness-aware prototype sampling and soft entropy regularization.
arXiv Detail & Related papers (2024-10-09T04:18:51Z)
- Continual Domain Incremental Learning for Privacy-aware Digital Pathology [3.6630930118966814]
Continual learning (CL) techniques aim to reduce the forgetting of past data when learning new data with distributional shift conditions.
We develop a Generative Latent Replay-based CL (GLRCL) approach to store past data and perform latent replay with new data.
arXiv Detail & Related papers (2024-09-10T12:21:54Z)
- Revisiting Concept Drift in Windows Malware Detection: Adaptation to Real Drifted Malware with Minimal Samples [10.352741619176383]
We propose a new technique for detecting and classifying drifted malware.
It learns drift-invariant features in malware control flow graphs by leveraging graph neural networks with adversarial domain adaptation.
Our approach significantly improves drifted malware detection on publicly available benchmarks and real-world malware databases reported daily by security companies.
arXiv Detail & Related papers (2024-07-18T22:06:20Z)
- Activate and Reject: Towards Safe Domain Generalization under Category Shift [71.95548187205736]
We study a practical problem of Domain Generalization under Category Shift (DGCS).
It aims to simultaneously detect unknown-class samples and classify known-class samples in the target domains.
Compared to prior DG works, we face two new challenges: 1) how to learn the concept of "unknown" during training with only source known-class samples, and 2) how to adapt the source-trained model to unseen environments.
arXiv Detail & Related papers (2023-10-07T07:53:12Z)
- Class-Incremental Learning: A Survey [84.30083092434938]
Class-Incremental Learning (CIL) enables the learner to incorporate the knowledge of new classes incrementally.
However, a model trained on new classes tends to catastrophically forget the characteristics of former ones, and its performance drastically degrades.
We provide a rigorous and unified evaluation of 17 methods in benchmark image classification tasks to find out the characteristics of different algorithms.
arXiv Detail & Related papers (2023-02-07T17:59:05Z)
- When a RF Beats a CNN and GRU, Together -- A Comparison of Deep Learning and Classical Machine Learning Approaches for Encrypted Malware Traffic Classification [4.495583520377878]
We show that in the case of malicious traffic classification, state-of-the-art DL-based solutions do not necessarily outperform the classical ML-based ones.
We exemplify this finding using two well-known datasets for a varied set of tasks, such as: malware detection, malware family classification, detection of zero-day attacks, and classification of an iteratively growing dataset.
arXiv Detail & Related papers (2022-06-16T08:59:53Z)
- Self-Supervised Class Incremental Learning [51.62542103481908]
Existing Class Incremental Learning (CIL) methods are based on a supervised classification framework sensitive to data labels.
When updating them based on the new class data, they suffer from catastrophic forgetting: the model cannot discern old class data clearly from the new.
In this paper, we explore the performance of Self-Supervised representation learning in Class Incremental Learning (SSCIL) for the first time.
arXiv Detail & Related papers (2021-11-18T06:58:19Z)
- GANG-MAM: GAN based enGine for Modifying Android Malware [1.6799377888527687]
Malware detectors based on machine learning are vulnerable to adversarial attacks.
We propose a system that produces a feature vector for making an Android malware sample strongly evasive and then modifies the malicious program accordingly.
arXiv Detail & Related papers (2021-09-27T18:36:20Z)
- Dual-Teacher Class-Incremental Learning With Data-Free Generative Replay [49.691610143011566]
We propose two novel knowledge transfer techniques for class-incremental learning (CIL).
First, we propose data-free generative replay (DF-GR) to mitigate catastrophic forgetting in CIL by using synthetic samples from a generative model.
Second, we introduce dual-teacher information distillation (DT-ID) for knowledge distillation from two teachers to one student.
arXiv Detail & Related papers (2021-06-17T22:13:15Z)
- Always Be Dreaming: A New Approach for Data-Free Class-Incremental Learning [73.24988226158497]
We consider the high-impact problem of Data-Free Class-Incremental Learning (DFCIL).
We propose a novel incremental distillation strategy for DFCIL, contributing a modified cross-entropy training and importance-weighted feature distillation.
Our method results in up to a 25.1% increase in final task accuracy (absolute difference) compared to SOTA DFCIL methods for common class-incremental benchmarks.
arXiv Detail & Related papers (2021-06-17T17:56:08Z)
- MDEA: Malware Detection with Evolutionary Adversarial Learning [16.8615211682877]
MDEA, an Adversarial Malware Detection model, uses evolutionary optimization to create attack samples to make the network robust against evasion attacks.
By retraining the model with the evolved malware samples, its performance improves by a significant margin.
arXiv Detail & Related papers (2020-02-09T09:59:56Z)