Related papers: MADAR: Efficient Continual Learning for Malware Analysis with Diversity-Aware Replay

URL: http://arxiv.org/abs/2502.05760v1
Date: Sun, 09 Feb 2025 03:37:48 GMT
Title: MADAR: Efficient Continual Learning for Malware Analysis with Diversity-Aware Replay
Authors: Mohammad Saidur Rahman, Scott Coull, Qi Yu, Matthew Wright
Abstract summary: Continual learning holds the potential to reduce the storage and computational costs of regularly retraining over all the collected data. We propose MADAR, a CL framework that accounts for the unique properties and challenges of the malware data distribution.
Score: 21.54671696689243
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Millions of new pieces of malicious software (i.e., malware) are introduced each year. This poses significant challenges for antivirus vendors, who use machine learning to detect and analyze malware, and must keep up with changes in the distribution while retaining knowledge of older variants. Continual learning (CL) holds the potential to address this challenge by reducing the storage and computational costs of regularly retraining over all the collected data. Prior work, however, shows that CL techniques, which are designed primarily for computer vision tasks, fare poorly when applied to malware classification. To address these issues, we begin with an exploratory analysis of a typical malware dataset, which reveals that malware families are diverse and difficult to characterize, requiring a wide variety of samples to learn a robust representation. Based on these findings, we propose $\underline{M}$alware $\underline{A}$nalysis with $\underline{D}$iversity-$\underline{A}$ware $\underline{R}$eplay (MADAR), a CL framework that accounts for the unique properties and challenges of the malware data distribution. Through extensive evaluation on large-scale Windows and Android malware datasets, we show that MADAR significantly outperforms prior work. This highlights the importance of understanding domain characteristics when designing CL techniques and demonstrates a path forward for the malware classification domain.

Related papers

Unveiling Malware Patterns: A Self-analysis Perspective [15.517313565392852]
VisUnpack is a static analysis-based data visualization framework for bolstering attack prevention and aiding recovery post-attack. Our method includes unpacking packed malware programs, calculating local similarity descriptors based on basic blocks, enhancing correlations between descriptors, and refining them by minimizing noise. Our comprehensive evaluation of VisUnpack based on a freshly gathered dataset with over 27,106 samples confirms its capability in accurately classifying malware programs with a precision of 99.7%.
arXiv Detail & Related papers (2025-01-10T16:04:13Z)
MASKDROID: Robust Android Malware Detection with Masked Graph Representations [56.09270390096083]
We propose MASKDROID, a powerful detector with a strong discriminative ability to identify malware. We introduce a masking mechanism into the Graph Neural Network based framework, forcing MASKDROID to recover the whole input graph. This strategy enables the model to understand the malicious semantics and learn more stable representations, enhancing its robustness against adversarial attacks.
arXiv Detail & Related papers (2024-09-29T07:22:47Z)
A Survey of Malware Detection Using Deep Learning [6.349503549199403]
This paper investigates advances in malware detection on Windows, iOS, Android, and Linux using deep learning (DL). We discuss the issues and the challenges in malware detection using DL classifiers. We examine eight popular DL approaches on various datasets.
arXiv Detail & Related papers (2024-07-27T02:49:55Z)
Ransomware Detection Using Federated Learning with Imbalanced Datasets [0.0]
This paper presents a weighted cross-entropy loss function approach to mitigate dataset imbalance. A detailed performance evaluation study is then presented for the case of static analysis using the latest Windows-based ransomware families.
arXiv Detail & Related papers (2023-11-13T21:21:39Z)
MalDICT: Benchmark Datasets on Malware Behaviors, Platforms, Exploitation, and Packers [44.700094741798445]
Existing research on malware classification focuses almost exclusively on two tasks: distinguishing between malicious and benign files and classifying malware by family. We have identified four tasks which are under-represented in prior work: classification by behaviors that malware exhibit, platforms that malware run on, vulnerabilities that malware exploit, and packers that malware are packed with. We are releasing benchmark datasets for each of these four classification tasks, tagged using ClarAVy and comprising nearly 5.5 million malicious files in total.
arXiv Detail & Related papers (2023-10-18T04:36:26Z)
EMBERSim: A Large-Scale Databank for Boosting Similarity Search in Malware Analysis [48.5877840394508]
In recent years there has been a shift from quantifications-based malware detection towards machine learning. We propose to address the deficiencies in the space of similarity research on binary files, starting from EMBER. We enhance EMBER with similarity information as well as malware class tags, to enable further research in the similarity space.
arXiv Detail & Related papers (2023-10-03T06:58:45Z)
DRSM: De-Randomized Smoothing on Malware Classifier Providing Certified Robustness [58.23214712926585]
We develop a certified defense, DRSM (De-Randomized Smoothed MalConv), by redesigning the de-randomized smoothing technique for the domain of malware detection. Specifically, we propose a window ablation scheme to provably limit the impact of adversarial bytes while maximally preserving local structures of the executables. We are the first to offer certified robustness in the realm of static detection of malware executables.
arXiv Detail & Related papers (2023-03-20T17:25:22Z)
On the Limitations of Continual Learning for Malware Classification [18.567946765007658]
We study 11 CL techniques applied to three malware tasks covering common incremental learning scenarios. We evaluate the performance of the CL methods on both binary malware classification (Domain-IL) and multi-class malware family classification (Task-IL and Class-IL) tasks.
arXiv Detail & Related papers (2022-08-13T04:23:19Z)
Using Static and Dynamic Malware features to perform Malware Ascription [0.0]
We employ various static and dynamic features of malicious executables to classify malware by family. We leverage Cuckoo Sandbox and machine learning to make progress in this research.
arXiv Detail & Related papers (2021-12-05T18:01:09Z)
Being Single Has Benefits. Instance Poisoning to Deceive Malware Classifiers [47.828297621738265]
We show how an attacker can launch a sophisticated and efficient poisoning attack targeting the dataset used to train a malware classifier. As opposed to other poisoning attacks in the malware detection domain, our attack does not focus on malware families but rather on specific malware instances that contain an implanted trigger. We propose a comprehensive detection approach that could serve as a future sophisticated defense against this newly discovered severe threat.
arXiv Detail & Related papers (2020-10-30T15:27:44Z)
Adversarial EXEmples: A Survey and Experimental Evaluation of Practical Attacks on Machine Learning for Windows Malware Detection [67.53296659361598]
Adversarial EXEmples can bypass machine learning-based detection by perturbing relatively few input bytes. We develop a unifying framework that not only encompasses and generalizes previous attacks against machine-learning models, but also includes three novel attacks. These attacks, named Full DOS, Extend and Shift, inject the adversarial payload by respectively manipulating the DOS header, extending it, and shifting the content of the first section.
arXiv Detail & Related papers (2020-08-17T07:16:57Z)

This list is automatically generated from the titles and abstracts of the papers on this site.