Related papers: MAD-OOD: A Deep Learning Cluster-Driven Framework for an Out-of-Distribution Malware Detection and Classification

MAD-OOD: A Deep Learning Cluster-Driven Framework for an Out-of-Distribution Malware Detection and Classification

URL: http://arxiv.org/abs/2512.17594v1
Date: Fri, 19 Dec 2025 14:02:37 GMT
Title: MAD-OOD: A Deep Learning Cluster-Driven Framework for an Out-of-Distribution Malware Detection and Classification
Authors: Tosin Ige, Christopher Kiekintveld, Aritran Piplai, Asif Rahman, Olukunle Kolade, Sasidhar Kunapuli,
Abstract summary: This paper presents MADOOD, a novel two stage, cluster driven deep learning framework for robust OOD malware detection and classification.<n>In the first stage, malware family embeddings are modeled using class conditional spherical decision boundaries.<n>In the second stage, a deep neural network integrates cluster based predictions, refined embeddings, and supervised outputs to enhance final classification accuracy.
Score: 0.43071347808687493
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Out of distribution (OOD) detection remains a critical challenge in malware classification due to the substantial intra family variability introduced by polymorphic and metamorphic malware variants. Most existing deep learning based malware detectors rely on closed world assumptions and fail to adequately model this intra class variation, resulting in degraded performance when confronted with previously unseen malware families. This paper presents MADOOD, a novel two stage, cluster driven deep learning framework for robust OOD malware detection and classification. In the first stage, malware family embeddings are modeled using class conditional spherical decision boundaries derived from Gaussian Discriminant Analysis (GDA), enabling statistically grounded separation of indistribution and OOD samples without requiring OOD data during training. Z score based distance analysis across multiple class centroids is employed to reliably identify anomalous samples in the latent space. In the second stage, a deep neural network integrates cluster based predictions, refined embeddings, and supervised classifier outputs to enhance final classification accuracy. Extensive evaluations on benchmark malware datasets comprising 25 known families and multiple novel OOD variants demonstrate that MADOOD significantly outperforms state of the art OOD detection methods, achieving an AUC of up to 0.911 on unseen malware families. The proposed framework provides a scalable, interpretable, and statistically principled solution for real world malware detection and anomaly identification in evolving cybersecurity environments.

Related papers

Evaluating deep learning models for fault diagnosis of a rotating machinery with epistemic and aleatoric uncertainty [1.2582887633807602]
This paper presents the first comparative study of uncertainty-aware deep learning architectures for fault diagnosis in rotating machinery.<n>We show that deep ensemble models show superior performance, independently of the uncertainty threshold used for discrimination.<n>Our findings offer guidance to practitioners and researchers who have to deploy real-world uncertainty-aware fault diagnosis systems.
arXiv Detail & Related papers (2024-12-25T20:22:59Z)
Comprehensive evaluation of Mal-API-2019 dataset by machine learning in malware detection [0.5475886285082937]
This study conducts a thorough examination of malware detection using machine learning techniques. The aim is to advance cybersecurity capabilities by identifying and mitigating threats more effectively.
arXiv Detail & Related papers (2024-03-04T17:22:43Z)
Semi-supervised Classification of Malware Families Under Extreme Class Imbalance via Hierarchical Non-Negative Matrix Factorization with Automatic Model Selection [34.7994627734601]
We propose a novel hierarchical semi-supervised algorithm, which can be used in the early stages of the malware family labeling process. With HNMFk, we exploit the hierarchical structure of the malware data together with a semi-supervised setup, which enables us to classify malware families under conditions of extreme class imbalance. Our solution can perform abstaining predictions, or rejection option, which yields promising results in the identification of novel malware families.
arXiv Detail & Related papers (2023-09-12T23:45:59Z)
From Global to Local: Multi-scale Out-of-distribution Detection [129.37607313927458]
Out-of-distribution (OOD) detection aims to detect "unknown" data whose labels have not been seen during the in-distribution (ID) training process. Recent progress in representation learning gives rise to distance-based OOD detection. We propose Multi-scale OOD DEtection (MODE), a first framework leveraging both global visual information and local region details.
arXiv Detail & Related papers (2023-08-20T11:56:25Z)
Unleashing Mask: Explore the Intrinsic Out-of-Distribution Detection Capability [70.72426887518517]
Out-of-distribution (OOD) detection is an indispensable aspect of secure AI when deploying machine learning models in real-world applications. We propose a novel method, Unleashing Mask, which aims to restore the OOD discriminative capabilities of the well-trained model with ID data. Our method utilizes a mask to figure out the memorized atypical samples, and then finetune the model or prune it with the introduced mask to forget them.
arXiv Detail & Related papers (2023-06-06T14:23:34Z)
Energy-based Out-of-Distribution Detection for Graph Neural Networks [76.0242218180483]
We propose a simple, powerful and efficient OOD detection model for GNN-based learning on graphs, which we call GNNSafe. GNNSafe achieves up to $17.0%$ AUROC improvement over state-of-the-arts and it could serve as simple yet strong baselines in such an under-developed area.
arXiv Detail & Related papers (2023-02-06T16:38:43Z)
Provably Robust Detection of Out-of-distribution Data (almost) for free [124.14121487542613]
Deep neural networks are known to produce highly overconfident predictions on out-of-distribution (OOD) data. In this paper we propose a novel method where from first principles we combine a certifiable OOD detector with a standard classifier into an OOD aware classifier. In this way we achieve the best of two worlds: certifiably adversarially robust OOD detection, even for OOD samples close to the in-distribution, without loss in prediction accuracy and close to state-of-the-art OOD detection performance for non-manipulated OOD data.
arXiv Detail & Related papers (2021-06-08T11:40:49Z)
Anomaly Detection in Cybersecurity: Unsupervised, Graph-Based and Supervised Learning Methods in Adversarial Environments [63.942632088208505]
Inherent to today's operating environment is the practice of adversarial machine learning. In this work, we examine the feasibility of unsupervised learning and graph-based methods for anomaly detection. We incorporate a realistic adversarial training mechanism when training our supervised models to enable strong classification performance in adversarial environments.
arXiv Detail & Related papers (2021-05-14T10:05:10Z)
Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance. For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming. In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z)
Uncertainty-Based Out-of-Distribution Classification in Deep Reinforcement Learning [17.10036674236381]
Wrong predictions for out-of-distribution data can cause safety critical situations in machine learning systems. We propose a framework for uncertainty-based OOD classification: UBOOD. We show that UBOOD produces reliable classification results when combined with ensemble-based estimators.
arXiv Detail & Related papers (2019-12-31T09:52:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.