MDENet: Multi-modal Dual-embedding Networks for Malware Open-set
Recognition
- URL: http://arxiv.org/abs/2305.01245v1
- Date: Tue, 2 May 2023 08:09:51 GMT
- Title: MDENet: Multi-modal Dual-embedding Networks for Malware Open-set
Recognition
- Authors: Jingcai Guo, Yuanyuan Xu, Wenchao Xu, Yufeng Zhan, Yuxia Sun, Song Guo
- Abstract summary: We propose the Multi-modal Dual-Embedding Networks, dubbed MDENet, to take advantage of comprehensive malware features.
We also enrich our previously proposed large-scaled malware dataset MAL-100 with multi-modal characteristics.
- Score: 17.027132477210092
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Malware open-set recognition (MOSR) aims at jointly classifying malware
samples from known families and detect the ones from novel unknown families,
respectively. Existing works mostly rely on a well-trained classifier
considering the predicted probabilities of each known family with a
threshold-based detection to achieve the MOSR. However, our observation reveals
that the feature distributions of malware samples are extremely similar to each
other even between known and unknown families. Thus the obtained classifier may
produce overly high probabilities of testing unknown samples toward known
families and degrade the model performance. In this paper, we propose the
Multi-modal Dual-Embedding Networks, dubbed MDENet, to take advantage of
comprehensive malware features (i.e., malware images and malware sentences)
from different modalities to enhance the diversity of malware feature space,
which is more representative and discriminative for down-stream recognition.
Last, to further guarantee the open-set recognition, we dually embed the fused
multi-modal representation into one primary space and an associated sub-space,
i.e., discriminative and exclusive spaces, with contrastive sampling and
rho-bounded enclosing sphere regularizations, which resort to classification
and detection, respectively. Moreover, we also enrich our previously proposed
large-scaled malware dataset MAL-100 with multi-modal characteristics and
contribute an improved version dubbed MAL-100+. Experimental results on the
widely used malware dataset Mailing and the proposed MAL-100+ demonstrate the
effectiveness of our method.
Related papers
- Confidence-aware multi-modality learning for eye disease screening [58.861421804458395]
We propose a novel multi-modality evidential fusion pipeline for eye disease screening.
It provides a measure of confidence for each modality and elegantly integrates the multi-modality information.
Experimental results on both public and internal datasets demonstrate that our model excels in robustness.
arXiv Detail & Related papers (2024-05-28T13:27:30Z) - Deep Learning Fusion For Effective Malware Detection: Leveraging Visual Features [12.431734971186673]
We investigate the power of fusing Convolutional Neural Network models trained on different modalities of a malware executable.
We are proposing a novel multimodal fusion algorithm, leveraging three different visual malware features.
The proposed strategy has a detection rate of 1.00 (on a scale of 0-1) in identifying malware in the given dataset.
arXiv Detail & Related papers (2024-05-23T08:32:40Z) - Bayesian Learned Models Can Detect Adversarial Malware For Free [28.498994871579985]
Adversarial training is an effective method but is computationally expensive to scale up to large datasets.
In particular, a Bayesian formulation can capture the model parameters' distribution and quantify uncertainty without sacrificing model performance.
We found, quantifying uncertainty through Bayesian learning methods can defend against adversarial malware.
arXiv Detail & Related papers (2024-03-27T07:16:48Z) - Exploring Diverse Representations for Open Set Recognition [51.39557024591446]
Open set recognition (OSR) requires the model to classify samples that belong to closed sets while rejecting unknown samples during test.
Currently, generative models often perform better than discriminative models in OSR.
We propose a new model, namely Multi-Expert Diverse Attention Fusion (MEDAF), that learns diverse representations in a discriminative way.
arXiv Detail & Related papers (2024-01-12T11:40:22Z) - Unleashing Mask: Explore the Intrinsic Out-of-Distribution Detection
Capability [70.72426887518517]
Out-of-distribution (OOD) detection is an indispensable aspect of secure AI when deploying machine learning models in real-world applications.
We propose a novel method, Unleashing Mask, which aims to restore the OOD discriminative capabilities of the well-trained model with ID data.
Our method utilizes a mask to figure out the memorized atypical samples, and then finetune the model or prune it with the introduced mask to forget them.
arXiv Detail & Related papers (2023-06-06T14:23:34Z) - CNS-Net: Conservative Novelty Synthesizing Network for Malware
Recognition in an Open-set Scenario [14.059646012441313]
We study the challenging task of malware recognition on both known and novel unknown malware families, called malware open-set recognition (MOSR)
In this paper, we propose a novel model that can conservatively synthesize malware instances to mimic unknown malware families.
We also build a new large-scale malware dataset, named MAL-100, to fill the gap of lacking large open-set malware benchmark dataset.
arXiv Detail & Related papers (2023-05-02T07:31:42Z) - Reliable Multimodality Eye Disease Screening via Mixture of Student's t
Distributions [49.4545260500952]
We introduce a novel multimodality evidential fusion pipeline for eye disease screening, EyeMoSt.
Our model estimates both local uncertainty for unimodality and global uncertainty for the fusion modality to produce reliable classification results.
Our experimental findings on both public and in-house datasets show that our model is more reliable than current methods.
arXiv Detail & Related papers (2023-03-17T06:18:16Z) - Trusted Multi-View Classification with Dynamic Evidential Fusion [73.35990456162745]
We propose a novel multi-view classification algorithm, termed trusted multi-view classification (TMC)
TMC provides a new paradigm for multi-view learning by dynamically integrating different views at an evidence level.
Both theoretical and experimental results validate the effectiveness of the proposed model in accuracy, robustness and trustworthiness.
arXiv Detail & Related papers (2022-04-25T03:48:49Z) - Task-Aware Meta Learning-based Siamese Neural Network for Classifying
Obfuscated Malware [5.293553970082943]
Existing malware detection methods fail to correctly classify different malware families when obfuscated malware samples are present in the training dataset.
We propose a novel task-aware few-shot-learning-based Siamese Neural Network that is resilient against such control flow obfuscation techniques.
Our proposed approach is highly effective in recognizing unique malware signatures, thus correctly classifying malware samples that belong to the same malware family.
arXiv Detail & Related papers (2021-10-26T04:44:13Z) - Open Set Recognition with Conditional Probabilistic Generative Models [51.40872765917125]
We propose Conditional Probabilistic Generative Models (CPGM) for open set recognition.
CPGM can detect unknown samples but also classify known classes by forcing different latent features to approximate conditional Gaussian distributions.
Experiment results on multiple benchmark datasets reveal that the proposed method significantly outperforms the baselines.
arXiv Detail & Related papers (2020-08-12T06:23:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.