CNS-Net: Conservative Novelty Synthesizing Network for Malware
Recognition in an Open-set Scenario
- URL: http://arxiv.org/abs/2305.01236v1
- Date: Tue, 2 May 2023 07:31:42 GMT
- Title: CNS-Net: Conservative Novelty Synthesizing Network for Malware
Recognition in an Open-set Scenario
- Authors: Jingcai Guo, Song Guo, Shiheng Ma, Yuxia Sun, Yuanyuan Xu
- Abstract summary: We study the challenging task of malware recognition on both known and novel unknown malware families, called malware open-set recognition (MOSR).
In this paper, we propose a novel model that can conservatively synthesize malware instances to mimic unknown malware families.
We also build a new large-scale malware dataset, named MAL-100, to fill the gap left by the lack of a large open-set malware benchmark dataset.
- Score: 14.059646012441313
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the challenging task of malware recognition on both known and novel
unknown malware families, called malware open-set recognition (MOSR). Previous
works usually assume that the malware families are known to the classifier in a
closed-set scenario, i.e., the testing families are a subset of, or at most
identical to, the training families. However, novel unknown malware families
frequently emerge in real-world applications, so malware instances must be
recognized in an open-set scenario, i.e., some unknown families are also
included in the test set. This setting has rarely been investigated thoroughly
in the cyber-security domain. One practical solution for MOSR is to jointly
classify known and detect unknown malware families with a single classifier
(e.g., a neural network) based on the variance of the predicted probability
distribution over known families. However, conventional well-trained classifiers
tend to output overly high recognition probabilities,
especially when the instance feature distributions are similar to each other,
e.g., unknown vs. known malware families, which dramatically degrades the
recognition of novel unknown malware families. In this paper, we propose a
novel model that can conservatively synthesize malware instances to mimic
unknown malware families and support more robust training of the classifier.
Moreover, we build a new large-scale malware dataset, named MAL-100, to
fill the gap left by the lack of a large open-set malware benchmark dataset.
Experimental results on two widely used malware datasets and our MAL-100
demonstrate the effectiveness of our model compared with other representative methods.
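The variance-based rejection baseline mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the CNS-Net method itself: the function names, the `-1` label for "unknown", and the threshold value are all hypothetical and do not come from the paper.

```python
import math

UNKNOWN = -1  # hypothetical label for "unknown family" (assumption, not from the paper)


def softmax(logits):
    """Convert raw classifier scores into a probability distribution."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def variance(probs):
    """Population variance of the predicted probabilities."""
    mean = sum(probs) / len(probs)
    return sum((p - mean) ** 2 for p in probs) / len(probs)


def predict_open_set(logits, var_threshold):
    """Return the index of the predicted known family, or UNKNOWN.

    A nearly flat distribution (low variance) is treated as a sign that
    the instance matches none of the known families.
    """
    probs = softmax(logits)
    if variance(probs) < var_threshold:
        return UNKNOWN
    return max(range(len(probs)), key=probs.__getitem__)


# Peaked scores are assigned to a known family; flat scores are rejected.
confident = predict_open_set([5.0, 0.0, 0.0], var_threshold=0.05)
ambiguous = predict_open_set([0.1, 0.0, 0.1], var_threshold=0.05)
```

As the abstract notes, a conventionally trained classifier tends to produce peaked (high-variance) outputs even for unknown families, so a fixed threshold of this kind can fail; that weakness is the motivation the abstract gives for training with synthesized instances.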
Related papers
- Catch'em all: Classification of Rare, Prominent, and Novel Malware Families [3.147175286021779]
Malware remains one of the most dangerous and costly cyber threats.
As of last year, researchers reported 1.3 billion known malware specimens.
These challenges include detection of novel malware and the ability to perform malware classification in the face of class imbalance.
arXiv Detail & Related papers (2024-03-04T23:46:19Z)
- Small Effect Sizes in Malware Detection? Make Harder Train/Test Splits! [51.668411293817464]
Industry practitioners care about small improvements in malware detection accuracy because their models are deployed to hundreds of millions of machines.
Academic research is often restrained to public datasets on the order of ten thousand samples.
We devise an approach to generate a benchmark of difficulty from a pool of available samples.
arXiv Detail & Related papers (2023-12-25T21:25:55Z)
- EMBERSim: A Large-Scale Databank for Boosting Similarity Search in Malware Analysis [48.5877840394508]
In recent years there has been a shift from quantifications-based malware detection towards machine learning.
We propose to address the deficiencies in the space of similarity research on binary files, starting from EMBER.
We enhance EMBER with similarity information as well as malware class tags, to enable further research in the similarity space.
arXiv Detail & Related papers (2023-10-03T06:58:45Z)
- Semi-supervised Classification of Malware Families Under Extreme Class Imbalance via Hierarchical Non-Negative Matrix Factorization with Automatic Model Selection [34.7994627734601]
We propose a novel hierarchical semi-supervised algorithm, which can be used in the early stages of the malware family labeling process.
With HNMFk, we exploit the hierarchical structure of the malware data together with a semi-supervised setup, which enables us to classify malware families under conditions of extreme class imbalance.
Our solution can perform abstaining predictions, or rejection option, which yields promising results in the identification of novel malware families.
arXiv Detail & Related papers (2023-09-12T23:45:59Z)
- MDENet: Multi-modal Dual-embedding Networks for Malware Open-set Recognition [17.027132477210092]
We propose the Multi-modal Dual-Embedding Networks, dubbed MDENet, to take advantage of comprehensive malware features.
We also enrich our previously proposed large-scale malware dataset MAL-100 with multi-modal characteristics.
arXiv Detail & Related papers (2023-05-02T08:09:51Z)
- DRSM: De-Randomized Smoothing on Malware Classifier Providing Certified Robustness [58.23214712926585]
We develop a certified defense, DRSM (De-Randomized Smoothed MalConv), by redesigning the de-randomized smoothing technique for the domain of malware detection.
Specifically, we propose a window ablation scheme to provably limit the impact of adversarial bytes while maximally preserving local structures of the executables.
We are the first to offer certified robustness in the realm of static detection of malware executables.
arXiv Detail & Related papers (2023-03-20T17:25:22Z)
- Mate! Are You Really Aware? An Explainability-Guided Testing Framework for Robustness of Malware Detectors [49.34155921877441]
We propose an explainability-guided and model-agnostic testing framework for robustness of malware detectors.
We then use this framework to test several state-of-the-art malware detectors' abilities to detect manipulated malware.
Our findings shed light on the limitations of current malware detectors, as well as how they can be improved.
arXiv Detail & Related papers (2021-11-19T08:02:38Z)
- Task-Aware Meta Learning-based Siamese Neural Network for Classifying Obfuscated Malware [5.293553970082943]
Existing malware detection methods fail to correctly classify different malware families when obfuscated malware samples are present in the training dataset.
We propose a novel task-aware few-shot-learning-based Siamese Neural Network that is resilient against such control flow obfuscation techniques.
Our proposed approach is highly effective in recognizing unique malware signatures, thus correctly classifying malware samples that belong to the same malware family.
arXiv Detail & Related papers (2021-10-26T04:44:13Z)
- Being Single Has Benefits. Instance Poisoning to Deceive Malware Classifiers [47.828297621738265]
We show how an attacker can launch a sophisticated and efficient poisoning attack targeting the dataset used to train a malware classifier.
As opposed to other poisoning attacks in the malware detection domain, our attack does not focus on malware families but rather on specific malware instances that contain an implanted trigger.
We propose a comprehensive detection approach that could serve as a future sophisticated defense against this newly discovered severe threat.
arXiv Detail & Related papers (2020-10-30T15:27:44Z)
- Open Set Recognition with Conditional Probabilistic Generative Models [51.40872765917125]
We propose Conditional Probabilistic Generative Models (CPGM) for open set recognition.
CPGM can not only detect unknown samples but also classify known classes by forcing different latent features to approximate conditional Gaussian distributions.
Experiment results on multiple benchmark datasets reveal that the proposed method significantly outperforms the baselines.
arXiv Detail & Related papers (2020-08-12T06:23:49Z)
- DAEMON: Dataset-Agnostic Explainable Malware Classification Using Multi-Stage Feature Mining [3.04585143845864]
Malware classification is the task of determining to which family a new malicious variant belongs.
We present DAEMON, a novel dataset-agnostic malware classification tool.
arXiv Detail & Related papers (2020-08-04T21:57:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.