Hidden Markov Models with Random Restarts vs Boosting for Malware Detection
- URL: http://arxiv.org/abs/2307.10256v1
- Date: Mon, 17 Jul 2023 13:21:58 GMT
- Title: Hidden Markov Models with Random Restarts vs Boosting for Malware Detection
- Authors: Aditya Raghavan and Fabio Di Troia and Mark Stamp
- Abstract summary: We compare boosted HMMs (using AdaBoost) to HMMs trained with multiple random restarts, in the context of malware detection.
We find that random restarts perform surprisingly well in comparison to boosting.
- Score: 5.414308305392762
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Effective and efficient malware detection is at the forefront of research
into building secure digital systems. As with many other fields, malware
detection research has seen a dramatic increase in the application of machine
learning algorithms. One machine learning technique that has been used widely
in the field of pattern matching in general, and malware detection in
particular, is the hidden Markov model (HMM). HMM training is based on a hill
climb, and hence we can often improve a model by training multiple times with
different initial values. In this research, we compare boosted HMMs (using
AdaBoost) to HMMs trained with multiple random restarts, in the context of
malware detection. These techniques are applied to a variety of challenging
malware datasets. We find that random restarts perform surprisingly well in
comparison to boosting. Only in the most difficult "cold start" cases (where
training data is severely limited) does boosting appear to offer sufficient
improvement to justify its higher computational cost in the scoring phase.
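To make the random-restart idea concrete, the sketch below trains several HMMs from different random initializations and keeps the one with the best training log-likelihood. It is a minimal sketch, assuming hmmlearn's CategoricalHMM (>= 0.2.8) and opcode sequences already encoded as integer arrays, not the authors' exact experimental setup.

```python
import numpy as np
from hmmlearn import hmm  # assumed dependency: hmmlearn >= 0.2.8

def train_hmm_with_restarts(sequences, n_states=2, n_restarts=10):
    """Baum-Welch is a hill climb, so rerun it from several random
    initializations (random_state varies the emission init) and keep
    the model with the highest training log-likelihood."""
    X = np.concatenate(sequences).reshape(-1, 1)
    lengths = [len(s) for s in sequences]
    best_model, best_ll = None, -np.inf
    for seed in range(n_restarts):
        model = hmm.CategoricalHMM(n_components=n_states, n_iter=100,
                                   random_state=seed)
        model.fit(X, lengths)
        ll = model.score(X, lengths)
        if ll > best_ll:
            best_model, best_ll = model, ll
    return best_model

# Scoring: a suspect sample's log-likelihood per symbol under the malware
# HMM serves as the detection score, thresholded against benign scores.
```

Boosting, by contrast, retains many weak HMMs and combines their weighted scores at detection time, which is where the extra scoring-phase cost noted in the abstract arises.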
Related papers
- Small Effect Sizes in Malware Detection? Make Harder Train/Test Splits! [51.668411293817464]
Industry practitioners care about small improvements in malware detection accuracy because their models are deployed to hundreds of millions of machines.
Academic research is often restricted to public datasets on the order of ten thousand samples.
We devise an approach to generate benchmarks of varying difficulty from a pool of available samples.
arXiv Detail & Related papers (2023-12-25T21:25:55Z) - Enhancing Malware Detection by Integrating Machine Learning with Cuckoo
- Enhancing Malware Detection by Integrating Machine Learning with Cuckoo Sandbox [0.0]
This study aims to classify and identify malware extracted from a dataset containing API call sequences.
Both deep learning and machine learning algorithms achieve remarkably high levels of accuracy, reaching up to 99% in certain cases.
arXiv Detail & Related papers (2023-11-07T22:33:17Z) - EMBERSim: A Large-Scale Databank for Boosting Similarity Search in
- EMBERSim: A Large-Scale Databank for Boosting Similarity Search in Malware Analysis [48.5877840394508]
In recent years there has been a shift from heuristics-based malware detection towards machine learning.
We propose to address the deficiencies in the space of similarity research on binary files, starting from EMBER.
We enhance EMBER with similarity information as well as malware class tags, to enable further research in the similarity space.
arXiv Detail & Related papers (2023-10-03T06:58:45Z) - A Comparison of Adversarial Learning Techniques for Malware Detection [1.2289361708127875]
- A Comparison of Adversarial Learning Techniques for Malware Detection [1.2289361708127875]
We use gradient-based, evolutionary algorithm-based, and reinforcement learning-based methods to generate adversarial samples.
Experiments show that the Gym-malware generator, which uses a reinforcement learning approach, has the greatest practical potential.
arXiv Detail & Related papers (2023-08-19T09:22:32Z) - Unleashing Mask: Explore the Intrinsic Out-of-Distribution Detection
- Unleashing Mask: Explore the Intrinsic Out-of-Distribution Detection Capability [70.72426887518517]
Out-of-distribution (OOD) detection is an indispensable aspect of secure AI when deploying machine learning models in real-world applications.
We propose a novel method, Unleashing Mask, which aims to restore the OOD discriminative capabilities of the well-trained model with ID data.
Our method utilizes a mask to identify the memorized atypical samples, and then fine-tunes or prunes the model with the introduced mask to forget them.
arXiv Detail & Related papers (2023-06-06T14:23:34Z) - FGAM:Fast Adversarial Malware Generation Method Based on Gradient Sign [16.16005518623829]
- FGAM: Fast Adversarial Malware Generation Method Based on Gradient Sign [16.16005518623829]
Adversarial attacks aim to deceive deep learning models by generating adversarial samples.
This paper proposes FGAM (Fast Generate Adversarial Malware), a method for fast generating adversarial malware.
Experiments verify that adversarial malware generated by FGAM deceives the target model at a success rate about 84% higher than that of existing methods.
arXiv Detail & Related papers (2023-05-22T06:58:34Z) - Can Feature Engineering Help Quantum Machine Learning for Malware
- Can Feature Engineering Help Quantum Machine Learning for Malware Detection? [7.010669841466896]
We propose a hybrid framework of theoretical quantum ML to address this problem.
A VQC with XGBoost-selected features achieves a 78.91% test accuracy on the simulator.
The average accuracy of models trained on the XGBoost-selected features was 74%.
arXiv Detail & Related papers (2023-05-03T19:33:49Z) - DRSM: De-Randomized Smoothing on Malware Classifier Providing Certified
- DRSM: De-Randomized Smoothing on Malware Classifier Providing Certified Robustness [58.23214712926585]
We develop a certified defense, DRSM (De-Randomized Smoothed MalConv), by redesigning the de-randomized smoothing technique for the domain of malware detection.
Specifically, we propose a window ablation scheme to provably limit the impact of adversarial bytes while maximally preserving local structures of the executables.
We are the first to offer certified robustness in the realm of static detection of malware executables.
arXiv Detail & Related papers (2023-03-20T17:25:22Z) - Mate! Are You Really Aware? An Explainability-Guided Testing Framework
- Mate! Are You Really Aware? An Explainability-Guided Testing Framework for Robustness of Malware Detectors [49.34155921877441]
We propose an explainability-guided and model-agnostic testing framework for robustness of malware detectors.
We then use this framework to test several state-of-the-art malware detectors' abilities to detect manipulated malware.
Our findings shed light on the limitations of current malware detectors, as well as how they can be improved.
arXiv Detail & Related papers (2021-11-19T08:02:38Z) - Detection of Malicious Android Applications: Classical Machine Learning
- Detection of Malicious Android Applications: Classical Machine Learning vs. Deep Neural Network Integrated with Clustering [2.179313476241343]
Traditional malware detection mechanisms are not able to cope with next-generation malware attacks.
We propose effective and efficient Android malware detection models based on machine learning and deep learning integrated with clustering.
arXiv Detail & Related papers (2021-02-28T21:50:57Z) - Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the-art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)