Mitigating the Impact of Malware Evolution on API Sequence-based Windows Malware Detector
- URL: http://arxiv.org/abs/2408.01661v1
- Date: Sat, 3 Aug 2024 04:21:24 GMT
- Title: Mitigating the Impact of Malware Evolution on API Sequence-based Windows Malware Detector
- Authors: Xingyuan Wei, Ce Li, Qiujian Lv, Ning Li, Degang Sun, Yan Wang,
- Abstract summary: Methods based on API sequences play a crucial role in malware prevention.
Evolved malware samples often use the API sequences of the pre-evolution samples to achieve similar malicious behaviors.
We propose a frame(MME) framework that can enhance existing API sequence-based malware detectors.
- Score: 5.953199557879621
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In dynamic Windows malware detection, deep learning models are extensively deployed to analyze API sequences. Methods based on API sequences play a crucial role in malware prevention. However, due to the continuous updates of APIs and the changes in API sequence calls leading to the constant evolution of malware variants, the detection capability of API sequence-based malware detection models significantly diminishes over time. We observe that the API sequences of malware samples before and after evolution usually have similar malicious semantics. Specifically, compared to the original samples, evolved malware samples often use the API sequences of the pre-evolution samples to achieve similar malicious behaviors. For instance, they access similar sensitive system resources and extend new malicious functions based on the original functionalities. In this paper, we propose a frame(MME), a framework that can enhance existing API sequence-based malware detectors and mitigate the adverse effects of malware evolution. To help detection models capture the similar semantics of these post-evolution API sequences, our framework represents API sequences using API knowledge graphs and system resource encodings and applies contrastive learning to enhance the model's encoder. Results indicate that, compared to Regular Text-CNN, our framework can significantly reduce the false positive rate by 13.10% and improve the F1-Score by 8.47% on five years of data, achieving the best experimental results. Additionally, evaluations show that our framework can save on the human costs required for model maintenance. We only need 1% of the budget per month to reduce the false positive rate by 11.16% and improve the F1-Score by 6.44%.
Related papers
- EarlyMalDetect: A Novel Approach for Early Windows Malware Detection Based on Sequences of API Calls [0.7373617024876725]
We propose EarlyMalDetect, a novel approach for early Windows malware detection based on sequences of API calls.
EarlyMalDetect can predict and reveal what a malware program is going to perform on the target system before it occurs.
Our extensive experimental evaluations show that the proposed approach is highly effective in predicting malware behaviors.
arXiv Detail & Related papers (2024-07-18T09:54:33Z) - Small Effect Sizes in Malware Detection? Make Harder Train/Test Splits! [51.668411293817464]
Industry practitioners care about small improvements in malware detection accuracy because their models are deployed to hundreds of millions of machines.
Academic research is often restrained to public datasets on the order of ten thousand samples.
We devise an approach to generate a benchmark of difficulty from a pool of available samples.
arXiv Detail & Related papers (2023-12-25T21:25:55Z) - Prompt Engineering-assisted Malware Dynamic Analysis Using GPT-4 [45.935748395725206]
We introduce a prompt engineering-assisted malware dynamic analysis using GPT-4.
In this method, GPT-4 is employed to create explanatory text for each API call within the API sequence.
BERT is used to obtain the representation of the text, from which we derive the representation of the API sequence.
arXiv Detail & Related papers (2023-12-13T17:39:44Z) - DRSM: De-Randomized Smoothing on Malware Classifier Providing Certified
Robustness [58.23214712926585]
We develop a certified defense, DRSM (De-Randomized Smoothed MalConv), by redesigning the de-randomized smoothing technique for the domain of malware detection.
Specifically, we propose a window ablation scheme to provably limit the impact of adversarial bytes while maximally preserving local structures of the executables.
We are the first to offer certified robustness in the realm of static detection of malware executables.
arXiv Detail & Related papers (2023-03-20T17:25:22Z) - Behavioural Reports of Multi-Stage Malware [3.64414368529873]
This dataset provides API call sequences for thousands of malware samples executed in Windows 10 virtual machines.
A tutorial on how to create and expand this dataset is provided along with a benchmark demonstrating how to use this dataset to classify malware.
arXiv Detail & Related papers (2023-01-30T11:51:02Z) - Towards a Fair Comparison and Realistic Design and Evaluation Framework
of Android Malware Detectors [63.75363908696257]
We analyze 10 influential research works on Android malware detection using a common evaluation framework.
We identify five factors that, if not taken into account when creating datasets and designing detectors, significantly affect the trained ML models.
We conclude that the studied ML-based detectors have been evaluated optimistically, which justifies the good published results.
arXiv Detail & Related papers (2022-05-25T08:28:08Z) - Fast & Furious: Modelling Malware Detection as Evolving Data Streams [6.6892028759947175]
Malware is a major threat to computer systems and imposes many challenges to cyber security.
In this work, we evaluate the impact of concept drift on malware classifiers for two Android datasets.
arXiv Detail & Related papers (2022-05-24T18:43:40Z) - Mate! Are You Really Aware? An Explainability-Guided Testing Framework
for Robustness of Malware Detectors [49.34155921877441]
We propose an explainability-guided and model-agnostic testing framework for robustness of malware detectors.
We then use this framework to test several state-of-the-art malware detectors' abilities to detect manipulated malware.
Our findings shed light on the limitations of current malware detectors, as well as how they can be improved.
arXiv Detail & Related papers (2021-11-19T08:02:38Z) - Enhancing the Generalization for Intent Classification and Out-of-Domain
Detection in SLU [70.44344060176952]
Intent classification is a major task in spoken language understanding (SLU)
Recent works have shown that using extra data and labels can improve the OOD detection performance.
This paper proposes to train a model with only IND data while supporting both IND intent classification and OOD detection.
arXiv Detail & Related papers (2021-06-28T08:27:38Z) - MDEA: Malware Detection with Evolutionary Adversarial Learning [16.8615211682877]
MDEA, an Adversarial Malware Detection model uses evolutionary optimization to create attack samples to make the network robust against evasion attacks.
By retraining the model with the evolved malware samples, its performance improves a significant margin.
arXiv Detail & Related papers (2020-02-09T09:59:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.