Related papers: Mitigating the Impact of Malware Evolution on API Sequence-based Windows Malware Detector

Mitigating the Impact of Malware Evolution on API Sequence-based Windows Malware Detector

URL: http://arxiv.org/abs/2408.01661v1
Date: Sat, 3 Aug 2024 04:21:24 GMT
Title: Mitigating the Impact of Malware Evolution on API Sequence-based Windows Malware Detector
Authors: Xingyuan Wei, Ce Li, Qiujian Lv, Ning Li, Degang Sun, Yan Wang,
Abstract summary: Methods based on API sequences play a crucial role in malware prevention. Evolved malware samples often use the API sequences of the pre-evolution samples to achieve similar malicious behaviors. We propose a frame(MME) framework that can enhance existing API sequence-based malware detectors.
Score: 5.953199557879621
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In dynamic Windows malware detection, deep learning models are extensively deployed to analyze API sequences. Methods based on API sequences play a crucial role in malware prevention. However, due to the continuous updates of APIs and the changes in API sequence calls leading to the constant evolution of malware variants, the detection capability of API sequence-based malware detection models significantly diminishes over time. We observe that the API sequences of malware samples before and after evolution usually have similar malicious semantics. Specifically, compared to the original samples, evolved malware samples often use the API sequences of the pre-evolution samples to achieve similar malicious behaviors. For instance, they access similar sensitive system resources and extend new malicious functions based on the original functionalities. In this paper, we propose a frame(MME), a framework that can enhance existing API sequence-based malware detectors and mitigate the adverse effects of malware evolution. To help detection models capture the similar semantics of these post-evolution API sequences, our framework represents API sequences using API knowledge graphs and system resource encodings and applies contrastive learning to enhance the model's encoder. Results indicate that, compared to Regular Text-CNN, our framework can significantly reduce the false positive rate by 13.10% and improve the F1-Score by 8.47% on five years of data, achieving the best experimental results. Additionally, evaluations show that our framework can save on the human costs required for model maintenance. We only need 1% of the budget per month to reduce the false positive rate by 11.16% and improve the F1-Score by 6.44%.

Related papers

Fundamental Limitations in Defending LLM Finetuning APIs [61.29028411001255]
We show that defences of fine-tuning APIs are fundamentally limited in their ability to prevent fine-tuning attacks. We construct 'pointwise-undetectable' attacks that repurpose entropy in benign model outputs to covertly transmit dangerous knowledge. We test our attacks against the OpenAI fine-tuning API, finding they succeed in eliciting answers to harmful multiple-choice questions.
arXiv Detail & Related papers (2025-02-20T18:45:01Z)
Malware Detection based on API calls [0.48866322421122627]
We explore a lightweight, order-invariant approach to detecting and mitigating malware threats. We publish a public dataset of over three hundred thousand samples, annotated with labels indicating benign or malicious activity. We leverage machine learning algorithms, such as random forests, and conduct behavioral analysis by examining patterns and anomalies in API call sequences.
arXiv Detail & Related papers (2025-02-18T13:51:56Z)
EarlyMalDetect: A Novel Approach for Early Windows Malware Detection Based on Sequences of API Calls [0.7373617024876725]
We propose EarlyMalDetect, a novel approach for early Windows malware detection based on sequences of API calls. EarlyMalDetect can predict and reveal what a malware program is going to perform on the target system before it occurs. Our extensive experimental evaluations show that the proposed approach is highly effective in predicting malware behaviors.
arXiv Detail & Related papers (2024-07-18T09:54:33Z)
Small Effect Sizes in Malware Detection? Make Harder Train/Test Splits! [51.668411293817464]
Industry practitioners care about small improvements in malware detection accuracy because their models are deployed to hundreds of millions of machines. Academic research is often restrained to public datasets on the order of ten thousand samples. We devise an approach to generate a benchmark of difficulty from a pool of available samples.
arXiv Detail & Related papers (2023-12-25T21:25:55Z)
Prompt Engineering-assisted Malware Dynamic Analysis Using GPT-4 [45.935748395725206]
We introduce a prompt engineering-assisted malware dynamic analysis using GPT-4. In this method, GPT-4 is employed to create explanatory text for each API call within the API sequence. BERT is used to obtain the representation of the text, from which we derive the representation of the API sequence.
arXiv Detail & Related papers (2023-12-13T17:39:44Z)
DRSM: De-Randomized Smoothing on Malware Classifier Providing Certified Robustness [58.23214712926585]
We develop a certified defense, DRSM (De-Randomized Smoothed MalConv), by redesigning the de-randomized smoothing technique for the domain of malware detection. Specifically, we propose a window ablation scheme to provably limit the impact of adversarial bytes while maximally preserving local structures of the executables. We are the first to offer certified robustness in the realm of static detection of malware executables.
arXiv Detail & Related papers (2023-03-20T17:25:22Z)
Behavioural Reports of Multi-Stage Malware [3.64414368529873]
This dataset provides API call sequences for thousands of malware samples executed in Windows 10 virtual machines. A tutorial on how to create and expand this dataset is provided along with a benchmark demonstrating how to use this dataset to classify malware.
arXiv Detail & Related papers (2023-01-30T11:51:02Z)
Towards a Fair Comparison and Realistic Design and Evaluation Framework of Android Malware Detectors [63.75363908696257]
We analyze 10 influential research works on Android malware detection using a common evaluation framework. We identify five factors that, if not taken into account when creating datasets and designing detectors, significantly affect the trained ML models. We conclude that the studied ML-based detectors have been evaluated optimistically, which justifies the good published results.
arXiv Detail & Related papers (2022-05-25T08:28:08Z)
Fast & Furious: Modelling Malware Detection as Evolving Data Streams [6.6892028759947175]
Malware is a major threat to computer systems and imposes many challenges to cyber security. In this work, we evaluate the impact of concept drift on malware classifiers for two Android datasets.
arXiv Detail & Related papers (2022-05-24T18:43:40Z)
Mate! Are You Really Aware? An Explainability-Guided Testing Framework for Robustness of Malware Detectors [49.34155921877441]
We propose an explainability-guided and model-agnostic testing framework for robustness of malware detectors. We then use this framework to test several state-of-the-art malware detectors' abilities to detect manipulated malware. Our findings shed light on the limitations of current malware detectors, as well as how they can be improved.
arXiv Detail & Related papers (2021-11-19T08:02:38Z)
Enhancing the Generalization for Intent Classification and Out-of-Domain Detection in SLU [70.44344060176952]
Intent classification is a major task in spoken language understanding (SLU) Recent works have shown that using extra data and labels can improve the OOD detection performance. This paper proposes to train a model with only IND data while supporting both IND intent classification and OOD detection.
arXiv Detail & Related papers (2021-06-28T08:27:38Z)
MDEA: Malware Detection with Evolutionary Adversarial Learning [16.8615211682877]
MDEA, an Adversarial Malware Detection model uses evolutionary optimization to create attack samples to make the network robust against evasion attacks. By retraining the model with the evolved malware samples, its performance improves a significant margin.
arXiv Detail & Related papers (2020-02-09T09:59:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.