Malware Detection based on API calls
- URL: http://arxiv.org/abs/2502.12863v1
- Date: Tue, 18 Feb 2025 13:51:56 GMT
- Title: Malware Detection based on API calls
- Authors: Christofer Fellicious, Manuel Bischof, Kevin Mayer, Dorian Eikenberg, Stefan Hausotte, Hans P. Reiser, Michael Granitzer
- Abstract summary: We explore a lightweight, order-invariant approach to detecting and mitigating malware threats.
We publish a public dataset of over three hundred thousand samples, annotated with labels indicating benign or malicious activity.
We leverage machine learning algorithms, such as random forests, and conduct behavioral analysis by examining patterns and anomalies in API call sequences.
- Score: 0.48866322421122627
- License:
- Abstract: Malware attacks pose a significant threat in today's interconnected digital landscape, causing billions of dollars in damages. Detecting and identifying malware families as early as possible provides an edge in protecting against such malware. We explore a lightweight, order-invariant approach to detecting and mitigating malware threats: analyzing API calls without regard to their sequence. We publish a public dataset of over three hundred thousand samples and their function call parameters for this task, annotated with labels indicating benign or malicious activity. The complete dataset exceeds 550 GB uncompressed. We leverage machine learning algorithms, such as random forests, and conduct behavioral analysis by examining patterns and anomalies in API call sequences. By investigating how the function calls occur regardless of their order, we can identify discriminating features that help us identify malware early on. The models we've developed are not only effective but also efficient. They are lightweight and can run on any machine with minimal performance overhead, while still achieving an F1-Score of over 85%. We also empirically show that we only need a subset of the function call sequence, specifically calls to the ntdll.dll library, to identify malware. Our research demonstrates the efficacy of this approach through empirical evaluations, underscoring its accuracy and scalability. The code is open source and available on GitHub, and the dataset is available on Zenodo.
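The order-invariant idea in the abstract can be illustrated by reducing each trace to per-call counts, so that any permutation of the same calls yields the same feature vector. This is a minimal sketch, not the authors' released code: the `module!Function` trace format and the helper names `api_call_counts` and `to_vector` are hypothetical, and the optional `dll_filter` argument only mimics the ntdll.dll-only subset described above.

```python
from collections import Counter
from typing import Iterable, List, Optional


def api_call_counts(trace: Iterable[str], dll_filter: Optional[str] = None) -> Counter:
    """Count how often each API call occurs in a trace, ignoring call order.

    Assumed (hypothetical) trace format: one string per call, "module!Function",
    e.g. "ntdll.dll!NtOpenFile". If dll_filter is given, calls into any other
    module are dropped; module names are compared case-insensitively, but the
    call strings are kept verbatim as counter keys.
    """
    counts: Counter = Counter()
    for call in trace:
        module = call.partition("!")[0]
        if dll_filter is not None and module.lower() != dll_filter.lower():
            continue
        counts[call] += 1
    return counts


def to_vector(counts: Counter, vocabulary: List[str]) -> List[int]:
    """Project counts onto a fixed vocabulary so every sample has the same length."""
    # Indexing a Counter returns 0 for missing keys, so unseen calls become zeros.
    return [counts[name] for name in vocabulary]


if __name__ == "__main__":
    trace_a = [
        "ntdll.dll!NtOpenFile",
        "kernel32.dll!CreateFileW",
        "ntdll.dll!NtOpenFile",
        "ntdll.dll!NtWriteFile",
    ]
    trace_b = list(reversed(trace_a))  # same calls, different order
    vocab = ["ntdll.dll!NtOpenFile", "ntdll.dll!NtWriteFile", "ntdll.dll!NtClose"]

    va = to_vector(api_call_counts(trace_a, dll_filter="ntdll.dll"), vocab)
    vb = to_vector(api_call_counts(trace_b, dll_filter="ntdll.dll"), vocab)
    print(va, vb)  # [2, 1, 0] [2, 1, 0]
```

The resulting fixed-length vectors could then be passed to a classifier such as scikit-learn's `RandomForestClassifier`; whether raw counts, binary presence, or normalized frequencies work best is a design choice the abstract does not specify.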
Related papers
- Unveiling Malware Patterns: A Self-analysis Perspective [15.517313565392852]
VisUnpack is a static analysis-based data visualization framework for bolstering attack prevention and aiding recovery post-attack.
Our method includes unpacking packed malware programs, calculating local similarity descriptors based on basic blocks, enhancing correlations between descriptors, and refining them by minimizing noises.
Our comprehensive evaluation of VisUnpack based on a freshly gathered dataset with over 27,106 samples confirms its capability in accurately classifying malware programs with a precision of 99.7%.
arXiv Detail & Related papers (2025-01-10T16:04:13Z)
- Living off the Analyst: Harvesting Features from Yara Rules for Malware Detection [50.55317257140427]
A strategy used by malicious actors is to "live off the land," where benign systems are used and repurposed for the malicious actor's intent.
We show that this is plausible via YARA rules, which use human-written signatures to detect specific malware families.
By extracting sub-signatures from publicly available YARA rules, we assembled a set of features that can more effectively discriminate malicious samples.
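As a rough illustration of turning YARA sub-signatures into features, the sketch below treats each plain hex string in a rule as one sub-signature and emits one binary feature per pattern (1 if the bytes occur in the sample). This is a simplified, hypothetical stand-in, not the paper's extraction procedure and not the yara-python API: the helper names are invented, and hex strings containing wildcards, jumps, or alternatives are simply skipped.

```python
import re
from typing import List

# Matches YARA hex strings made only of hex digits and whitespace, e.g.
# { 4D 5A 90 00 }. Strings with wildcards (??), jumps ([2-4]) or
# alternatives ((A|B)) are deliberately not matched in this sketch.
PLAIN_HEX_STRING = re.compile(r"\{([0-9A-Fa-f\s]+)\}")


def extract_sub_signatures(rule_text: str) -> List[bytes]:
    """Hypothetical helper: turn each plain hex string in a rule into a byte pattern."""
    patterns: List[bytes] = []
    for body in PLAIN_HEX_STRING.findall(rule_text):
        try:
            pattern = bytes.fromhex(body)  # whitespace between byte pairs is ignored
        except ValueError:
            continue  # malformed string (e.g. odd number of hex digits): skip it
        if pattern:  # an empty pattern would match every sample
            patterns.append(pattern)
    return patterns


def feature_vector(sample: bytes, patterns: List[bytes]) -> List[int]:
    """One binary feature per sub-signature: 1 if the pattern occurs in the sample."""
    return [int(p in sample) for p in patterns]


if __name__ == "__main__":
    rule = """
    rule demo
    {
        strings:
            $a = { 4D 5A 90 00 }
            $b = { E8 ?? ?? ?? ?? }
            $c = "cmd.exe"
        condition:
            any of them
    }
    """
    sigs = extract_sub_signatures(rule)
    print(sigs)  # [b'MZ\x90\x00']
    print(feature_vector(b"MZ\x90\x00\x03\x00", sigs))  # [1]
```

Collecting such vectors over many rules gives a feature matrix that any standard classifier could consume.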
arXiv Detail & Related papers (2024-11-27T17:03:00Z)
- Mitigating the Impact of Malware Evolution on API Sequence-based Windows Malware Detector [5.953199557879621]
Methods based on API sequences play a crucial role in malware prevention.
Evolved malware samples often use the API sequences of the pre-evolution samples to achieve similar malicious behaviors.
We propose a framework (MME) that can enhance existing API sequence-based malware detectors.
arXiv Detail & Related papers (2024-08-03T04:21:24Z)
- EarlyMalDetect: A Novel Approach for Early Windows Malware Detection Based on Sequences of API Calls [0.7373617024876725]
We propose EarlyMalDetect, a novel approach for early Windows malware detection based on sequences of API calls.
EarlyMalDetect can predict and reveal what a malware program is going to do on the target system before it does so.
Our extensive experimental evaluations show that the proposed approach is highly effective in predicting malware behaviors.
arXiv Detail & Related papers (2024-07-18T09:54:33Z)
- Prompt Engineering-assisted Malware Dynamic Analysis Using GPT-4 [45.935748395725206]
We introduce a prompt engineering-assisted approach to dynamic malware analysis using GPT-4.
In this method, GPT-4 is employed to create explanatory text for each API call within the API sequence.
BERT is used to obtain the representation of the text, from which we derive the representation of the API sequence.
arXiv Detail & Related papers (2023-12-13T17:39:44Z)
- DRSM: De-Randomized Smoothing on Malware Classifier Providing Certified Robustness [58.23214712926585]
We develop a certified defense, DRSM (De-Randomized Smoothed MalConv), by redesigning the de-randomized smoothing technique for the domain of malware detection.
Specifically, we propose a window ablation scheme to provably limit the impact of adversarial bytes while maximally preserving local structures of the executables.
We are the first to offer certified robustness in the realm of static detection of malware executables.
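The window-ablation idea can be sketched generically: split the executable's bytes into non-overlapping windows, classify each window independently, and take a majority vote. If an attacker modifies a contiguous region of the file in place, only a bounded number of windows change, so a large enough vote margin cannot be overturned. This is a minimal sketch under stated assumptions, not DRSM itself: `classify` is a hypothetical stand-in for the per-window model (DRSM builds on MalConv), the bound covers only in-place modification of one contiguous region (not inserted or appended bytes), and the paper's exact certification rule may differ.

```python
import math
from collections import Counter
from typing import Callable, Tuple


def window_votes(data: bytes, window: int, classify: Callable[[bytes], int]) -> Counter:
    """Split data into non-overlapping windows and classify each window on its own."""
    votes: Counter = Counter()
    for start in range(0, len(data), window):
        votes[classify(data[start:start + window])] += 1
    return votes


def smoothed_predict(votes: Counter) -> Tuple[int, int]:
    """Return (majority label, vote margin over the runner-up). Assumes non-empty votes."""
    ranked = votes.most_common()
    top_label, top_count = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    return top_label, top_count - runner_up


def max_affected_windows(patch_len: int, window: int) -> int:
    """Conservative bound on windows touched by an in-place contiguous patch."""
    return math.ceil(patch_len / window) + 1


def is_certified(margin: int, patch_len: int, window: int) -> bool:
    # Each affected window can at worst move one vote from the top label to a
    # competitor, shrinking the margin by 2. Strict inequality rules out ties.
    return margin > 2 * max_affected_windows(patch_len, window)


if __name__ == "__main__":
    # Toy per-window "classifier" (hypothetical): malicious if an INT3 byte appears.
    toy = lambda w: int(b"\xcc" in w)
    votes = window_votes(b"\xcc" * 64 + b"\x00" * 8, 8, toy)
    label, margin = smoothed_predict(votes)
    print(label, margin, is_certified(margin, patch_len=16, window=8))  # 1 7 True
```

Smaller windows make each vote less informative but limit how many votes a patch can reach; that trade-off between accuracy and certified radius is the main design knob.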
arXiv Detail & Related papers (2023-03-20T17:25:22Z)
- Behavioural Reports of Multi-Stage Malware [3.64414368529873]
This dataset provides API call sequences for thousands of malware samples executed in Windows 10 virtual machines.
A tutorial on how to create and expand this dataset is provided along with a benchmark demonstrating how to use this dataset to classify malware.
arXiv Detail & Related papers (2023-01-30T11:51:02Z)
- Mate! Are You Really Aware? An Explainability-Guided Testing Framework for Robustness of Malware Detectors [49.34155921877441]
We propose an explainability-guided and model-agnostic testing framework for robustness of malware detectors.
We then use this framework to test several state-of-the-art malware detectors' abilities to detect manipulated malware.
Our findings shed light on the limitations of current malware detectors, as well as how they can be improved.
arXiv Detail & Related papers (2021-11-19T08:02:38Z)
- Simple Transparent Adversarial Examples [65.65977217108659]
We introduce secret embedding and transparent adversarial examples as a simpler way to evaluate robustness.
Because such attacks are simple to craft, they pose a serious threat where APIs are used for high-stakes applications.
arXiv Detail & Related papers (2021-05-20T11:54:26Z)
- Being Single Has Benefits. Instance Poisoning to Deceive Malware Classifiers [47.828297621738265]
We show how an attacker can launch a sophisticated and efficient poisoning attack targeting the dataset used to train a malware classifier.
As opposed to other poisoning attacks in the malware detection domain, our attack does not focus on malware families but rather on specific malware instances that contain an implanted trigger.
We propose a comprehensive detection approach that could serve as a future sophisticated defense against this newly discovered severe threat.
arXiv Detail & Related papers (2020-10-30T15:27:44Z)
- Feature-level Malware Obfuscation in Deep Learning [0.0]
We train a deep neural network classifier for malware classification using features of benign and malware samples.
We demonstrate a steep increase in false negative rate (i.e., attacks succeed) by randomly adding features of a benign app to malware.
We find that for API calls, it is possible to reject the vast majority of attacks, whereas defenses using Intents or Permissions are less successful.
arXiv Detail & Related papers (2020-02-10T00:47:23Z)
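The feature-addition attack in the last entry can be sketched on binary feature vectors: pick features that a benign sample has and the malware sample lacks, and switch some of them on at random. This is a hypothetical illustration, not that paper's code: the helper name and the add-only constraint (assumed so that the malware's existing behavior is not removed) are labeled assumptions.

```python
import random
from typing import List, Optional


def add_benign_features(malware: List[int], benign: List[int], k: int,
                        rng: Optional[random.Random] = None) -> List[int]:
    """Hypothetical attack: turn on up to k random features present in the benign
    sample but absent from the malware sample. Features are only added, never
    removed (assumption: removing one could break the malware's functionality)."""
    if len(malware) != len(benign):
        raise ValueError("feature vectors must have the same length")
    rng = rng or random.Random()
    candidates = [i for i, (m, b) in enumerate(zip(malware, benign)) if b and not m]
    chosen = rng.sample(candidates, min(k, len(candidates)))
    perturbed = list(malware)  # copy so the original vector is left untouched
    for i in chosen:
        perturbed[i] = 1
    return perturbed


if __name__ == "__main__":
    malware = [1, 0, 0, 1, 0]
    benign = [0, 1, 1, 1, 0]
    print(add_benign_features(malware, benign, k=2, rng=random.Random(0)))
```

Feeding the perturbed vectors to a trained detector and counting how many are now labeled benign gives the false negative rate the entry refers to.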