Prompt Engineering-assisted Malware Dynamic Analysis Using GPT-4
- URL: http://arxiv.org/abs/2312.08317v1
- Date: Wed, 13 Dec 2023 17:39:44 GMT
- Title: Prompt Engineering-assisted Malware Dynamic Analysis Using GPT-4
- Authors: Pei Yan, Shunquan Tan, Miaohui Wang and Jiwu Huang
- Abstract summary: We introduce a prompt engineering-assisted malware dynamic analysis method using GPT-4.
In this method, GPT-4 is employed to create explanatory text for each API call within the API sequence.
BERT is used to obtain the representation of the text, from which we derive the representation of the API sequence.
- Score: 45.935748395725206
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dynamic analysis methods effectively identify shelled, wrapped, or obfuscated
malware, thereby preventing them from invading computers. As a significant
representation of dynamic malware behavior, the API (Application Programming
Interface) sequence, comprised of consecutive API calls, has progressively
become the dominant feature of dynamic analysis methods. Though there have been
numerous deep learning models for malware detection based on API sequences, the
quality of API call representations produced by those models is limited. These
models cannot generate representations for unknown API calls, which weakens
both the detection performance and the generalization. Further, the concept
drift phenomenon of API calls is prominent. To tackle these issues, we
introduce a prompt engineering-assisted malware dynamic analysis method using GPT-4.
In this method, GPT-4 is employed to create explanatory text for each API call
within the API sequence. Afterward, the pre-trained language model BERT is used
to obtain the representation of the text, from which we derive the
representation of the API sequence. In principle, the proposed method can
generate representations for all API calls without requiring any training on a
dataset during the generation process. Using this representation, a CNN-based
detection model is designed to extract features.
We adopt five benchmark datasets to validate the performance of the proposed
model. The experimental results reveal that the proposed detection algorithm
performs better than the state-of-the-art method (TextCNN). Specifically, in
cross-database experiments and few-shot learning experiments, the proposed
model achieves excellent detection performance and almost a 100% recall rate
for malware, verifying its superior generalization performance. The code is
available at: github.com/yan-scnu/Prompted_Dynamic_Detection.
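The pipeline described in the abstract (explanatory text per API call, a text embedding, then a sequence-level feature) can be sketched roughly as follows. This is a minimal, stdlib-only illustration under stated assumptions, not the authors' implementation: `explain_api` is a hypothetical stand-in for the GPT-4 prompt step backed by a small hand-written dictionary, `embed_text` is a hashed bag-of-words stand-in for BERT, and `window_max_pool` is an untrained toy substitute for a convolution followed by max-pooling. All function names, the example texts, and `EMBED_DIM` are invented for illustration.

```python
import hashlib
import math
from typing import Dict, List

# Hypothetical explanatory texts standing in for GPT-4 output (assumption).
API_EXPLANATIONS: Dict[str, str] = {
    "CreateFileW": "Creates or opens a file or device and returns a handle.",
    "WriteFile": "Writes data to the specified file or device.",
    "RegSetValueExW": "Sets the data and type of a value under a registry key.",
}

EMBED_DIM = 16  # illustrative size only; a BERT encoder outputs far larger vectors


def explain_api(api_name: str) -> str:
    """Stand-in for the GPT-4 step: return explanatory text for one API call."""
    # Unknown calls still receive a text, so every call can be represented.
    return API_EXPLANATIONS.get(api_name, f"Windows API call named {api_name}.")


def embed_text(text: str, dim: int = EMBED_DIM) -> List[float]:
    """Stand-in for the BERT step: hash tokens into an L2-normalised vector."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def represent_sequence(api_sequence: List[str]) -> List[List[float]]:
    """Map an API sequence to a (sequence length x EMBED_DIM) matrix."""
    return [embed_text(explain_api(name)) for name in api_sequence]


def window_max_pool(matrix: List[List[float]], kernel_size: int = 2) -> List[float]:
    """Toy substitute for conv + max-pool: average each sliding window of
    consecutive calls, then take the per-dimension maximum over all windows."""
    if not matrix:
        return []
    dim = len(matrix[0])
    k = min(kernel_size, len(matrix))
    windows = []
    for start in range(len(matrix) - k + 1):
        rows = matrix[start:start + k]
        windows.append([sum(row[d] for row in rows) / k for d in range(dim)])
    return [max(window[d] for window in windows) for d in range(dim)]


if __name__ == "__main__":
    sequence = ["CreateFileW", "WriteFile", "NtUnknownCall"]
    matrix = represent_sequence(sequence)
    feature = window_max_pool(matrix)
    print(len(matrix), len(feature))
```

The point of the sketch is the data flow: because the representation is derived from text rather than from a vocabulary learned on a training set, a call that never appeared before (here the made-up `NtUnknownCall`) still maps to a vector. In a real system the two stand-ins would be replaced by an actual language-model call and a pre-trained encoder, and the pooling step by a trained classifier.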
Related papers
- A Lean Transformer Model for Dynamic Malware Analysis and Detection [0.0]
Malware is a fast-growing threat to the modern computing world and existing lines of defense are not efficient enough to address this issue.
Previous works have shown some success leveraging neural networks and API call sequences extracted from execution reports.
In this paper, we design an emulation-only model, based on the Transformer architecture, to detect malicious files.
arXiv Detail & Related papers (2024-08-05T08:46:46Z)
- Mitigating the Impact of Malware Evolution on API Sequence-based Windows Malware Detector [5.953199557879621]
Methods based on API sequences play a crucial role in malware prevention.
Evolved malware samples often use the API sequences of the pre-evolution samples to achieve similar malicious behaviors.
We propose a framework (MME) that can enhance existing API sequence-based malware detectors.
arXiv Detail & Related papers (2024-08-03T04:21:24Z)
- FANTAstic SEquences and Where to Find Them: Faithful and Efficient API Call Generation through State-tracked Constrained Decoding and Reranking [57.53742155914176]
API call generation is the cornerstone of large language models' tool-using ability.
Existing supervised and in-context learning approaches suffer from high training costs, poor data efficiency, and generated API calls that can be unfaithful to the API documentation and the user's request.
We propose an output-side optimization approach called FANTASE to address these limitations.
arXiv Detail & Related papers (2024-07-18T23:44:02Z)
- EarlyMalDetect: A Novel Approach for Early Windows Malware Detection Based on Sequences of API Calls [0.7373617024876725]
We propose EarlyMalDetect, a novel approach for early Windows malware detection based on sequences of API calls.
EarlyMalDetect can predict and reveal what a malware program is going to do on the target system before it does so.
Our extensive experimental evaluations show that the proposed approach is highly effective in predicting malware behaviors.
arXiv Detail & Related papers (2024-07-18T09:54:33Z)
- A Classification-by-Retrieval Framework for Few-Shot Anomaly Detection to Detect API Injection Attacks [9.693391036125908]
We propose a novel unsupervised few-shot anomaly detection framework composed of two main parts.
First, we train a dedicated generic language model for API based on FastText embedding.
Next, we use Approximate Nearest Neighbor search in a classification-by-retrieval approach.
arXiv Detail & Related papers (2024-05-18T10:15:31Z)
- Adaptive REST API Testing with Reinforcement Learning [54.68542517176757]
Current testing tools lack efficient exploration mechanisms, treating all operations and parameters equally.
Current tools struggle when response schemas are absent in the specification or exhibit variants.
We present an adaptive REST API testing technique that incorporates reinforcement learning to prioritize operations during exploration.
arXiv Detail & Related papers (2023-09-08T20:27:05Z)
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
- Decoder Tuning: Efficient Language Understanding as Decoding [84.68266271483022]
We present Decoder Tuning (DecT), which in contrast optimizes task-specific decoder networks on the output side.
By gradient-based optimization, DecT can be trained within several seconds and requires only one PTM query per sample.
We conduct extensive natural language understanding experiments and show that DecT significantly outperforms state-of-the-art algorithms with a $200\times$ speed-up.
arXiv Detail & Related papers (2022-12-16T11:15:39Z)
- Using sequential drift detection to test the API economy [4.056434158960926]
API economy refers to the widespread integration of API (application programming interface).
It is desirable to monitor the usage patterns and identify when the system is used in a way that was never used before.
In this work we analyze both histograms and call graphs of API usage to determine if the usage patterns of the system have shifted.
arXiv Detail & Related papers (2021-11-09T13:24:19Z)
- Enhancing the Generalization for Intent Classification and Out-of-Domain Detection in SLU [70.44344060176952]
Intent classification is a major task in spoken language understanding (SLU).
Recent works have shown that using extra data and labels can improve the OOD detection performance.
This paper proposes to train a model with only IND data while supporting both IND intent classification and OOD detection.
arXiv Detail & Related papers (2021-06-28T08:27:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.