Related papers: Prompt Engineering-assisted Malware Dynamic Analysis Using GPT-4

Prompt Engineering-assisted Malware Dynamic Analysis Using GPT-4

URL: http://arxiv.org/abs/2312.08317v1
Date: Wed, 13 Dec 2023 17:39:44 GMT
Title: Prompt Engineering-assisted Malware Dynamic Analysis Using GPT-4
Authors: Pei Yan, Shunquan Tan, Miaohui Wang and Jiwu Huang
Abstract summary: We introduce a prompt engineering-assisted malware dynamic analysis using GPT-4. In this method, GPT-4 is employed to create explanatory text for each API call within the API sequence. BERT is used to obtain the representation of the text, from which we derive the representation of the API sequence.
Score: 45.935748395725206
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Dynamic analysis methods effectively identify shelled, wrapped, or obfuscated malware, thereby preventing them from invading computers. As a significant representation of dynamic malware behavior, the API (Application Programming Interface) sequence, comprised of consecutive API calls, has progressively become the dominant feature of dynamic analysis methods. Though there have been numerous deep learning models for malware detection based on API sequences, the quality of API call representations produced by those models is limited. These models cannot generate representations for unknown API calls, which weakens both the detection performance and the generalization. Further, the concept drift phenomenon of API calls is prominent. To tackle these issues, we introduce a prompt engineering-assisted malware dynamic analysis using GPT-4. In this method, GPT-4 is employed to create explanatory text for each API call within the API sequence. Afterward, the pre-trained language model BERT is used to obtain the representation of the text, from which we derive the representation of the API sequence. Theoretically, this proposed method is capable of generating representations for all API calls, excluding the necessity for dataset training during the generation process. Utilizing the representation, a CNN-based detection model is designed to extract the feature. We adopt five benchmark datasets to validate the performance of the proposed model. The experimental results reveal that the proposed detection algorithm performs better than the state-of-the-art method (TextCNN). Specifically, in cross-database experiments and few-shot learning experiments, the proposed model achieves excellent detection performance and almost a 100% recall rate for malware, verifying its superior generalization performance. The code is available at: github.com/yan-scnu/Prompted_Dynamic_Detection.

Related papers

Predicting the Performance of Black-box LLMs through Self-Queries [60.87193950962585]
Large language models (LLMs) are increasingly relied on in AI systems, predicting when they make mistakes is crucial. In this paper, we extract features of LLMs in a black-box manner by using follow-up prompts and taking the probabilities of different responses as representations. We demonstrate that training a linear model on these low-dimensional representations produces reliable predictors of model performance at the instance level.
arXiv Detail & Related papers (2025-01-02T22:26:54Z)
A Lean Transformer Model for Dynamic Malware Analysis and Detection [0.0]
Malware is a fast-growing threat to the modern computing world and existing lines of defense are not efficient enough to address this issue. Previous works have shown some success leveraging Neural Networks and API calls sequences extracted from execution reports. In this paper, we design an emulation-Only model, based on the Transformers architecture, to detect malicious files.
arXiv Detail & Related papers (2024-08-05T08:46:46Z)
Mitigating the Impact of Malware Evolution on API Sequence-based Windows Malware Detector [5.953199557879621]
Methods based on API sequences play a crucial role in malware prevention. Evolved malware samples often use the API sequences of the pre-evolution samples to achieve similar malicious behaviors. We propose a frame(MME) framework that can enhance existing API sequence-based malware detectors.
arXiv Detail & Related papers (2024-08-03T04:21:24Z)
FANTAstic SEquences and Where to Find Them: Faithful and Efficient API Call Generation through State-tracked Constrained Decoding and Reranking [57.53742155914176]
API call generation is the cornerstone of large language models' tool-using ability. Existing supervised and in-context learning approaches suffer from high training costs, poor data efficiency, and generated API calls that can be unfaithful to the API documentation and the user's request. We propose an output-side optimization approach called FANTASE to address these limitations.
arXiv Detail & Related papers (2024-07-18T23:44:02Z)
EarlyMalDetect: A Novel Approach for Early Windows Malware Detection Based on Sequences of API Calls [0.7373617024876725]
We propose EarlyMalDetect, a novel approach for early Windows malware detection based on sequences of API calls. EarlyMalDetect can predict and reveal what a malware program is going to perform on the target system before it occurs. Our extensive experimental evaluations show that the proposed approach is highly effective in predicting malware behaviors.
arXiv Detail & Related papers (2024-07-18T09:54:33Z)
A Classification-by-Retrieval Framework for Few-Shot Anomaly Detection to Detect API Injection Attacks [9.693391036125908]
We propose a novel unsupervised few-shot anomaly detection framework composed of two main parts. First, we train a dedicated generic language model for API based on FastText embedding. Next, we use Approximate Nearest Neighbor search in a classification-by-retrieval approach.
arXiv Detail & Related papers (2024-05-18T10:15:31Z)
Adaptive REST API Testing with Reinforcement Learning [54.68542517176757]
Current testing tools lack efficient exploration mechanisms, treating all operations and parameters equally. Current tools struggle when response schemas are absent in the specification or exhibit variants. We present an adaptive REST API testing technique incorporates reinforcement learning to prioritize operations during exploration.
arXiv Detail & Related papers (2023-09-08T20:27:05Z)
Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust. Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model. We propose a novel paradigm named Visual-Linguistic Face Forgery Detection(VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
Decoder Tuning: Efficient Language Understanding as Decoding [84.68266271483022]
We present Decoder Tuning (DecT), which in contrast optimize task-specific decoder networks on the output side. By gradient-based optimization, DecT can be trained within several seconds and requires only one P query per sample. We conduct extensive natural language understanding experiments and show that DecT significantly outperforms state-of-the-art algorithms with a $200times$ speed-up.
arXiv Detail & Related papers (2022-12-16T11:15:39Z)
Using sequential drift detection to test the API economy [4.056434158960926]
API economy refers to the widespread integration of API (advanced programming interface) It is desirable to monitor the usage patterns and identify when the system is used in a way that was never used before. In this work we analyze both histograms and call graph of API usage to determine if the usage patterns of the system has shifted.
arXiv Detail & Related papers (2021-11-09T13:24:19Z)
Enhancing the Generalization for Intent Classification and Out-of-Domain Detection in SLU [70.44344060176952]
Intent classification is a major task in spoken language understanding (SLU) Recent works have shown that using extra data and labels can improve the OOD detection performance. This paper proposes to train a model with only IND data while supporting both IND intent classification and OOD detection.
arXiv Detail & Related papers (2021-06-28T08:27:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.