Related papers: Exploring Large Language Models for Semantic Analysis and Categorization of Android Malware

URL: http://arxiv.org/abs/2501.04848v1
Date: Wed, 08 Jan 2025 21:22:45 GMT
Title: Exploring Large Language Models for Semantic Analysis and Categorization of Android Malware
Authors: Brandon J Walton, Mst Eshita Khatun, James M Ghawaly, Aisha Ali-Gombe
Abstract summary: msp is designed to augment malware analysis for Android through a hierarchical-tiered summarization chain and strategic prompt engineering. msp can achieve up to 77% classification accuracy while providing highly robust summaries at functional, class, and package levels.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Malware analysis is a complex process of examining and evaluating malicious software's functionality, origin, and potential impact. This arduous process typically involves dissecting the software to understand its components, infection vector, propagation mechanism, and payload. Over the years, deep reverse engineering of malware has become increasingly tedious, mainly due to modern malicious codebases' fast evolution and sophistication. Essentially, analysts are tasked with identifying the elusive needle in the haystack within the complexities of zero-day malware, all while under tight time constraints. Thus, in this paper, we explore leveraging Large Language Models (LLMs) for semantic malware analysis to expedite the analysis of known and novel samples. Built on the GPT-4o-mini model, msp is designed to augment malware analysis for Android through a hierarchical-tiered summarization chain and strategic prompt engineering. Additionally, msp performs malware categorization, distinguishing potential malware from benign applications, thereby saving time during the malware reverse engineering process. Despite not being fine-tuned for Android malware analysis, we demonstrate that through optimized and advanced prompt engineering msp can achieve up to 77% classification accuracy while providing highly robust summaries at functional, class, and package levels. In addition, leveraging the backward tracing of the summaries from package to function levels allowed us to pinpoint the precise code snippets responsible for malicious behavior.
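Illustration: the abstract describes a summarization chain that works at function, class, and package levels, with backward tracing from a package down to its functions. The following is a minimal sketch of that general idea, not the authors' implementation. The names summarize_hierarchy, trace_back, stub_summarize, and the sample data are all hypothetical, and the stub only truncates text so the example runs offline. A real system would call an LLM at that point.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

FuncKey = Tuple[str, str, str]  # (package, class, function)


def stub_summarize(text: str) -> str:
    """Hypothetical stand-in for an LLM call; it only truncates the text."""
    return text[:60]


def summarize_hierarchy(
    functions: Dict[FuncKey, str],
    summarize: Callable[[str], str] = stub_summarize,
):
    """Summarize bottom-up: functions first, then classes, then packages."""
    func_summaries = {key: summarize(src) for key, src in functions.items()}

    # Each class summary is built from the summaries of its functions.
    by_class: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for (pkg, cls, fn), text in func_summaries.items():
        by_class[(pkg, cls)].append(f"{fn}: {text}")
    class_summaries = {k: summarize(" | ".join(v)) for k, v in by_class.items()}

    # Each package summary is built from the summaries of its classes.
    by_package: Dict[str, List[str]] = defaultdict(list)
    for (pkg, cls), text in class_summaries.items():
        by_package[pkg].append(f"{cls}: {text}")
    package_summaries = {k: summarize(" | ".join(v)) for k, v in by_package.items()}

    return func_summaries, class_summaries, package_summaries


def trace_back(package: str, functions: Dict[FuncKey, str]) -> List[FuncKey]:
    """Return the function keys that fed into a given package summary."""
    return sorted(key for key in functions if key[0] == package)


# Hypothetical sample input: decompiled source keyed by location.
sample = {
    ("com.example", "Sms", "send"): "calls sendTextMessage with a fixed number",
    ("com.example", "Sms", "read"): "queries the SMS inbox content provider",
    ("com.other", "Ui", "draw"): "renders the main activity layout",
}

func_s, class_s, package_s = summarize_hierarchy(sample)
suspect_functions = trace_back("com.example", sample)
```

Keeping the (package, class, function) key on every function summary is what makes the backward trace cheap: a suspicious package summary can be mapped straight back to the functions that produced it.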

Related papers

MaLAware: Automating the Comprehension of Malicious Software Behaviours using Large Language Models (LLMs) [3.410195565199523]
MaLAware is a tool that translates raw malware data into human-readable descriptions. MaLAware processes Cuckoo Sandbox-generated reports to correlate malignant activities and generate concise summaries. The evaluation uses the human-written malware behaviour description dataset as ground truth.
arXiv Detail & Related papers (2025-04-01T19:27:17Z)
On Benchmarking Code LLMs for Android Malware Analysis [13.932151152280689]
Large Language Models (LLMs) have demonstrated strong capabilities in various code intelligence tasks. This paper presents CAMA, a benchmarking framework designed to evaluate the effectiveness of Code LLMs in Android malware analysis.
arXiv Detail & Related papers (2025-04-01T12:05:49Z)
MASKDROID: Robust Android Malware Detection with Masked Graph Representations [56.09270390096083]
We propose MASKDROID, a powerful detector with a strong discriminative ability to identify malware. We introduce a masking mechanism into the Graph Neural Network based framework, forcing MASKDROID to recover the whole input graph. This strategy enables the model to understand the malicious semantics and learn more stable representations, enhancing its robustness against adversarial attacks.
arXiv Detail & Related papers (2024-09-29T07:22:47Z)
Explainable Malware Analysis: Concepts, Approaches and Challenges [0.0]
We review the current state-of-the-art ML-based malware detection techniques and popular XAI approaches. We discuss research implementations and the challenges of explainable malware analysis. This theoretical survey serves as an entry point for researchers interested in XAI applications in malware detection.
arXiv Detail & Related papers (2024-09-09T08:19:33Z)
A Lean Transformer Model for Dynamic Malware Analysis and Detection [0.0]
Malware is a fast-growing threat to the modern computing world and existing lines of defense are not efficient enough to address this issue. Previous works have shown some success leveraging Neural Networks and API call sequences extracted from execution reports. In this paper, we design an emulation-only model, based on the Transformers architecture, to detect malicious files.
arXiv Detail & Related papers (2024-08-05T08:46:46Z)
A Wolf in Sheep's Clothing: Practical Black-box Adversarial Attacks for Evading Learning-based Windows Malware Detection in the Wild [39.28931186940845]
MalGuise is a black-box adversarial attack framework that evaluates the security risks of existing learning-based Windows malware detection systems. MalGuise achieves a remarkably high attack success rate, mostly exceeding 95%, with over 91% of the generated adversarial malware files maintaining the same semantics.
arXiv Detail & Related papers (2024-07-03T08:01:19Z)
Unraveling the Key of Machine Learning Solutions for Android Malware Detection [33.63795751798441]
This paper presents a comprehensive investigation into machine learning-based Android malware detection. We first survey the literature, categorizing contributions into a taxonomy based on the Android feature engineering and ML modeling pipeline. Then, we design a general-purpose framework for ML-based Android malware detection, re-implement 12 representative approaches from different research communities, and evaluate them from three primary dimensions, i.e. effectiveness, robustness, and efficiency.
arXiv Detail & Related papers (2024-02-05T12:31:19Z)
Light up that Droid! On the Effectiveness of Static Analysis Features against App Obfuscation for Android Malware Detection [42.50353398405467]
Malware authors have seen obfuscation as a means to bypass malware detectors based on static analysis features. In this article we assess the impact of specific obfuscation techniques on common features extracted using static analysis. We propose an ML malware detector for Android that is robust against obfuscation and outperforms current state-of-the-art detectors.
arXiv Detail & Related papers (2023-10-24T09:07:23Z)
A survey on hardware-based malware detection approaches [45.24207460381396]
Hardware-based malware detection approaches leverage hardware performance counters and machine learning prowess. We meticulously analyze the approach, unraveling the most common methods, algorithms, tools, and datasets that shape its contours. The discussion extends to crafting mixed hardware and software approaches for collaborative efficacy, essential enhancements in hardware monitoring units, and a better understanding of the correlation between hardware events and malware applications.
arXiv Detail & Related papers (2023-03-22T13:00:41Z)
DRSM: De-Randomized Smoothing on Malware Classifier Providing Certified Robustness [58.23214712926585]
We develop a certified defense, DRSM (De-Randomized Smoothed MalConv), by redesigning the de-randomized smoothing technique for the domain of malware detection. Specifically, we propose a window ablation scheme to provably limit the impact of adversarial bytes while maximally preserving local structures of the executables. We are the first to offer certified robustness in the realm of static detection of malware executables.
arXiv Detail & Related papers (2023-03-20T17:25:22Z)
Adversarial Patterns: Building Robust Android Malware Classifiers [0.9208007322096533]
In the field of cybersecurity, machine learning models have made significant improvements in malware detection. Despite their ability to understand complex patterns from unstructured data, these models are susceptible to adversarial attacks. This paper provides a comprehensive review of adversarial machine learning in the context of Android malware classifiers.
arXiv Detail & Related papers (2022-03-04T03:47:08Z)
Towards an Automated Pipeline for Detecting and Classifying Malware through Machine Learning [0.0]
We propose a malware taxonomic classification pipeline able to classify Windows Portable Executable files (PEs). Given an input PE sample, it is first classified as either malicious or benign. If malicious, the pipeline further analyzes it in order to establish its threat type, family, and behavior(s).
arXiv Detail & Related papers (2021-06-10T10:07:50Z)
Being Single Has Benefits. Instance Poisoning to Deceive Malware Classifiers [47.828297621738265]
We show how an attacker can launch a sophisticated and efficient poisoning attack targeting the dataset used to train a malware classifier. As opposed to other poisoning attacks in the malware detection domain, our attack does not focus on malware families but rather on specific malware instances that contain an implanted trigger. We propose a comprehensive detection approach that could serve as a future sophisticated defense against this newly discovered severe threat.
arXiv Detail & Related papers (2020-10-30T15:27:44Z)

This list is automatically generated from the titles and abstracts of the papers on this site.