TraceRAG: A LLM-Based Framework for Explainable Android Malware Detection and Behavior Analysis
- URL: http://arxiv.org/abs/2509.08865v1
- Date: Wed, 10 Sep 2025 06:07:12 GMT
- Title: TraceRAG: A LLM-Based Framework for Explainable Android Malware Detection and Behavior Analysis
- Authors: Guangyu Zhang, Xixuan Wang, Shiyu Sun, Peiyan Xiao, Kun Sun, Yanhai Xiong
- Abstract summary: We introduce TraceRAG, a retrieval-augmented generation (RAG) framework to deliver explainable malware detection and analysis. First, TraceRAG generates summaries of method-level code snippets, which are indexed in a vector database. At query time, behavior-focused questions retrieve the most semantically relevant snippets for deeper inspection. Finally, based on the multi-turn analysis results, TraceRAG produces human-readable reports that present the identified malicious behaviors and their corresponding code implementations.
- Score: 8.977634735108895
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sophisticated evasion tactics in malicious Android applications, combined with their intricate behavioral semantics, enable attackers to conceal malicious logic within legitimate functions, underscoring the critical need for robust and in-depth analysis frameworks. However, traditional analysis techniques often fail to recover deeply hidden behaviors or provide human-readable justifications for their decisions. Inspired by advances in large language models (LLMs), we introduce TraceRAG, a retrieval-augmented generation (RAG) framework that bridges natural language queries and Java code to deliver explainable malware detection and analysis. First, TraceRAG generates summaries of method-level code snippets, which are indexed in a vector database. At query time, behavior-focused questions retrieve the most semantically relevant snippets for deeper inspection. Finally, based on the multi-turn analysis results, TraceRAG produces human-readable reports that present the identified malicious behaviors and their corresponding code implementations. Experimental results demonstrate that our method achieves 96% malware detection accuracy and 83.81% behavior identification accuracy based on updated VirusTotal (VT) scans and manual verification. Furthermore, expert evaluation confirms the practical utility of the reports generated by TraceRAG.
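The pipeline the abstract describes (summarize method-level snippets, index the summaries, retrieve by behavior-focused query) can be sketched in miniature. This is an illustrative toy, not the authors' implementation: a hashed bag-of-words embedding stands in for learned LLM embeddings, a plain dict stands in for the vector database, and all method names and summaries are invented.

```python
# Toy sketch of a TraceRAG-style index-and-retrieve stage (illustration only).
# A real system would embed LLM-generated summaries and store them in a
# vector database; here feature hashing over word tokens stands in.
import math
import re
import zlib
from collections import Counter

DIM = 256  # toy embedding dimension

def embed(text: str) -> list[float]:
    """Map text to a unit vector via feature hashing of its word tokens."""
    vec = [0.0] * DIM
    for tok, cnt in Counter(re.findall(r"\w+", text.lower())).items():
        vec[zlib.crc32(tok.encode()) % DIM] += cnt  # deterministic hash
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

# Step 1: summaries of method-level snippets, indexed as vectors
# (invented examples standing in for LLM-generated summaries).
method_summaries = {
    "a.b.SmsSender.run": "builds an SMS to a premium number and sends it silently",
    "a.b.Collector.dump": "reads the contact list and serializes it to JSON",
    "a.b.MainActivity.init": "inflates the main layout and sets click listeners",
}
index = {name: embed(s) for name, s in method_summaries.items()}

# Step 2: a behavior-focused question retrieves the most relevant snippet,
# which would then go to the LLM for deeper multi-turn inspection.
query = "which method sends an SMS to a premium number?"
qvec = embed(query)
best = max(index, key=lambda name: cosine(qvec, index[name]))
print(best)  # the SMS-sending method should rank highest
```

In a production setting the dict lookup would be replaced by an approximate-nearest-neighbor search over a real vector store, but the retrieve-then-inspect flow is the same.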
Related papers
- An Explainable Memory Forensics Approach for Malware Analysis [1.2744523252873352]
Memory forensics is an effective methodology for analyzing living-off-the-land malware. In this paper, we propose an explainable, AI-assisted memory forensics approach. We apply the proposed methodology to both Windows and Android malware.
arXiv Detail & Related papers (2026-02-23T13:30:04Z)
- Efficient Code Analysis via Graph-Guided Large Language Models [14.569998138597393]
We propose a graph-centric attention acquisition pipeline that enhances Large Language Models' ability to localize malicious behavior. The approach parses a project into a code graph, uses an LLM to encode nodes with semantic and structural signals, and trains a Graph Neural Network (GNN) under sparse supervision.
arXiv Detail & Related papers (2026-01-19T09:42:00Z)
- CoG: Controllable Graph Reasoning via Relational Blueprints and Failure-Aware Refinement over Knowledge Graphs [53.199517625701475]
CoG is a training-free framework inspired by Dual-Process Theory that mimics the interplay between intuition and deliberation. CoG significantly outperforms state-of-the-art approaches in both accuracy and efficiency.
arXiv Detail & Related papers (2026-01-16T07:27:40Z)
- The Double Life of Code World Models: Provably Unmasking Malicious Behavior Through Execution Traces [0.0]
Large language models (LLMs) increasingly generate code with minimal human oversight. We present a novel AI control framework that verifies untrusted code-generating models through semantic analysis.
arXiv Detail & Related papers (2025-12-15T19:05:37Z)
- The Trojan Knowledge: Bypassing Commercial LLM Guardrails via Harmless Prompt Weaving and Adaptive Tree Search [58.8834056209347]
Large language models (LLMs) remain vulnerable to jailbreak attacks that bypass safety guardrails to elicit harmful outputs. We introduce the Correlated Knowledge Attack Agent (CKA-Agent), a dynamic framework that reframes jailbreaking as an adaptive, tree-structured exploration of the target model's knowledge base.
arXiv Detail & Related papers (2025-12-01T07:05:23Z)
- InspectCoder: Dynamic Analysis-Enabled Self Repair through Interactive LLM-Debugger Collaboration [71.18377595277018]
Large Language Models (LLMs) frequently generate buggy code with complex logic errors that are challenging to diagnose. We present InspectCoder, the first agentic program repair system that empowers LLMs to actively conduct dynamic analysis via interactive debugger control.
arXiv Detail & Related papers (2025-10-21T06:26:29Z)
- Demystifying deep search: a holistic evaluation with hint-free multi-hop questions and factorised metrics [89.1999907891494]
We present WebDetective, a benchmark of hint-free multi-hop questions paired with a controlled Wikipedia sandbox. Our evaluation of 25 state-of-the-art models reveals systematic weaknesses across all architectures. We develop an agentic workflow, EvidenceLoop, that explicitly targets the challenges our benchmark identifies.
arXiv Detail & Related papers (2025-10-01T07:59:03Z)
- Where Did It Go Wrong? Attributing Undesirable LLM Behaviors via Representation Gradient Tracing [12.835224376066769]
Large Language Models (LLMs) have demonstrated remarkable capabilities, yet their deployment is frequently undermined by undesirable behaviors. We introduce a novel and efficient framework that diagnoses a range of undesirable LLM behaviors by analyzing representations and their gradients. We systematically evaluate our method on tasks that include tracking harmful content, detecting backdoor poisoning, and identifying knowledge contamination.
arXiv Detail & Related papers (2025-09-26T12:07:47Z)
- MirGuard: Towards a Robust Provenance-based Intrusion Detection System Against Graph Manipulation Attacks [13.92935628832727]
MirGuard is an anomaly detection framework that combines logic-aware multi-view augmentation with contrastive representation learning. MirGuard significantly outperforms state-of-the-art detectors in robustness against various graph manipulation attacks.
arXiv Detail & Related papers (2025-08-14T13:35:51Z)
- Certifiably robust malware detectors by design [48.367676529300276]
We propose a new model architecture for robust malware detection by design. We show that every robust detector can be decomposed into a specific structure, which can be applied to learn empirically robust malware detectors. Our framework, ERDALT, is based on this structure.
arXiv Detail & Related papers (2025-08-10T09:19:29Z)
- OmniAD: Detect and Understand Industrial Anomaly via Multimodal Reasoning [76.90511414963265]
We introduce OmniAD, a framework that unifies anomaly detection and understanding for fine-grained analysis. Visual reasoning provides detailed inspection by leveraging Text-as-Mask. Visual Guided Textual Reasoning conducts comprehensive analysis by integrating visual perception.
arXiv Detail & Related papers (2025-05-28T07:02:15Z)
- EXPLICATE: Enhancing Phishing Detection through Explainable AI and LLM-Powered Interpretability [44.2907457629342]
EXPLICATE is a framework that enhances phishing detection through a three-component architecture. It is on par with existing deep learning techniques but has better explainability. It addresses the critical divide between automated AI and user trust in phishing detection systems.
arXiv Detail & Related papers (2025-03-22T23:37:35Z)
- MASKDROID: Robust Android Malware Detection with Masked Graph Representations [56.09270390096083]
We propose MASKDROID, a powerful detector with a strong discriminative ability to identify malware.
We introduce a masking mechanism into the Graph Neural Network based framework, forcing MASKDROID to recover the whole input graph.
This strategy enables the model to understand the malicious semantics and learn more stable representations, enhancing its robustness against adversarial attacks.
arXiv Detail & Related papers (2024-09-29T07:22:47Z)
- AppPoet: Large Language Model based Android malware detection via multi-view prompt engineering [1.3197408989895103]
AppPoet is a multi-view system for Android malware detection.
Our method achieves a detection accuracy of 97.15% and an F1 score of 97.21%, which is superior to the baseline methods.
arXiv Detail & Related papers (2024-04-29T15:52:45Z)
- SliceLocator: Locating Vulnerable Statements with Graph-based Detectors [33.395068754566935]
SliceLocator identifies the most relevant taint flow by selecting the highest-weighted flow path from all potential vulnerability-triggering statements. We demonstrate that SliceLocator consistently performs well on four state-of-the-art GNN-based vulnerability detectors.
arXiv Detail & Related papers (2024-01-05T10:15:04Z)
- OUTFOX: LLM-Generated Essay Detection Through In-Context Learning with Adversarially Generated Examples [44.118047780553006]
OUTFOX is a framework that improves the robustness of LLM-generated-text detectors by allowing both the detector and the attacker to consider each other's output.
Experiments show that the proposed detector improves the detection performance on the attacker-generated texts by up to +41.3 points F1-score.
The detector shows a state-of-the-art detection performance: up to 96.9 points F1-score, beating existing detectors on non-attacked texts.
arXiv Detail & Related papers (2023-07-21T17:40:47Z)
- Malicious Code Detection: Run Trace Output Analysis by LSTM [0.0]
We propose a methodological framework for detecting malicious code by analyzing run trace outputs with a Long Short-Term Memory (LSTM) network.
We created our dataset from run trace outputs obtained from dynamic analysis of PE files.
Experiments showed that the ISM achieved an accuracy of 87.51% and a false positive rate of 18.34%, while the BSM achieved an accuracy of 99.26% and a false positive rate of 2.62%.
arXiv Detail & Related papers (2021-01-14T15:00:42Z)
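As a concrete illustration of the run-trace-to-LSTM idea in the entry above, the preprocessing step can be sketched as follows. This is a hedged toy, not the paper's pipeline: the trace contents, vocabulary scheme, and padding length are invented; a real system would feed the resulting integer sequences into an LSTM classifier.

```python
# Toy sketch: turn run-trace outputs into fixed-length integer sequences,
# the usual input shape for a recurrent (LSTM) classifier. Illustration only.
PAD, UNK = 0, 1  # reserved ids for padding and out-of-vocabulary tokens

def build_vocab(traces: list[list[str]]) -> dict[str, int]:
    """Assign each distinct trace token an integer id, starting at 2."""
    vocab: dict[str, int] = {}
    for trace in traces:
        for tok in trace:
            vocab.setdefault(tok, len(vocab) + 2)
    return vocab

def encode(trace: list[str], vocab: dict[str, int], maxlen: int) -> list[int]:
    """Map tokens to ids, truncate to maxlen, and right-pad with PAD."""
    ids = [vocab.get(tok, UNK) for tok in trace[:maxlen]]
    return ids + [PAD] * (maxlen - len(ids))

# Invented run-trace token sequences standing in for dynamic-analysis output.
traces = [
    ["push", "call CreateFileA", "mov", "call WriteFile"],
    ["mov", "xor", "call VirtualAlloc"],
]
vocab = build_vocab(traces)
batch = [encode(t, vocab, maxlen=6) for t in traces]
print(batch[0])  # -> [2, 3, 4, 5, 0, 0]
print(batch[1])  # -> [4, 6, 7, 0, 0, 0]
```

Padding keeps every sequence the same length so a whole batch can be stacked into one tensor; the reserved `UNK` id lets unseen tokens from new traces map somewhere sensible at inference time.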
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.