MalRAG: A Retrieval-Augmented LLM Framework for Open-set Malicious Traffic Identification
- URL: http://arxiv.org/abs/2511.14129v1
- Date: Tue, 18 Nov 2025 04:25:16 GMT
- Title: MalRAG: A Retrieval-Augmented LLM Framework for Open-set Malicious Traffic Identification
- Authors: Xiang Luo, Chang Liu, Gang Xiong, Chen Yang, Gaopeng Gou, Yaochen Ren, Zhen Li,
- Abstract summary: MalRAG is a retrieval-augmented framework for open-set malicious traffic identification. We construct a multi-view traffic database by mining prior malicious traffic from content, structural, and temporal perspectives. We employ Traffic-Aware Adaptive Pruning to select a variable subset of these candidates based on traffic-aware similarity scores.
- Score: 15.302665374408553
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-grained identification of IDS-flagged suspicious traffic is crucial in cybersecurity. In practice, cyber threats evolve continuously, making the discovery of novel malicious traffic a critical necessity as well as the identification of known classes. Recent studies have advanced this goal with deep models, but they often rely on task-specific architectures that limit transferability and require per-dataset tuning. In this paper we introduce MalRAG, the first LLM-driven retrieval-augmented framework for open-set malicious traffic identification. MalRAG freezes the LLM and operates via comprehensive traffic knowledge construction, adaptive retrieval, and prompt engineering. Concretely, we construct a multi-view traffic database by mining prior malicious traffic from content, structural, and temporal perspectives. Furthermore, we introduce a Coverage-Enhanced Retrieval Algorithm that queries across these views to assemble the most probable candidates, thereby improving the inclusion of correct evidence. We then employ Traffic-Aware Adaptive Pruning to select a variable subset of these candidates based on traffic-aware similarity scores, suppressing incorrect matches and yielding reliable retrieved evidence. Moreover, we develop a suite of guidance prompts where task instruction, evidence referencing, and decision guidance are integrated with the retrieved evidence to improve LLM performance. Across diverse real-world datasets and settings, MalRAG delivers state-of-the-art results in both fine-grained identification of known classes and novel malicious traffic discovery. Ablation and deep-dive analyses further show that MalRAG effectively leverages LLM capabilities yet achieves open-set malicious traffic identification without relying on a specific LLM.
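The abstract names three stages: retrieval across several views, pruning to a variable-size evidence set, and a guidance prompt. A toy Python sketch of that flow is given here only to make the stages concrete. It is not the authors' implementation: the cosine similarity, the per-view top-k union, the `ratio` and `floor` thresholds, the two-dimensional feature vectors, and every function name are assumptions chosen for illustration, since the abstract does not specify them.

```python
import math

VIEWS = ("content", "structural", "temporal")  # the three views named in the abstract


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_candidates(query, database, k=2):
    """Assumed stand-in for cross-view retrieval: take the top-k samples in
    each view and merge them, keeping each sample's best similarity."""
    best = {}
    for view in VIEWS:
        scored = sorted(
            ((cosine(query[view], s["views"][view]), i) for i, s in enumerate(database)),
            reverse=True,
        )
        for sim, i in scored[:k]:
            best[i] = max(best.get(i, -1.0), sim)
    return best


def prune(candidates, ratio=0.9, floor=0.5):
    """Assumed stand-in for adaptive pruning: keep candidates scoring within
    `ratio` of the best one; return nothing if even the best is below `floor`."""
    if not candidates:
        return []
    top = max(candidates.values())
    if top < floor:
        return []
    kept = [(sim, i) for i, sim in candidates.items() if sim >= ratio * top]
    return [i for sim, i in sorted(kept, reverse=True)]


def build_prompt(flow_description, evidence_labels):
    """Combine task instruction, evidence referencing, and decision guidance."""
    lines = [
        "Task: identify the class of the IDS-flagged flow below.",
        f"Flow: {flow_description}",
    ]
    if evidence_labels:
        lines.append("Retrieved evidence (labels of similar known flows): "
                     + ", ".join(evidence_labels))
        lines.append("Decision: answer with one evidence label, or 'novel' if none fits.")
    else:
        lines.append("Retrieved evidence: none passed pruning.")
        lines.append("Decision: answer 'novel' unless the flow clearly matches a known class.")
    return "\n".join(lines)


# Hypothetical toy database: two known classes, made-up 2-D features per view.
database = [
    {"label": "botnet", "views": {"content": [1.0, 0.0], "structural": [1.0, 0.0], "temporal": [1.0, 0.0]}},
    {"label": "botnet", "views": {"content": [0.9, 0.1], "structural": [1.0, 0.2], "temporal": [0.8, 0.1]}},
    {"label": "ransomware", "views": {"content": [0.0, 1.0], "structural": [0.0, 1.0], "temporal": [0.0, 1.0]}},
]

known_query = {view: [1.0, 0.05] for view in VIEWS}    # close to the botnet samples
novel_query = {view: [-1.0, -1.0] for view in VIEWS}   # unlike anything stored

known_evidence = prune(retrieve_candidates(known_query, database))
novel_evidence = prune(retrieve_candidates(novel_query, database))
known_prompt = build_prompt("toy known flow", [database[i]["label"] for i in known_evidence])
novel_prompt = build_prompt("toy novel flow", [database[i]["label"] for i in novel_evidence])
```

In this sketch the open-set behaviour comes entirely from the pruning step returning an empty evidence list, which switches the prompt to a "novel" default. The prompt string would then be passed to a frozen LLM; that call is left out because the abstract does not tie the method to a specific model.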
Related papers
- RerouteGuard: Understanding and Mitigating Adversarial Risks for LLM Routing [20.559596977062146]
LLM routers are vulnerable to adversarial attacks in the form of LLM rerouting. We introduce RerouteGuard, a flexible and scalable guardrail framework for LLM rerouting. RerouteGuard achieves over 99% detection accuracy against state-of-the-art rerouting attacks.
arXiv Detail & Related papers (2026-01-29T08:17:08Z) - Virtual Traffic Police: Large Language Model-Augmented Traffic Signal Control for Unforeseen Incidents [5.077053934708947]
We propose a hierarchical framework that augments existing traffic signal control systems with Large Language Models (LLMs). A virtual traffic police agent at the upper level dynamically fine-tunes selected parameters of signal controllers at the lower level in response to real-time traffic incidents. Our results show that LLMs can serve as trustworthy virtual traffic police officers that can adapt conventional TSC methods to unforeseen traffic incidents.
arXiv Detail & Related papers (2026-01-22T10:04:21Z) - Traffic-MLLM: A Spatio-Temporal MLLM with Retrieval-Augmented Generation for Causal Inference in Traffic [8.754321713184483]
We propose Traffic-MLLM, a multimodal large language model tailored for fine-grained traffic analysis. Our model leverages high-quality traffic-specific multimodal datasets and uses Low-Rank Adaptation (LoRA) for lightweight fine-tuning. We also introduce an innovative knowledge module fusing Chain-of-Thought reasoning with Retrieval-Augmented Generation (RAG).
arXiv Detail & Related papers (2025-09-14T08:53:06Z) - DeepSieve: Information Sieving via LLM-as-a-Knowledge-Router [57.28685457991806]
DeepSieve is an agentic RAG framework that incorporates information sieving via LLM-as-a-knowledge-router. Our design emphasizes modularity, transparency, and adaptability, leveraging recent advances in agentic system design.
arXiv Detail & Related papers (2025-07-29T17:55:23Z) - Large Language Models powered Malicious Traffic Detection: Architecture, Opportunities and Case Study [12.381768120279771]
Large Language Models (LLMs) are trained on a vast corpus of text. We focus on unleashing the full potential of LLMs in malicious traffic detection. We present our design on LLM-powered DDoS detection as a case study.
arXiv Detail & Related papers (2025-03-24T09:40:46Z) - How Robust Are Router-LLMs? Analysis of the Fragility of LLM Routing Capabilities [62.474732677086855]
Large language model (LLM) routing has emerged as a crucial strategy for balancing computational costs with performance. We propose the DSC benchmark: Diverse, Simple, and Categorized, an evaluation framework that categorizes router performance across a broad spectrum of query types.
arXiv Detail & Related papers (2025-03-20T19:52:30Z) - Strada-LLM: Graph LLM for traffic prediction [62.2015839597764]
A considerable challenge in traffic prediction lies in handling the diverse data distributions caused by vastly different traffic conditions. We propose a graph-aware LLM for traffic prediction that considers proximal traffic information. We adopt a lightweight approach for efficient domain adaptation when facing new data distributions in few-shot fashion.
arXiv Detail & Related papers (2024-10-28T09:19:29Z) - Beyond Binary: Towards Fine-Grained LLM-Generated Text Detection via Role Recognition and Involvement Measurement [51.601916604301685]
Large language models (LLMs) generate content that can undermine trust in online discourse. Current methods often focus on binary classification, failing to address the complexities of real-world scenarios like human-LLM collaboration. To move beyond binary classification and address these challenges, we propose a new paradigm for detecting LLM-generated content.
arXiv Detail & Related papers (2024-10-18T08:14:10Z) - Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs [60.32717556756674]
This paper introduces a systematic evaluation framework to assess Large Language Models in detecting cryptographic misuses.
Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives.
The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks.
arXiv Detail & Related papers (2024-07-23T15:31:26Z) - LLMLight: Large Language Models as Traffic Signal Control Agents [25.438040499152745]
Traffic Signal Control (TSC) is a crucial component in urban traffic management, aiming to optimize road network efficiency and reduce congestion. This paper presents LLMLight, a novel framework employing Large Language Models (LLMs) as decision-making agents for TSC.
arXiv Detail & Related papers (2023-12-26T13:17:06Z) - A Survey on Detection of LLMs-Generated Content [97.87912800179531]
The ability to detect LLMs-generated content has become of paramount importance.
We aim to provide a detailed overview of existing detection strategies and benchmarks.
We also posit the necessity for a multi-faceted approach to defend against various attacks.
arXiv Detail & Related papers (2023-10-24T09:10:26Z) - Machine Learning for Encrypted Malicious Traffic Detection: Approaches, Datasets and Comparative Study [6.267890584151111]
In the post-COVID-19 environment, malicious traffic encryption is growing rapidly.
We formulate a universal framework of machine learning based encrypted malicious traffic detection techniques.
We implement and compare 10 encrypted malicious traffic detection algorithms.
arXiv Detail & Related papers (2022-03-17T14:00:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.