Related papers: MalRAG: A Retrieval-Augmented LLM Framework for Open-set Malicious Traffic Identification

MalRAG: A Retrieval-Augmented LLM Framework for Open-set Malicious Traffic Identification

URL: http://arxiv.org/abs/2511.14129v1
Date: Tue, 18 Nov 2025 04:25:16 GMT
Title: MalRAG: A Retrieval-Augmented LLM Framework for Open-set Malicious Traffic Identification
Authors: Xiang Luo, Chang Liu, Gang Xiong, Chen Yang, Gaopeng Gou, Yaochen Ren, Zhen Li,
Abstract summary: MalRAG is a retrieval-augmented framework for open-set malicious traffic identification.<n>We construct a multi-view traffic database by mining prior malicious traffic from content, structural, and temporal perspectives.<n>We employ Traffic-Aware Adaptive Pruning to select a variable subset of these candidates based on traffic-aware similarity scores.
Score: 15.302665374408553
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Fine-grained identification of IDS-flagged suspicious traffic is crucial in cybersecurity. In practice, cyber threats evolve continuously, making the discovery of novel malicious traffic a critical necessity as well as the identification of known classes. Recent studies have advanced this goal with deep models, but they often rely on task-specific architectures that limit transferability and require per-dataset tuning. In this paper we introduce MalRAG, the first LLM driven retrieval-augmented framework for open-set malicious traffic identification. MalRAG freezes the LLM and operates via comprehensive traffic knowledge construction, adaptive retrieval, and prompt engineering. Concretely, we construct a multi-view traffic database by mining prior malicious traffic from content, structural, and temporal perspectives. Furthermore, we introduce a Coverage-Enhanced Retrieval Algorithm that queries across these views to assemble the most probable candidates, thereby improving the inclusion of correct evidence. We then employ Traffic-Aware Adaptive Pruning to select a variable subset of these candidates based on traffic-aware similarity scores, suppressing incorrect matches and yielding reliable retrieved evidence. Moreover, we develop a suite of guidance prompts where task instruction, evidence referencing, and decision guidance are integrated with the retrieved evidence to improve LLM performance. Across diverse real-world datasets and settings, MalRAG delivers state-of-the-art results in both fine-grained identification of known classes and novel malicious traffic discovery. Ablation and deep-dive analyses further show that MalRAG effective leverages LLM capabilities yet achieves open-set malicious traffic identification without relying on a specific LLM.

Related papers

RerouteGuard: Understanding and Mitigating Adversarial Risks for LLM Routing [20.559596977062146]
LLM routers are vulnerable to adversarial attacks in the form of LLM rerouting.<n>We introduce RerouteGuard, a flexible and scalable guardrail framework for LLM rerouting.<n>RerouteGuard achieves over 99% detection accuracy against state-of-the-art rerouting attacks.
arXiv Detail & Related papers (2026-01-29T08:17:08Z)
Virtual Traffic Police: Large Language Model-Augmented Traffic Signal Control for Unforeseen Incidents [5.077053934708947]
We propose a hierarchical framework that augments existing traffic signal control systems with Large Language Models (LLMs)<n>A virtual traffic police agent at the upper level dynamically fine-tunes selected parameters of signal controllers at the lower level in response to real-time traffic incidents.<n>Our results show that LLMs can serve as trustworthy virtual traffic police officers that can adapt conventional TSC methods to unforeseen traffic incidents.
arXiv Detail & Related papers (2026-01-22T10:04:21Z)
Traffic-MLLM: A Spatio-Temporal MLLM with Retrieval-Augmented Generation for Causal Inference in Traffic [8.754321713184483]
We propose Traffic-LM, a multimodal large language model tailored for fine-grained traffic analysis.<n>Our model leverages high-quality traffic-specific multimodal datasets and uses LowRanktemporal Adaptation (LoRA) for lightweight fine-tuning.<n>We also introduce an innovative knowledge module fusing Chain-of-the-art reasoning with Retrieval-Lomented Generation (LoRAG)
arXiv Detail & Related papers (2025-09-14T08:53:06Z)
DeepSieve: Information Sieving via LLM-as-a-Knowledge-Router [57.28685457991806]
DeepSieve is an agentic RAG framework that incorporates information sieving via LLM-as-a-knowledge-router.<n>Our design emphasizes modularity, transparency, and adaptability, leveraging recent advances in agentic system design.
arXiv Detail & Related papers (2025-07-29T17:55:23Z)
Large Language Models powered Malicious Traffic Detection: Architecture, Opportunities and Case Study [12.381768120279771]
Large Language Models (LLMs) are trained on a vast corpus of text.<n>We focus on unleashing the full potential of LLMs in malicious traffic detection.<n>We present our design on LLM-powered DDoS detection as a case study.
arXiv Detail & Related papers (2025-03-24T09:40:46Z)
How Robust Are Router-LLMs? Analysis of the Fragility of LLM Routing Capabilities [62.474732677086855]
Large language model (LLM) routing has emerged as a crucial strategy for balancing computational costs with performance.<n>We propose the DSC benchmark: Diverse, Simple, and Categorized, an evaluation framework that categorizes router performance across a broad spectrum of query types.
arXiv Detail & Related papers (2025-03-20T19:52:30Z)
Strada-LLM: Graph LLM for traffic prediction [62.2015839597764]
A considerable challenge in traffic prediction lies in handling the diverse data distributions caused by vastly different traffic conditions.<n>We propose a graph-aware LLM for traffic prediction that considers proximal traffic information.<n>We adopt a lightweight approach for efficient domain adaptation when facing new data distributions in few-shot fashion.
arXiv Detail & Related papers (2024-10-28T09:19:29Z)
Beyond Binary: Towards Fine-Grained LLM-Generated Text Detection via Role Recognition and Involvement Measurement [51.601916604301685]
Large language models (LLMs) generate content that can undermine trust in online discourse.<n>Current methods often focus on binary classification, failing to address the complexities of real-world scenarios like human-LLM collaboration.<n>To move beyond binary classification and address these challenges, we propose a new paradigm for detecting LLM-generated content.
arXiv Detail & Related papers (2024-10-18T08:14:10Z)
Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs [60.32717556756674]
This paper introduces a systematic evaluation framework to assess Large Language Models in detecting cryptographic misuses. Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives. The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks.
arXiv Detail & Related papers (2024-07-23T15:31:26Z)
LLMLight: Large Language Models as Traffic Signal Control Agents [25.438040499152745]
Traffic Signal Control (TSC) is a crucial component in urban traffic management, aiming to optimize road network efficiency and reduce congestion.<n>This paper presents LLMLight, a novel framework employing Large Language Models (LLMs) as decision-making agents for TSC.
arXiv Detail & Related papers (2023-12-26T13:17:06Z)
A Survey on Detection of LLMs-Generated Content [97.87912800179531]
The ability to detect LLMs-generated content has become of paramount importance. We aim to provide a detailed overview of existing detection strategies and benchmarks. We also posit the necessity for a multi-faceted approach to defend against various attacks.
arXiv Detail & Related papers (2023-10-24T09:10:26Z)
Machine Learning for Encrypted Malicious Traffic Detection: Approaches, Datasets and Comparative Study [6.267890584151111]
In post-COVID-19 environment, malicious traffic encryption is growing rapidly. We formulate a universal framework of machine learning based encrypted malicious traffic detection techniques. We implement and compare 10 encrypted malicious traffic detection algorithms.
arXiv Detail & Related papers (2022-03-17T14:00:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.