Related papers: RHINO: Guided Reasoning for Mapping Network Logs to Adversarial Tactics and Techniques with Large Language Models

RHINO: Guided Reasoning for Mapping Network Logs to Adversarial Tactics and Techniques with Large Language Models

URL: http://arxiv.org/abs/2510.14233v1
Date: Thu, 16 Oct 2025 02:25:46 GMT
Title: RHINO: Guided Reasoning for Mapping Network Logs to Adversarial Tactics and Techniques with Large Language Models
Authors: Fanchao Meng, Jiaping Gui, Yunbo Li, Yue Wu,
Abstract summary: We introduce RHINO, a framework that decomposes Large Language Models into three interpretable phases mirroring human reasoning.<n> RHINO bridges the semantic gap between low-level observations and adversarial intent while improving output reliability through structured reasoning.<n>Our results demonstrate that RHINO significantly enhances the interpretability and scalability of threat analysis, offering a blueprint for deploying LLMs in operational security settings.
Score: 9.065322387043546
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Modern Network Intrusion Detection Systems generate vast volumes of low-level alerts, yet these outputs remain semantically fragmented, requiring labor-intensive manual correlation with high-level adversarial behaviors. Existing solutions for automating this mapping-rule-based systems and machine learning classifiers-suffer from critical limitations: rule-based approaches fail to adapt to novel attack variations, while machine learning methods lack contextual awareness and treat tactic-technique mapping as a syntactic matching problem rather than a reasoning task. Although Large Language Models have shown promise in cybersecurity tasks, preliminary experiments reveal that existing LLM-based methods frequently hallucinate technique names or produce decontextualized mappings due to their single-step classification approach. To address these challenges, we introduce RHINO, a novel framework that decomposes LLM-based attack analysis into three interpretable phases mirroring human reasoning: (1) behavioral abstraction, where raw logs are translated into contextualized narratives; (2) multi-role collaborative inference, generating candidate techniques by evaluating behavioral evidence against MITRE ATT&CK knowledge; and (3) validation, cross-referencing predictions with official MITRE definitions to rectify hallucinations. RHINO bridges the semantic gap between low-level observations and adversarial intent while improving output reliability through structured reasoning. We evaluate RHINO on three benchmarks across four backbone models. RHINO achieved high accuracy, with model performance ranging from 86.38% to 88.45%, resulting in relative gains from 24.25% to 76.50% across different models. Our results demonstrate that RHINO significantly enhances the interpretability and scalability of threat analysis, offering a blueprint for deploying LLMs in operational security settings.

Related papers

LLM-based Multi-class Attack Analysis and Mitigation Framework in IoT/IIoT Networks [0.42056926734482064]
We propose a hybrid framework combining Machine Learning for attack detection with Large Language Models (LLMs) for attack behavior analysis and mitigation suggestion.<n>We introduce novel evaluation metrics for quantitative assessment to guide us and an ensemble of judge LLMs, namely ChatGPT-4o, DeepSeek-V3, Mixtral 8x7B Instruct, Gemini 2.5 Flash, Meta Llama 4, TII Falcon H1 34B Instruct, xAI Grok 3, and Claude 4 Sonnet.<n>Results show that Random Forest has the best detection model, and ChatGPT-o3 outperformed DeepSeek-R1 in attack analysis and
arXiv Detail & Related papers (2025-10-30T18:55:08Z)
Fault Cause Identification across Manufacturing Lines through Ontology-Guided and Process-Aware FMEA Graph Learning with LLMs [1.9563024477582351]
This study proposes a process-aware framework that enhances FMEA reusability by combining manufacturing-domain conceptualization with graph neural network (GNN) reasoning.<n>A case study on automotive pressure sensor assembly lines demonstrates that the proposed method outperforms a state-of-the-art retrieval-augmented generation (RAG) baseline.
arXiv Detail & Related papers (2025-10-17T08:35:47Z)
Human Texts Are Outliers: Detecting LLM-generated Texts via Out-of-distribution Detection [71.59834293521074]
We develop a framework to distinguish between human-authored and machine-generated text.<n>Our method achieves 98.3% AUROC and AUPR with only 8.9% FPR95 on DeepFake dataset.<n>Code, pretrained weights, and demo will be released.
arXiv Detail & Related papers (2025-10-07T08:14:45Z)
Modeling the Attack: Detecting AI-Generated Text by Quantifying Adversarial Perturbations [2.7620215077666557]
Modern detectors are notoriously vulnerable to adversarial attacks, with paraphrasing standing out as an effective evasion technique.<n>This paper presents a comparative study of adversarial robustness, first by quantifying the limitations of standard adversarial training.<n>We then introduce a novel, significantly more resilient detection framework: Perturbation-Invariant Feature Engineering.
arXiv Detail & Related papers (2025-09-22T13:03:53Z)
Learning Efficient and Generalizable Graph Retriever for Knowledge-Graph Question Answering [75.12322966980003]
Large Language Models (LLMs) have shown strong inductive reasoning ability across various domains.<n>Most existing RAG pipelines rely on unstructured text, limiting interpretability and structured reasoning.<n>Recent studies have explored integrating knowledge graphs with LLMs for knowledge graph question answering.<n>We propose RAPL, a novel framework for efficient and effective graph retrieval in KGQA.
arXiv Detail & Related papers (2025-06-11T12:03:52Z)
Detection Method for Prompt Injection by Integrating Pre-trained Model and Heuristic Feature Engineering [3.0823377252469144]
prompt injection attacks have emerged as a significant security threat.<n>Existing defense mechanisms face trade-offs between effectiveness and generalizability.<n>We propose a dual-channel feature fusion detection framework.
arXiv Detail & Related papers (2025-06-05T06:01:19Z)
Investigating the Vulnerability of LLM-as-a-Judge Architectures to Prompt-Injection Attacks [0.0]
Large Language Models (LLMs) are increasingly employed as evaluators (LLM-as-a-Judge) for assessing the quality of machine-generated text.<n>This paper investigates the vulnerability of LLM-as-a-Judge architectures to prompt-injection attacks.
arXiv Detail & Related papers (2025-05-19T16:51:12Z)
Towards General Visual-Linguistic Face Forgery Detection(V2) [90.6600794602029]
Face manipulation techniques have achieved significant advances, presenting serious challenges to security and social trust.<n>Recent works demonstrate that leveraging multimodal models can enhance the generalization and interpretability of face forgery detection.<n>We propose Face Forgery Text Generator (FFTG), a novel annotation pipeline that generates accurate text descriptions by leveraging forgery masks for initial region and type identification.
arXiv Detail & Related papers (2025-02-28T04:15:36Z)
MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making.<n>We present a process-based benchmark MR-Ben that demands a meta-reasoning skill.<n>Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z)
RDR: the Recap, Deliberate, and Respond Method for Enhanced Language Understanding [6.738409533239947]
The Recap, Deliberate, and Respond (RDR) paradigm addresses this issue by incorporating three distinct objectives within the neural network pipeline. By cascading these three models, we mitigate the potential for gaming the benchmark and establish a robust method for capturing the underlying semantic patterns. Our results demonstrate improved performance compared to competitive baselines, with an enhancement of up to 2% on standard metrics.
arXiv Detail & Related papers (2023-12-15T16:41:48Z)
HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction. The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses. LLMs with reasonable prompt and its generative capability can even correct those tokens that are missing in N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
Semantic Image Attack for Visual Model Diagnosis [80.36063332820568]
In practice, metric analysis on a specific train and test dataset does not guarantee reliable or fair ML models. This paper proposes Semantic Image Attack (SIA), a method based on the adversarial attack that provides semantic adversarial images.
arXiv Detail & Related papers (2023-03-23T03:13:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.