OCR-APT: Reconstructing APT Stories from Audit Logs using Subgraph Anomaly Detection and LLMs
- URL: http://arxiv.org/abs/2510.15188v2
- Date: Mon, 20 Oct 2025 12:03:37 GMT
- Authors: Ahmed Aly, Essam Mansour, Amr Youssef
- Abstract summary: Advanced Persistent Threats (APTs) are stealthy cyberattacks that often evade detection in system-level audit logs. Existing systems apply anomaly detection to these graphs but often suffer from high false positive rates and coarse-grained alerts. We introduce OCR-APT, a system for APT detection and reconstruction of human-like attack stories.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Advanced Persistent Threats (APTs) are stealthy cyberattacks that often evade detection in system-level audit logs. Provenance graphs model these logs as connected entities and events, revealing relationships that are missed by linear log representations. Existing systems apply anomaly detection to these graphs but often suffer from high false positive rates and coarse-grained alerts. Their reliance on node attributes like file paths or IPs leads to spurious correlations, reducing detection robustness and reliability. To fully understand an attack's progression and impact, security analysts need systems that can generate accurate, human-like narratives of the entire attack. To address these challenges, we introduce OCR-APT, a system for APT detection and reconstruction of human-like attack stories. OCR-APT uses Graph Neural Networks (GNNs) for subgraph anomaly detection, learning behavior patterns around nodes rather than fragile attributes such as file paths or IPs. This approach leads to more robust anomaly detection. It then iterates over detected subgraphs using Large Language Models (LLMs) to reconstruct multi-stage attack stories. Each stage is validated before proceeding, reducing hallucinations and ensuring an interpretable final report. Our evaluations on the DARPA TC3, OpTC, and NODLINK datasets show that OCR-APT outperforms state-of-the-art systems in both detection accuracy and alert interpretability. Moreover, OCR-APT reconstructs human-like reports that comprehensively capture the attack story.
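The central detection idea in the abstract is to learn behavior patterns around nodes rather than attributes such as file paths or IPs. The Python sketch below illustrates that idea under heavy simplifying assumptions and is not OCR-APT's model. The event tuples, the four-type vocabulary in `EVENT_TYPES`, and all function names are hypothetical. A trained GNN would learn its aggregation weights. This sketch replaces that with one fixed round of mean-aggregation message passing over per-node event-type counts. Each node is then scored by its Euclidean distance from the centroid of benign embeddings.

```python
import math
from collections import defaultdict

# Hypothetical event vocabulary; real audit logs have many more event types.
EVENT_TYPES = ["read", "write", "exec", "connect"]


def behavior_features(events):
    """Per-node counts of outgoing and incoming event types.

    Node names (file paths, IPs, process names) act only as identifiers and are
    never turned into features, so renaming a node leaves its features unchanged.
    """
    k = len(EVENT_TYPES)
    feats = defaultdict(lambda: [0.0] * (2 * k))
    neighbors = defaultdict(set)
    for src, action, dst in events:
        i = EVENT_TYPES.index(action)
        feats[src][i] += 1      # outgoing event of this type
        feats[dst][k + i] += 1  # incoming event of this type
        neighbors[src].add(dst)
        neighbors[dst].add(src)
    return dict(feats), neighbors


def embed(events):
    """One fixed (unlearned) round of mean-aggregation message passing:
    each node's vector is its own features concatenated with its neighbors' mean."""
    feats, neighbors = behavior_features(events)
    out = {}
    for node, own in feats.items():
        nbrs = neighbors[node]  # non-empty: every node appears in some event
        mean = [sum(feats[n][j] for n in nbrs) / len(nbrs) for j in range(len(own))]
        out[node] = own + mean
    return out


def centroid(embs):
    """Component-wise mean of all embedding vectors."""
    vecs = list(embs.values())
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def scores(embs, center):
    """Anomaly score = Euclidean distance from the benign centroid."""
    return {node: math.dist(vec, center) for node, vec in embs.items()}


# Usage with made-up events: fit on benign activity, then score a new window.
benign = [("bash1", "read", "/etc/a"), ("bash1", "write", "/tmp/b"),
          ("bash2", "read", "/etc/c"), ("bash2", "write", "/tmp/d")]
window = [("bash3", "read", "/etc/e"), ("bash3", "write", "/tmp/f"),
          ("proc9", "exec", "/tmp/x"), ("proc9", "connect", "10.0.0.5"),
          ("proc9", "connect", "10.0.0.6")]

benign_embs = embed(benign)
center = centroid(benign_embs)
# Flag only nodes scoring strictly above the highest benign training score.
threshold = max(scores(benign_embs, center).values())
window_scores = scores(embed(window), center)
flagged = {n for n, s in window_scores.items() if s > threshold}
print(sorted(flagged))  # → ['/tmp/x', '10.0.0.5', '10.0.0.6', 'proc9']
```

In this sketch `bash3` behaves exactly like the benign shells, so it is not flagged even though it never appeared in the benign data. Node names only serve as identifiers, so renaming `/tmp/x` or the IP addresses would leave every score unchanged. A real system would need learned layers, multi-hop neighborhoods, and a threshold calibrated on far more benign data. This toy does not attempt any of those.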
Related papers
- RPG-AE: Neuro-Symbolic Graph Autoencoders with Rare Pattern Mining for Provenance-Based Anomaly Detection (2026-02-03T00:02:37Z)
  This paper presents a neuro-symbolic anomaly detection framework that combines a Graph Autoencoder with rare pattern mining. Anomaly candidates are identified through deviations between observed and reconstructed graph structure. We evaluate the proposed method on the DARPA Transparent Computing datasets and show that rare-pattern boosting yields substantial gains in anomaly ranking quality.
- Semantic-Aware Advanced Persistent Threat Detection Using Autoencoders on LLM-Encoded System Logs (2026-01-30T12:38:12Z)
  Advanced Persistent Threats (APTs) are among the most challenging cyberattacks to detect. Traditional statistical methods and shallow machine learning techniques often fail to detect them. This paper proposes a novel anomaly detection approach that leverages semantic embeddings.
- AutoGraphAD: A novel approach using Variational Graph Autoencoders for anomalous network flow detection (2025-11-21T10:22:00Z)
  AutoGraphAD is an unsupervised anomaly detection approach based on a Heterogeneous Variational Graph Autoencoder. It operates on heterogeneous graphs built from connection and IP nodes that capture network activity within a time window. It achieves around 1.18 orders of magnitude faster training and 1.03 orders of magnitude faster inference.
- IDGraphs: Intrusion Detection and Analysis Using Stream Compositing (2025-06-26T16:08:20Z)
  IDGraphs is an interactive visualization system for intrusion detection. We apply IDGraphs to a real network router dataset with 179M flow-level records representing 1.16 TB of traffic in total. The system successfully detects and analyzes a variety of attacks and anomalies.
- CONTINUUM: Detecting APT Attacks through Spatial-Temporal Graph Neural Networks (2025-01-06T12:43:59Z)
  Advanced Persistent Threats (APTs) represent a significant challenge in cybersecurity. Traditional Intrusion Detection Systems (IDS) often fall short in detecting these multi-stage attacks.
- STATGRAPH: Effective In-vehicle Intrusion Detection via Multi-view Statistical Graph Learning (2023-11-13T03:49:55Z)
  STATGRAPH is an effective and fine-grained intrusion detection methodology for in-vehicle network (IVN) security services. It generates two statistical graphs, a timing correlation graph (TCG) and a coupling relationship graph (CRG), in every CAN message detection window. It learns the universal laws of various patterns more effectively and further enhances detection performance.
- GLAD: Content-aware Dynamic Graphs For Log Anomaly Detection (2023-09-12T04:21:30Z)
  GLAD is a Graph-based Log Anomaly Detection framework designed to detect anomalies in system logs.
- Semi-Supervised and Long-Tailed Object Detection with CascadeMatch (2023-05-24T07:09:25Z)
  We propose a novel pseudo-labeling-based detector called CascadeMatch. Our detector features a cascade network architecture, which has multi-stage detection heads with progressive confidence thresholds. We show that CascadeMatch surpasses existing state-of-the-art semi-supervised approaches in handling long-tailed object detection.
- Disentangled Causal Graph Learning for Online Unsupervised Root Cause Analysis (2023-05-18T01:27:48Z)
  Root cause analysis (RCA) can identify the root causes of system faults and failures by analyzing system monitoring data. Previous research has mostly focused on developing offline RCA algorithms, which often require manually initiating the RCA process. We propose CORAL, a novel online RCA framework that can automatically trigger the RCA process and incrementally update the RCA model.
- PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning (2023-01-25T16:34:43Z)
  We propose PULL, an iterative log analysis method for reactive anomaly detection based on estimated failure time windows. Our evaluation shows that PULL consistently outperforms ten benchmark baselines across three different datasets.
- DAE: Discriminatory Auto-Encoder for multivariate time-series anomaly detection in air transportation (2021-09-08T14:07:55Z)
  We propose a novel anomaly detection model called Discriminatory Auto-Encoder (DAE). It builds on a regular LSTM-based auto-encoder as its baseline but uses several decoders, each receiving data from a specific flight phase. Results show that the DAE achieves better results in both accuracy and speed of detection.
- Graph Backdoor (2020-06-21T19:45:30Z)
  We present GTA, the first backdoor attack on graph neural networks (GNNs). Compared with prior work, GTA departs in significant ways: it defines triggers as specific subgraphs, including both topological structures and descriptive features. It can be instantiated for both transductive (e.g., node classification) and inductive (e.g., graph classification) tasks.