PROVSYN: Synthesizing Provenance Graphs for Data Augmentation in Intrusion Detection Systems
- URL: http://arxiv.org/abs/2506.06226v1
- Date: Fri, 06 Jun 2025 16:41:17 GMT
- Title: PROVSYN: Synthesizing Provenance Graphs for Data Augmentation in Intrusion Detection Systems
- Authors: Yi Huang, Wajih Ul Hassan, Yao Guo, Xiangqun Chen, Ding Li
- Abstract summary: Provenance graph analysis plays a vital role in intrusion detection, particularly against Advanced Persistent Threats (APTs). We introduce PROVSYN, an automated framework that synthesizes provenance graphs through a three-phase pipeline. We show that PROVSYN produces high-fidelity graphs and improves detection performance through effective data augmentation.
- Score: 10.160654114774513
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Provenance graph analysis plays a vital role in intrusion detection, particularly against Advanced Persistent Threats (APTs), by exposing complex attack patterns. While recent systems combine graph neural networks (GNNs) with natural language processing (NLP) to capture structural and semantic features, their effectiveness is limited by class imbalance in real-world data. To address this, we introduce PROVSYN, an automated framework that synthesizes provenance graphs through a three-phase pipeline: (1) heterogeneous graph structure synthesis with structural-semantic modeling, (2) rule-based topological refinement, and (3) context-aware textual attribute synthesis using large language models (LLMs). PROVSYN includes a comprehensive evaluation framework that integrates structural, textual, temporal, and embedding-based metrics, along with a semantic validation mechanism to assess the correctness of generated attack patterns and system behaviors. To demonstrate practical utility, we use the synthetic graphs to augment training datasets for downstream APT detection models. Experimental results show that PROVSYN produces high-fidelity graphs and improves detection performance through effective data augmentation.
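To make phase (2) of the pipeline described in the abstract more concrete, the following is a minimal sketch of what a rule-based topological refinement step could look like. It is not PROVSYN's actual rule set: the node types, relation names, the `ALLOWED` schema, and the `refine` function are all hypothetical and chosen only for illustration. The sketch keeps an edge only if its (source type, relation, destination type) triple is permitted, and drops exact duplicates.

```python
from collections import namedtuple

# A directed, typed edge in a toy provenance graph.
Edge = namedtuple("Edge", ["src", "rel", "dst"])

# Hypothetical schema (an assumption, not taken from the paper):
# allowed (source node type, relation, destination node type) triples.
ALLOWED = {
    ("process", "fork", "process"),
    ("process", "read", "file"),
    ("process", "write", "file"),
    ("process", "connect", "socket"),
}


def refine(node_types, edges):
    """Keep edges that satisfy the schema, preserving order and dropping duplicates."""
    kept, seen = [], set()
    for e in edges:
        triple = (node_types.get(e.src), e.rel, node_types.get(e.dst))
        if triple in ALLOWED and e not in seen:
            seen.add(e)
            kept.append(e)
    return kept


node_types = {"p1": "process", "p2": "process", "f1": "file", "s1": "socket"}
edges = [
    Edge("p1", "fork", "p2"),
    Edge("p2", "write", "f1"),
    Edge("f1", "fork", "p2"),    # violates the schema: a file cannot fork a process
    Edge("p2", "write", "f1"),   # exact duplicate of an earlier edge
    Edge("p2", "connect", "s1"),
]
refined = refine(node_types, edges)
```

A real system would likely need richer rules (for example temporal ordering or connectivity constraints); this sketch covers only type-level validity.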
Related papers
- AnomalyGen: An Automated Semantic Log Sequence Generation Framework with LLM for Anomaly Detection [25.83270938475311]
AnomalyGen is the first automated log synthesis framework specifically designed for anomaly detection.
Our framework integrates enhanced program analysis with Chain-of-Thought reasoning (CoT reasoning) to enable iterative log generation and anomaly annotation.
When augmenting benchmark datasets with synthesized logs, we observe maximum F1-score improvements of 3.7%.
arXiv Detail & Related papers (2025-04-16T16:54:38Z)
- Multi-Modality Representation Learning for Antibody-Antigen Interactions Prediction [6.681379194115459]
We present MuLAAIP, an AAI prediction framework that utilizes graph attention networks to illuminate graph-level structural features and normalized adaptive graph convolution networks to capture inter-antibody sequence associations.
Our results demonstrate that MuLAAIP outperforms current state-of-the-art methods in terms of predictive performance.
arXiv Detail & Related papers (2025-03-22T06:23:51Z)
- Persistent Homology-induced Graph Ensembles for Time Series Regressions [1.5728609542259502]
We create an ensemble of Graph Neural Networks based on Persistent Homology filtration.
The ensemble aggregates the signals from the individual learners via an attention-based routing mechanism.
Four different real-world experiments on seismic activity prediction and traffic forecasting demonstrate that our approach consistently outperforms single-graph baselines.
arXiv Detail & Related papers (2025-03-18T13:22:52Z)
- GraphSeqLM: A Unified Graph Language Framework for Omic Graph Learning [20.906136206438102]
Graph Neural Networks (GNNs) offer a robust framework for analyzing large-scale signaling pathways and protein-protein interaction networks.
We propose Graph Sequence Language Model (GraphSeqLM), a framework that enhances GNNs with biological sequence embeddings.
arXiv Detail & Related papers (2024-12-20T11:05:26Z)
- Learning to Model Graph Structural Information on MLPs via Graph Structure Self-Contrasting [50.181824673039436]
We propose a Graph Structure Self-Contrasting (GSSC) framework that learns graph structural information without message passing.
The proposed framework is based purely on Multi-Layer Perceptrons (MLPs), where the structural information is only implicitly incorporated as prior knowledge.
It first applies structural sparsification to remove potentially uninformative or noisy edges in the neighborhood, and then performs structural self-contrasting in the sparsified neighborhood to learn robust node representations.
arXiv Detail & Related papers (2024-09-09T12:56:02Z)
- S$^2$GSL: Incorporating Segment to Syntactic Enhanced Graph Structure Learning for Aspect-based Sentiment Analysis [19.740223755240734]
We propose S$^2$GSL, incorporating Segment to Syntactic enhanced Graph Structure Learning for ABSA.
S$^2$GSL features segment-aware semantic graph learning and syntax-based latent graph learning.
arXiv Detail & Related papers (2024-06-05T03:44:35Z)
- EasyDGL: Encode, Train and Interpret for Continuous-time Dynamic Graph Learning [92.71579608528907]
This paper aims to design an easy-to-use pipeline (termed EasyDGL) composed of three key modules with both strong fitting ability and interpretability.
EasyDGL can effectively quantify the predictive power of the frequency content that a model learns from the evolving graph data.
arXiv Detail & Related papers (2023-03-22T06:35:08Z)
- Energy-based Out-of-Distribution Detection for Graph Neural Networks [76.0242218180483]
We propose a simple, powerful and efficient OOD detection model for GNN-based learning on graphs, which we call GNNSafe.
GNNSafe achieves up to $17.0\%$ AUROC improvement over state-of-the-art methods and could serve as a simple yet strong baseline in this under-developed area.
arXiv Detail & Related papers (2023-02-06T16:38:43Z)
- Heterogeneous Graph Neural Networks using Self-supervised Reciprocally Contrastive Learning [102.9138736545956]
Heterogeneous graph neural network (HGNN) is a very popular technique for the modeling and analysis of heterogeneous graphs.
We develop for the first time a novel and robust heterogeneous graph contrastive learning approach, namely HGCL, which introduces two views on respective guidance of node attributes and graph topologies.
In this new approach, we adopt distinct but most suitable attribute and topology fusion mechanisms in the two views, which are conducive to mining relevant information in attributes and topologies separately.
arXiv Detail & Related papers (2022-04-30T12:57:02Z)
- Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora.
Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
arXiv Detail & Related papers (2021-09-07T21:24:36Z)
- Structural Landmarking and Interaction Modelling: on Resolution Dilemmas in Graph Classification [50.83222170524406]
We study the intrinsic difficulty in graph classification under the unified concept of "resolution dilemmas".
We propose "SLIM", an inductive neural network model for Structural Landmarking and Interaction Modelling.
arXiv Detail & Related papers (2020-06-29T01:01:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.