PROVCREATOR: Synthesizing Complex Heterogenous Graphs with Node and Edge Attributes
- URL: http://arxiv.org/abs/2507.20967v1
- Date: Mon, 28 Jul 2025 16:22:50 GMT
- Title: PROVCREATOR: Synthesizing Complex Heterogenous Graphs with Node and Edge Attributes
- Authors: Tianhao Wang, Simon Klancher, Kunal Mukherjee, Josh Wiedemeier, Feng Chen, Murat Kantarcioglu, Kangkook Jee
- Abstract summary: We introduce ProvCreator, a synthetic graph framework for complex heterogeneous graphs. ProvCreator formulates graph synthesis as a sequence generation task. It features a versatile graph-to-sequence encoder-decoder that supports end-to-end, learnable graph generation.
- Score: 14.078355036170155
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rise of graph-structured data has driven interest in graph learning and synthetic data generation. While successful in text and image domains, synthetic graph generation remains challenging -- especially for real-world graphs with complex, heterogeneous schemas. Existing research has focused mostly on homogeneous structures with simple attributes, limiting its usefulness and relevance for application domains requiring semantic fidelity. In this research, we introduce ProvCreator, a synthetic graph framework designed for complex heterogeneous graphs with high-dimensional node and edge attributes. ProvCreator formulates graph synthesis as a sequence generation task, enabling the use of transformer-based large language models. It features a versatile graph-to-sequence encoder-decoder that (1) losslessly encodes graph structure and attributes, (2) efficiently compresses large graphs for contextual modeling, and (3) supports end-to-end, learnable graph generation. To validate our research, we evaluate ProvCreator on two challenging domains: system provenance graphs in cybersecurity and knowledge graphs from the IntelliGraph Benchmark Dataset. In both cases, ProvCreator captures intricate dependencies between structure and semantics, enabling the generation of realistic and privacy-aware synthetic datasets.
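As a rough illustration of what "graph synthesis as a sequence generation task" with a lossless encoding can look like, the sketch below flattens a small heterogeneous attributed graph into a token sequence and reconstructs it exactly. This is not ProvCreator's encoder: the `Graph` class, the `<node>`/`<edge>` token format, and the `encode`/`decode` functions are hypothetical names chosen for illustration. The sketch also assumes attribute keys and values are strings that never collide with the reserved tags, and it does not attempt the compression step the abstract mentions.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    # node_id -> (node_type, attributes); hypothetical structure for illustration
    nodes: dict = field(default_factory=dict)
    # list of (src_id, dst_id, edge_type, attributes)
    edges: list = field(default_factory=list)


def _attr_tokens(attrs):
    """Flatten an attribute dict into alternating key/value tokens."""
    tokens = []
    for key, value in attrs.items():
        tokens += [key, value]
    return tokens


def encode(graph):
    """Serialize a graph into a flat token list (assumed toy format)."""
    tokens = []
    for node_id, (node_type, attrs) in graph.nodes.items():
        tokens += ["<node>", node_id, node_type, *_attr_tokens(attrs), "</node>"]
    for src, dst, edge_type, attrs in graph.edges:
        tokens += ["<edge>", src, dst, edge_type, *_attr_tokens(attrs), "</edge>"]
    return tokens


def decode(tokens):
    """Rebuild the graph from the token list produced by encode()."""
    graph = Graph()
    i = 0
    while i < len(tokens):
        tag = tokens[i]
        closing = "</node>" if tag == "<node>" else "</edge>"
        end = tokens.index(closing, i)
        body = tokens[i + 1:end]
        if tag == "<node>":
            rest = body[2:]
            graph.nodes[body[0]] = (body[1], dict(zip(rest[0::2], rest[1::2])))
        else:
            rest = body[3:]
            graph.edges.append(
                (body[0], body[1], body[2], dict(zip(rest[0::2], rest[1::2])))
            )
        i = end + 1
    return graph


# A tiny provenance-style example: a process reading a file.
g = Graph()
g.nodes["p1"] = ("process", {"name": "bash"})
g.nodes["f1"] = ("file", {"path": "/etc/passwd"})
g.edges.append(("p1", "f1", "read", {"bytes": "128"}))

seq = encode(g)
restored = decode(seq)
```

Because `decode(encode(g))` returns a graph equal to `g`, a sequence model trained on such token lists could in principle emit sequences that decode back into typed, attributed graphs. A real system would need a vocabulary, escaping for reserved tokens, and a far more compact encoding.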
Related papers
- Synthesizing Diverse Network Flow Datasets with Scalable Dynamic Multigraph Generation [0.0]
We introduce a novel machine learning model for generating high-fidelity synthetic network flow datasets. Our results demonstrate improvements in accuracy over previous large-scale graph generation methods.
arXiv Detail & Related papers (2025-05-12T17:26:48Z)
- Data Augmentation in Graph Neural Networks: The Role of Generated Synthetic Graphs [0.24999074238880487]
This study explores using generated graphs for data augmentation.
It compares the performance of combining generated graphs with real graphs and examines the effect of different quantities of generated graphs on graph classification tasks.
Our results introduce a new approach to graph data augmentation, ensuring consistent labels and enhancing classification performance.
arXiv Detail & Related papers (2024-07-20T06:05:26Z)
- Hypergraph-enhanced Dual Semi-supervised Graph Classification [14.339207883093204]
We propose a Hypergraph-Enhanced DuAL framework named HEAL for semi-supervised graph classification.
To better explore the higher-order relationships among nodes, we design a hypergraph structure learning module that adaptively learns complex node dependencies.
Based on the learned hypergraph, we introduce a line graph to capture the interaction between hyperedges.
arXiv Detail & Related papers (2024-05-08T02:44:13Z)
- Enhancing Node Representations for Real-World Complex Networks with Topological Augmentation [35.42514739566419]
TopoAug is a novel graph augmentation method that builds a complex from the original graph by constructing virtual hyperedges directly from raw data.
We provide 23 novel real-world graph datasets across various domains including social media, biology, and e-commerce.
Our empirical study shows that TopoAug consistently and significantly outperforms GNN baselines and other graph augmentation methods.
arXiv Detail & Related papers (2024-02-20T14:18:43Z)
- GraphMaker: Can Diffusion Models Generate Large Attributed Graphs? [7.330479039715941]
Large-scale graphs with node attributes are increasingly common in various real-world applications.
Traditional graph generation methods are limited in their capacity to handle these complex structures.
This paper introduces a novel diffusion model, GraphMaker, specifically designed for generating large attributed graphs.
arXiv Detail & Related papers (2023-10-20T22:12:46Z)
- Structure-free Graph Condensation: From Large-scale Graphs to Condensed Graph-free Data [91.27527985415007]
Existing graph condensation methods rely on the joint optimization of nodes and structures in the condensed graph.
We advocate a new Structure-Free Graph Condensation paradigm, named SFGC, to distill a large-scale graph into a small-scale graph node set.
arXiv Detail & Related papers (2023-06-05T07:53:52Z)
- GrannGAN: Graph annotation generative adversarial networks [72.66289932625742]
We consider the problem of modelling high-dimensional distributions and generating new examples of data with complex relational feature structure coherent with a graph skeleton.
The model we propose tackles the problem of generating the data features constrained by the specific graph structure of each data point by splitting the task into two phases.
In the first phase, it models the distribution of features associated with the nodes of the given graph; in the second, it completes the edge features conditionally on the node features.
arXiv Detail & Related papers (2022-12-01T11:49:07Z)
- Heterogeneous Graph Neural Networks using Self-supervised Reciprocally Contrastive Learning [102.9138736545956]
The heterogeneous graph neural network (HGNN) is a popular technique for modeling and analyzing heterogeneous graphs.
We develop a novel and robust heterogeneous graph contrastive learning approach, namely HGCL, which introduces two views guided by node attributes and graph topologies, respectively.
In this approach, we adopt distinct attribute and topology fusion mechanisms, each suited to its view, which helps mine relevant information in attributes and topologies separately.
arXiv Detail & Related papers (2022-04-30T12:57:02Z)
- GoGNN: Graph of Graphs Neural Network for Predicting Structured Entity Interactions [70.9481395807354]
We propose a Graph of Graphs Neural Network (GoGNN), which extracts the features in both structured entity graphs and the entity interaction graph in a hierarchical way.
GoGNN outperforms the state-of-the-art methods on two representative structured entity interaction prediction tasks.
arXiv Detail & Related papers (2020-05-12T03:46:15Z)
- Structure-Augmented Text Representation Learning for Efficient Knowledge Graph Completion [53.31911669146451]
Human-curated knowledge graphs provide critical supportive information to various natural language processing tasks.
These graphs are usually incomplete, which motivates completing them automatically.
Graph embedding approaches, e.g., TransE, learn structured knowledge by representing graph elements as dense embeddings.
Textual encoding approaches, e.g., KG-BERT, rely on the text of graph triples and triple-level contextualized representations.
arXiv Detail & Related papers (2020-04-30T13:50:34Z)
- Adaptive Graph Auto-Encoder for General Data Clustering [90.8576971748142]
Graph-based clustering plays an important role in the clustering area.
Recent studies on graph convolutional neural networks have achieved impressive success on graph-type data.
We propose a graph auto-encoder for general data clustering, which constructs the graph adaptively according to the generative perspective of graphs.
arXiv Detail & Related papers (2020-02-20T10:11:28Z)