Related papers: A Framework for Large Scale Synthetic Graph Dataset Generation

URL: http://arxiv.org/abs/2210.01944v4
Date: Thu, 5 Oct 2023 05:22:43 GMT
Title: A Framework for Large Scale Synthetic Graph Dataset Generation
Authors: Sajad Darabi, Piotr Bigaj, Dawid Majchrowski, Artur Kasymov, Pawel Morkisz, Alex Fit-Florea
Abstract summary: This work proposes a scalable synthetic graph generation tool to scale the datasets to production-size graphs. The tool learns, from proprietary datasets, a series of parametric models that can be released to researchers. We demonstrate the generalizability of the framework across a series of datasets.
Score: 2.248608623448951
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recently, there has been increasing interest in developing and deploying deep graph learning algorithms for many tasks, such as fraud detection and recommender systems. However, there is a limited number of publicly available graph-structured datasets, most of which are tiny compared to production-sized applications or are limited in their application domain. This work tackles this shortcoming by proposing a scalable synthetic graph generation tool that scales datasets to production-size graphs with trillions of edges and billions of nodes. The tool learns, from proprietary datasets, a series of parametric models that can be released to researchers to study various graph methods on the synthetic data, increasing prototype development and novel applications. We demonstrate the generalizability of the framework across a series of datasets, mimicking structural and feature distributions, as well as the ability to scale them across varying sizes, demonstrating their usefulness for benchmarking and model development. Code can be found at https://github.com/NVIDIA/DeepLearningExamples/tree/master/Tools/DGLPyTorch/SyntheticGraphGeneration

Related papers

Synthesizing Diverse Network Flow Datasets with Scalable Dynamic Multigraph Generation [0.0]
We introduce a novel machine learning model for generating high-fidelity synthetic network flow datasets. Our results demonstrate improvements in accuracy over previous large-scale graph generation methods.
arXiv Detail & Related papers (2025-05-12T17:26:48Z)
Revisiting Graph Neural Networks on Graph-level Tasks: Comprehensive Experiments, Analysis, and Improvements [54.006506479865344]
We propose a unified evaluation framework for graph-level Graph Neural Networks (GNNs). This framework provides a standardized setting to evaluate GNNs across diverse datasets. We also propose a novel GNN model with enhanced expressivity and generalization capabilities.
arXiv Detail & Related papers (2025-01-01T08:48:53Z)
GraphStorm: all-in-one graph machine learning framework for industry applications [75.23076561638348]
GraphStorm is an end-to-end solution for scalable graph construction, graph model training and inference. Every component in GraphStorm can operate on graphs with billions of nodes and can scale model training and inference to different hardware without changing any code. GraphStorm has been used and deployed for over a dozen billion-scale industry applications after its release in May 2023.
arXiv Detail & Related papers (2024-06-10T04:56:16Z)
Graph data augmentation with Gromow-Wasserstein Barycenters [0.0]
A novel augmentation strategy for graphs that operates in a non-Euclidean space is proposed. A non-Euclidean distance, specifically the Gromov-Wasserstein distance, results in better approximations of the graphon. This framework also provides a means to validate different graphon estimation approaches.
arXiv Detail & Related papers (2024-04-12T10:22:55Z)
GraphMaker: Can Diffusion Models Generate Large Attributed Graphs? [7.330479039715941]
Large-scale graphs with node attributes are increasingly common in various real-world applications. Traditional graph generation methods are limited in their capacity to handle these complex structures. This paper introduces a novel diffusion model, GraphMaker, specifically designed for generating large attributed graphs.
arXiv Detail & Related papers (2023-10-20T22:12:46Z)
Sparsity exploitation via discovering graphical models in multi-variate time-series forecasting [1.2762298148425795]
We propose a decoupled training method, which includes a graph generating module and a GNNs forecasting module. First, we use Graphical Lasso (or GraphLASSO) to directly exploit the sparsity pattern from data to build graph structures. Second, we fit these graph structures and the input data into a Graph Convolutional Recurrent Network (GCRN) to train a forecasting model.
arXiv Detail & Related papers (2023-06-29T16:48:00Z)
GSHOT: Few-shot Generative Modeling of Labeled Graphs [44.94210194611249]
We introduce the hitherto unexplored paradigm of few-shot graph generative modeling. We develop GSHOT, a framework for few-shot labeled graph generative modeling. GSHOT adapts to an unseen graph dataset through self-paced fine-tuning.
arXiv Detail & Related papers (2023-06-06T08:03:18Z)
Bures-Wasserstein Means of Graphs [60.42414991820453]
We propose a novel framework for defining a graph mean via embeddings in the space of smooth graph signal distributions. By finding a mean in this embedding space, we can recover a mean graph that preserves structural information. We establish the existence and uniqueness of the novel graph mean, and provide an iterative algorithm for computing it.
arXiv Detail & Related papers (2023-05-31T11:04:53Z)
Graph Generative Model for Benchmarking Graph Neural Networks [73.11514658000547]
We introduce a novel graph generative model that learns and reproduces the distribution of real-world graphs in a privacy-controlled way. Our model can successfully generate privacy-controlled, synthetic substitutes of large-scale real-world graphs that can be effectively used to benchmark GNN models.
arXiv Detail & Related papers (2022-07-10T06:42:02Z)
A Robust Stacking Framework for Training Deep Graph Models with Multifaceted Node Features [61.92791503017341]
Graph Neural Networks (GNNs) with numerical node features and graph structure as inputs have demonstrated superior performance on various supervised learning tasks with graph data. The best models for such data types in most standard supervised learning settings with IID (non-graph) data are not easily incorporated into a GNN. Here we propose a robust stacking framework that fuses graph-aware propagation with arbitrary models intended for IID data.
arXiv Detail & Related papers (2022-06-16T22:46:33Z)
Condensing Graphs via One-Step Gradient Matching [50.07587238142548]
We propose a one-step gradient matching scheme, which performs gradient matching for only a single step without training the network weights. Our theoretical analysis shows this strategy can generate synthetic graphs that lead to lower classification loss on real graphs. In particular, we are able to reduce the dataset size by 90% while approximating up to 98% of the original performance.
arXiv Detail & Related papers (2022-06-15T18:20:01Z)
Synthetic Graph Generation to Benchmark Graph Learning [7.914804101579097]
Graph learning algorithms have attained state-of-the-art performance on many graph analysis tasks. However, only a very small number of datasets is used in practice to benchmark the performance of graph learning algorithms. We propose to generate synthetic graphs, and study the behaviour of graph learning algorithms in a controlled scenario.
arXiv Detail & Related papers (2022-04-04T10:48:32Z)
Adaptive Graph Auto-Encoder for General Data Clustering [90.8576971748142]
Graph-based clustering plays an important role in the clustering area. Recent studies on graph convolutional neural networks have achieved impressive success on graph-type data. We propose a graph auto-encoder for general data clustering, which constructs the graph adaptively according to the generative perspective of graphs.
arXiv Detail & Related papers (2020-02-20T10:11:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.