A Framework for Large Scale Synthetic Graph Dataset Generation
- URL: http://arxiv.org/abs/2210.01944v4
- Date: Thu, 5 Oct 2023 05:22:43 GMT
- Title: A Framework for Large Scale Synthetic Graph Dataset Generation
- Authors: Sajad Darabi, Piotr Bigaj, Dawid Majchrowski, Artur Kasymov, Pawel Morkisz, Alex Fit-Florea
- Abstract summary: This work proposes a scalable synthetic graph generation tool to scale the datasets to production-size graphs.
The tool learns a series of parametric models from proprietary datasets that can be released to researchers.
We demonstrate the generalizability of the framework across a series of datasets.
- Score: 2.248608623448951
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently there has been increasing interest in developing and deploying deep
graph learning algorithms for many tasks, such as fraud detection and
recommender systems. However, there is a limited number of publicly available
graph-structured datasets, most of which are tiny compared to production-sized
applications or are limited in their application domain. This work tackles this
shortcoming by proposing a scalable synthetic graph generation tool to scale
the datasets to production-size graphs with trillions of edges and billions of
nodes. The tool learns a series of parametric models from proprietary datasets
that can be released to researchers to study various graph methods on the
synthetic data, increasing prototype development and novel applications. We
demonstrate the generalizability of the framework across a series of datasets,
mimicking structural and feature distributions as well as the ability to scale
them across varying sizes, demonstrating their usefulness for benchmarking and
model development. Code can be found on
https://github.com/NVIDIA/DeepLearningExamples/tree/master/Tools/DGLPyTorch/SyntheticGraphGeneration .
Related papers
- GraphStorm: all-in-one graph machine learning framework for industry applications [75.23076561638348]
GraphStorm is an end-to-end solution for scalable graph construction, graph model training and inference.
Every component in GraphStorm can operate on graphs with billions of nodes and can scale model training and inference to different hardware without changing any code.
GraphStorm has been used and deployed for over a dozen billion-scale industry applications after its release in May 2023.
arXiv Detail & Related papers (2024-06-10T04:56:16Z)
- Graph data augmentation with Gromow-Wasserstein Barycenters [0.0]
A novel augmentation strategy for graphs that operates in a non-Euclidean space has been proposed.
A non-Euclidean distance, specifically the Gromov-Wasserstein distance, results in better approximations of the graphon.
This framework also provides a means to validate different graphon estimation approaches.
arXiv Detail & Related papers (2024-04-12T10:22:55Z)
- GraphMaker: Can Diffusion Models Generate Large Attributed Graphs? [7.330479039715941]
Large-scale graphs with node attributes are increasingly common in various real-world applications.
Traditional graph generation methods are limited in their capacity to handle these complex structures.
This paper introduces a novel diffusion model, GraphMaker, specifically designed for generating large attributed graphs.
arXiv Detail & Related papers (2023-10-20T22:12:46Z)
- Sparsity exploitation via discovering graphical models in multi-variate time-series forecasting [1.2762298148425795]
We propose a decoupled training method, which includes a graph generating module and a GNNs forecasting module.
First, we use Graphical Lasso (or GraphLASSO) to directly exploit the sparsity pattern from data to build graph structures.
Second, we fit these graph structures and the input data into a Graph Convolutional Recurrent Network (GCRN) to train a forecasting model.
arXiv Detail & Related papers (2023-06-29T16:48:00Z)
- GSHOT: Few-shot Generative Modeling of Labeled Graphs [44.94210194611249]
We introduce the hitherto unexplored paradigm of few-shot graph generative modeling.
We develop GSHOT, a framework for few-shot labeled graph generative modeling.
GSHOT adapts to an unseen graph dataset through self-paced fine-tuning.
arXiv Detail & Related papers (2023-06-06T08:03:18Z)
- Bures-Wasserstein Means of Graphs [60.42414991820453]
We propose a novel framework for defining a graph mean via embeddings in the space of smooth graph signal distributions.
By finding a mean in this embedding space, we can recover a mean graph that preserves structural information.
We establish the existence and uniqueness of the novel graph mean, and provide an iterative algorithm for computing it.
arXiv Detail & Related papers (2023-05-31T11:04:53Z)
- Graph Generative Model for Benchmarking Graph Neural Networks [73.11514658000547]
We introduce a novel graph generative model that learns and reproduces the distribution of real-world graphs in a privacy-controlled way.
Our model can successfully generate privacy-controlled, synthetic substitutes of large-scale real-world graphs that can be effectively used to benchmark GNN models.
arXiv Detail & Related papers (2022-07-10T06:42:02Z)
- A Robust Stacking Framework for Training Deep Graph Models with Multifaceted Node Features [61.92791503017341]
Graph Neural Networks (GNNs) with numerical node features and graph structure as inputs have demonstrated superior performance on various supervised learning tasks with graph data.
The best models for such data types in most standard supervised learning settings with IID (non-graph) data are not easily incorporated into a GNN.
Here we propose a robust stacking framework that fuses graph-aware propagation with arbitrary models intended for IID data.
arXiv Detail & Related papers (2022-06-16T22:46:33Z)
- Condensing Graphs via One-Step Gradient Matching [50.07587238142548]
We propose a one-step gradient matching scheme, which performs gradient matching for only one single step without training the network weights.
Our theoretical analysis shows this strategy can generate synthetic graphs that lead to lower classification loss on real graphs.
In particular, we are able to reduce the dataset size by 90% while approximating up to 98% of the original performance.
arXiv Detail & Related papers (2022-06-15T18:20:01Z)
- Synthetic Graph Generation to Benchmark Graph Learning [7.914804101579097]
Graph learning algorithms have attained state-of-the-art performance on many graph analysis tasks.
One reason is due to the very small number of datasets used in practice to benchmark the performance of graph learning algorithms.
We propose to generate synthetic graphs, and study the behaviour of graph learning algorithms in a controlled scenario.
arXiv Detail & Related papers (2022-04-04T10:48:32Z)
- Adaptive Graph Auto-Encoder for General Data Clustering [90.8576971748142]
Graph-based clustering plays an important role in the clustering area.
Recent studies about graph convolution neural networks have achieved impressive success on graph type data.
We propose a graph auto-encoder for general data clustering, which constructs the graph adaptively according to the generative perspective of graphs.
arXiv Detail & Related papers (2020-02-20T10:11:28Z)