IGB: Addressing The Gaps In Labeling, Features, Heterogeneity, and Size of Public Graph Datasets for Deep Learning Research
- URL: http://arxiv.org/abs/2302.13522v2
- Date: Wed, 21 Jun 2023 23:30:52 GMT
- Title: IGB: Addressing The Gaps In Labeling, Features, Heterogeneity, and Size of Public Graph Datasets for Deep Learning Research
- Authors: Arpandeep Khatua and Vikram Sharma Mailthody and Bhagyashree Taleka and Tengfei Ma and Xiang Song and Wen-mei Hwu
- Abstract summary: Graph neural networks (GNNs) have shown high potential for a variety of real-world, challenging applications.
One of the major obstacles in GNN research is the lack of large-scale flexible datasets.
We introduce the Illinois Graph Benchmark (IGB), a research dataset tool that developers can use to train, scrutinize, and evaluate GNN models.
- Score: 14.191338008898963
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Graph neural networks (GNNs) have shown high potential for a variety of
real-world, challenging applications, but one of the major obstacles in GNN
research is the lack of large-scale flexible datasets. Most existing public
datasets for GNNs are relatively small, which limits the ability of GNNs to
generalize to unseen data. The few existing large-scale graph datasets provide
very limited labeled data, which makes it difficult to determine whether a GNN
model's low accuracy on unseen data stems from insufficient training data or
from a failure to generalize. Additionally, datasets used to train
GNNs need to offer flexibility to enable a thorough study of the impact of
various factors while training GNN models.
In this work, we introduce the Illinois Graph Benchmark (IGB), a research
dataset tool that developers can use to train, scrutinize, and
systematically evaluate GNN models with high fidelity. IGB includes both
homogeneous and heterogeneous academic graphs of enormous sizes, with more than
40% of their nodes labeled. Compared to the largest graph datasets publicly
available, IGB provides over 162x more labeled data for deep learning
practitioners and developers to create and evaluate models with higher
accuracy. The IGB dataset is a collection of academic graphs designed to be
flexible, enabling the study of various GNN architectures and embedding
generation techniques, and the analysis of system performance issues for node
classification
tasks. IGB is open-sourced, supports DGL and PyG frameworks, and comes with
releases of the raw text, which we believe will foster emerging language-model and
GNN research projects. An early public version of IGB is available at
https://github.com/IllinoisGraphBenchmark/IGB-Datasets.
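To make the intended workflow concrete, below is a minimal sketch of a node-classification experiment in DGL, the kind of study the abstract says IGB enables. The sketch trains a two-layer GCN on a random stand-in graph so it runs on its own; the commented-out IGB loader name and the 19-class label setting are assumptions, so consult the repository above for the actual API and dataset variants.

```python
# Minimal sketch: node classification with DGL, in the style of an IGB
# experiment. The IGB loader call is an assumption (commented out); a
# random stand-in graph is used so the sketch is self-contained.
import torch
import torch.nn.functional as F
import dgl
from dgl.nn import GraphConv


class GCN(torch.nn.Module):
    def __init__(self, in_feats, hidden_feats, num_classes):
        super().__init__()
        self.conv1 = GraphConv(in_feats, hidden_feats)
        self.conv2 = GraphConv(hidden_feats, num_classes)

    def forward(self, g, feats):
        h = F.relu(self.conv1(g, feats))
        return self.conv2(g, h)


# Hypothetical IGB usage (names assumed; see the IGB repository):
# dataset = IGBDataset(path="igb_homogeneous", size="tiny")
# g = dataset[0]
g = dgl.add_self_loop(dgl.rand_graph(1_000, 10_000))  # stand-in graph
feats = torch.randn(g.num_nodes(), 128)               # stand-in node embeddings
labels = torch.randint(0, 19, (g.num_nodes(),))       # stand-in class labels

model = GCN(in_feats=128, hidden_feats=64, num_classes=19)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
for epoch in range(5):
    logits = model(g, feats)
    loss = F.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```

In a real run, the stand-in graph, feature, and label tensors would be replaced by whatever the IGB loaders return for the chosen dataset size, and the training nodes would come from the dataset's labeled split.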
Related papers
- DA-MoE: Addressing Depth-Sensitivity in Graph-Level Analysis through Mixture of Experts [70.21017141742763]
Graph neural networks (GNNs) are gaining popularity for processing graph-structured data.
Existing methods generally use a fixed number of GNN layers to generate representations for all graphs.
We propose the depth-adaptive mixture-of-experts (DA-MoE) method, which incorporates two main improvements to GNNs.
arXiv Detail & Related papers (2024-11-05T11:46:27Z)
- Diffusing to the Top: Boost Graph Neural Networks with Minimal Hyperparameter Tuning [33.948899558876604]
This work introduces a graph-conditioned latent diffusion framework (GNN-Diff) to generate high-performing GNNs.
We validate our method through 166 experiments across four graph tasks: node classification on small, large, and long-range graphs, as well as link prediction.
arXiv Detail & Related papers (2024-10-08T05:27:34Z)
- Spectral Greedy Coresets for Graph Neural Networks [61.24300262316091]
The ubiquity of large-scale graphs in node-classification tasks hinders the real-world applications of Graph Neural Networks (GNNs).
This paper studies graph coresets for GNNs and avoids the interdependence issue by selecting ego-graphs based on their spectral embeddings.
Our spectral greedy graph coreset (SGGC) scales to graphs with millions of nodes, obviates the need for model pre-training, and applies to low-homophily graphs.
arXiv Detail & Related papers (2024-05-27T17:52:12Z)
- A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking [124.21408098724551]
Large-scale graph training is a notoriously challenging problem for graph neural networks (GNNs).
We present a new ensembling training approach, named EnGCN, to address these issues.
Our proposed method has achieved new state-of-the-art (SOTA) performance on large-scale datasets.
arXiv Detail & Related papers (2022-10-14T03:43:05Z)
- Geodesic Graph Neural Network for Efficient Graph Representation Learning [34.047527874184134]
We propose an efficient GNN framework called Geodesic GNN (GDGNN).
It injects conditional relationships between nodes into the model without labeling.
Conditioned on the geodesic representations, GDGNN is able to generate node, link, and graph representations that carry much richer structural information than plain GNNs.
arXiv Detail & Related papers (2022-10-06T02:02:35Z)
- Graph Generative Model for Benchmarking Graph Neural Networks [73.11514658000547]
We introduce a novel graph generative model that learns and reproduces the distribution of real-world graphs in a privacy-controlled way.
Our model can successfully generate privacy-controlled, synthetic substitutes of large-scale real-world graphs that can be effectively used to benchmark GNN models.
arXiv Detail & Related papers (2022-07-10T06:42:02Z)
- Bag of Tricks for Training Deeper Graph Neural Networks: A Comprehensive Benchmark Study [100.27567794045045]
Training deep graph neural networks (GNNs) is notoriously hard.
We present the first fair and reproducible benchmark dedicated to assessing the "tricks" of training deep GNNs.
arXiv Detail & Related papers (2021-08-24T05:00:37Z)
- A Unified Lottery Ticket Hypothesis for Graph Neural Networks [82.31087406264437]
We present a unified GNN sparsification (UGS) framework that simultaneously prunes the graph adjacency matrix and the model weights.
We further generalize the popular lottery ticket hypothesis to GNNs for the first time, by defining a graph lottery ticket (GLT) as a pair of core sub-dataset and sparse sub-network.
arXiv Detail & Related papers (2021-02-12T21:52:43Z)
- Graph Random Neural Network for Semi-Supervised Learning on Graphs [36.218650686748546]
We study the problem of semi-supervised learning on graphs, for which graph neural networks (GNNs) have been extensively explored.
Most existing GNNs inherently suffer from the limitations of over-smoothing, non-robustness, and weak-generalization when labeled nodes are scarce.
In this paper, we propose a simple yet effective framework -- Graph Random Neural Networks (GRAND) -- to address these issues.
arXiv Detail & Related papers (2020-05-22T09:40:13Z)
- Self-Enhanced GNN: Improving Graph Neural Networks Using Model Outputs [20.197085398581397]
Graph neural networks (GNNs) have received much attention recently because of their excellent performance on graph-based tasks.
We propose self-enhanced GNN (SEG), which improves the quality of the input data using the outputs of existing GNN models.
SEG consistently improves the performance of well-known GNN models such as GCN, GAT, and SGC across different datasets.
arXiv Detail & Related papers (2020-02-18T12:27:16Z)