Open Graph Benchmark: Datasets for Machine Learning on Graphs
- URL: http://arxiv.org/abs/2005.00687v7
- Date: Thu, 25 Feb 2021 02:06:27 GMT
- Title: Open Graph Benchmark: Datasets for Machine Learning on Graphs
- Authors: Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren,
Bowen Liu, Michele Catasta, Jure Leskovec
- Abstract summary: We present the Open Graph Benchmark (OGB) to facilitate scalable, robust, and reproducible graph machine learning (ML) research.
OGB datasets are large-scale, encompass multiple important graph ML tasks, and cover a diverse range of domains.
For each dataset, we provide a unified evaluation protocol using meaningful application-specific data splits and evaluation metrics.
- Score: 86.96887552203479
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present the Open Graph Benchmark (OGB), a diverse set of challenging and
realistic benchmark datasets to facilitate scalable, robust, and reproducible
graph machine learning (ML) research. OGB datasets are large-scale, encompass
multiple important graph ML tasks, and cover a diverse range of domains,
ranging from social and information networks to biological networks, molecular
graphs, source code ASTs, and knowledge graphs. For each dataset, we provide a
unified evaluation protocol using meaningful application-specific data splits
and evaluation metrics. In addition to building the datasets, we also perform
extensive benchmark experiments for each dataset. Our experiments suggest that
OGB datasets present significant challenges of scalability to large-scale
graphs and out-of-distribution generalization under realistic data splits,
indicating fruitful opportunities for future research. Finally, OGB provides an
automated end-to-end graph ML pipeline that simplifies and standardizes the
process of graph data loading, experimental setup, and model evaluation. OGB
will be regularly updated and welcomes inputs from the community. OGB datasets
as well as data loaders, evaluation scripts, baseline code, and leaderboards
are publicly available at https://ogb.stanford.edu .
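As a concrete illustration of the automated pipeline described in the abstract, the sketch below loads an OGB node-property-prediction dataset with its standardized split and builds the matching Evaluator. This is a minimal sketch based on the publicly documented `ogb` Python package (with PyTorch Geometric installed); the dataset name "ogbn-arxiv" and the local `dataset/` directory are illustrative choices, not prescribed by the paper.

```python
# Minimal sketch of the OGB loading/evaluation workflow, assuming the
# `ogb` and `torch_geometric` packages are installed ("ogbn-arxiv" is an
# illustrative dataset choice).
from ogb.nodeproppred import PygNodePropPredDataset, Evaluator

# Download the dataset together with its standardized, application-specific
# split (ogbn-arxiv is split by paper publication year).
dataset = PygNodePropPredDataset(name="ogbn-arxiv", root="dataset/")
split_idx = dataset.get_idx_split()   # dict with 'train' / 'valid' / 'test' node indices
graph = dataset[0]                    # a single PyTorch Geometric Data object

# Unified evaluation protocol: the Evaluator fixes the metric per dataset
# (accuracy for ogbn-arxiv), so reported numbers are directly comparable.
evaluator = Evaluator(name="ogbn-arxiv")
print(evaluator.expected_input_format)

# Given model predictions `y_pred` of shape [num_nodes, 1], evaluation is:
#   result = evaluator.eval({"y_true": graph.y[split_idx["test"]],
#                            "y_pred": y_pred[split_idx["test"]]})
```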
Related papers
- DA-MoE: Addressing Depth-Sensitivity in Graph-Level Analysis through Mixture of Experts [70.21017141742763]
Graph neural networks (GNNs) are gaining popularity for processing graph-structured data.
Existing methods generally use a fixed number of GNN layers to generate representations for all graphs.
We propose the depth-adaptive mixture-of-experts (DA-MoE) method, which incorporates two main improvements to GNNs.
arXiv Detail & Related papers (2024-11-05T11:46:27Z)
- Challenging the Myth of Graph Collaborative Filtering: a Reasoned and Reproducibility-driven Analysis [50.972595036856035]
We present code that successfully replicates results from six popular and recent graph recommendation models.
We compare these graph models with traditional collaborative filtering models that historically performed well in offline evaluations.
By investigating the information flow from users' neighborhoods, we aim to identify which models are influenced by intrinsic features in the dataset structure.
arXiv Detail & Related papers (2023-08-01T09:31:44Z)
- Temporal Graph Benchmark for Machine Learning on Temporal Graphs [54.52243310226456]
Temporal Graph Benchmark (TGB) is a collection of challenging and diverse benchmark datasets.
We benchmark each dataset and find that the performance of common models can vary drastically across datasets.
TGB provides an automated machine learning pipeline for reproducible and accessible temporal graph research.
arXiv Detail & Related papers (2023-07-03T13:58:20Z)
- Graphtester: Exploring Theoretical Boundaries of GNNs on Graph Datasets [10.590698823137755]
We provide a new tool called Graphtester for a comprehensive analysis of the theoretical capabilities of GNNs for various datasets, tasks, and scores.
We use Graphtester to analyze over 40 different graph datasets, determining upper bounds on the performance of various GNNs based on the number of layers.
We show that the tool can also be used for Graph Transformers using positional node encodings, thereby expanding its scope.
arXiv Detail & Related papers (2023-06-30T08:53:23Z)
- IGB: Addressing The Gaps In Labeling, Features, Heterogeneity, and Size of Public Graph Datasets for Deep Learning Research [14.191338008898963]
Graph neural networks (GNNs) have shown high potential for a variety of real-world, challenging applications.
One of the major obstacles in GNN research is the lack of large-scale flexible datasets.
We introduce the Illinois Graph Benchmark (IGB), a research dataset tool that developers can use to train, scrutinize, and evaluate GNN models.
arXiv Detail & Related papers (2023-02-27T05:21:35Z)
- GraphWorld: Fake Graphs Bring Real Insights for GNNs [4.856486822139849]
GraphWorld allows a user to efficiently generate a world with millions of statistically diverse datasets.
We present insights from GraphWorld experiments regarding the performance characteristics of tens of thousands of GNN models over millions of benchmark datasets.
arXiv Detail & Related papers (2022-02-28T22:00:02Z)
- Graph Contrastive Learning Automated [94.41860307845812]
Graph contrastive learning (GraphCL) has emerged with promising representation learning performance.
The effectiveness of GraphCL hinges on ad-hoc data augmentations, which have to be manually picked per dataset.
This paper proposes a unified bi-level optimization framework to automatically, adaptively and dynamically select data augmentations when performing GraphCL on specific graph data.
arXiv Detail & Related papers (2021-06-10T16:35:27Z)
- OGB-LSC: A Large-Scale Challenge for Machine Learning on Graphs [69.23600404232883]
OGB Large-Scale Challenge (OGB-LSC) is a collection of three real-world datasets for advancing the state-of-the-art in large-scale graph ML.
OGB-LSC provides dedicated baseline experiments, scaling up expressive graph ML models to the massive datasets.
arXiv Detail & Related papers (2021-03-17T04:08:03Z)
- TUDataset: A collection of benchmark datasets for learning with graphs [21.16723995518478]
We introduce the TUDataset for graph classification and regression.
The collection consists of over 120 datasets of varying sizes from a wide range of applications.
All datasets are available at www.graphlearning.io; a minimal loading sketch via PyTorch Geometric appears after this list.
arXiv Detail & Related papers (2020-07-16T21:46:33Z)
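As referenced in the TUDataset entry above, the sketch below loads one member of the collection through PyTorch Geometric, which ships a `TUDataset` wrapper for these datasets. The dataset name "MUTAG" and the root directory are illustrative assumptions, not part of the paper.

```python
# Minimal sketch of loading a TUDataset collection member through PyTorch
# Geometric ("MUTAG" and the root path are illustrative choices).
from torch_geometric.datasets import TUDataset
from torch_geometric.loader import DataLoader

dataset = TUDataset(root="data/TUDataset", name="MUTAG")  # small molecule graph classification set
print(dataset.num_classes, dataset.num_node_features)

# Graph-level tasks are trained on mini-batches of whole graphs.
loader = DataLoader(dataset, batch_size=32, shuffle=True)
for batch in loader:
    # batch.x: node features, batch.edge_index: edges, batch.y: graph labels
    print(batch.num_graphs)
    break
```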
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.