GraphFM: A Comprehensive Benchmark for Graph Foundation Model
- URL: http://arxiv.org/abs/2406.08310v2
- Date: Fri, 14 Jun 2024 15:36:00 GMT
- Title: GraphFM: A Comprehensive Benchmark for Graph Foundation Model
- Authors: Yuhao Xu, Xinqi Liu, Keyu Duan, Yi Fang, Yu-Neng Chuang, Daochen Zha, Qiaoyu Tan,
- Abstract summary: Foundation Models (FMs) serve as a general class for the development of artificial intelligence systems.
Despite extensive research into self-supervised learning as the cornerstone of FMs, several outstanding issues persist.
The extent of generalization capability on downstream tasks remains unclear.
It is unknown how effectively these models can scale to large datasets.
- Score: 33.157367455390144
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Foundation Models (FMs) serve as a general class for the development of artificial intelligence systems, offering broad potential for generalization across a spectrum of downstream tasks. Despite extensive research into self-supervised learning as the cornerstone of FMs, several outstanding issues persist in Graph Foundation Models that rely on graph self-supervised learning, namely: 1) Homogenization. The extent of generalization capability on downstream tasks remains unclear. 2) Scalability. It is unknown how effectively these models can scale to large datasets. 3) Efficiency. The training time and memory usage of these models require evaluation. 4) Training Stop Criteria. Determining the optimal stopping strategy for pre-training across multiple tasks to maximize performance on downstream tasks. To address these questions, we have constructed a rigorous benchmark that thoroughly analyzes and studies the generalization and scalability of self-supervised Graph Neural Network (GNN) models. Regarding generalization, we have implemented and compared the performance of various self-supervised GNN models, trained to generate node representations, across tasks such as node classification, link prediction, and node clustering. For scalability, we have compared the performance of various models after training using full-batch and mini-batch strategies. Additionally, we have assessed the training efficiency of these models by conducting experiments to test their GPU memory usage and throughput. Through these experiments, we aim to provide insights to motivate future research. The code for this benchmark is publicly available at https://github.com/NYUSHCS/GraphFM.
Related papers
- Diffusing to the Top: Boost Graph Neural Networks with Minimal Hyperparameter Tuning [33.948899558876604]
This work introduces a graph-conditioned latent diffusion framework (GNN-Diff) to generate high-performing GNNs.
We validate our method through 166 experiments across four graph tasks: node classification on small, large, and long-range graphs, as well as link prediction.
arXiv Detail & Related papers (2024-10-08T05:27:34Z) - GOFA: A Generative One-For-All Model for Joint Graph Language Modeling [38.267339613261996]
We propose a novel generative graph language model GOFA to solve the problem.
GOFA is pre-trained on newly proposed graph-level next-word prediction, question-answering, and structural tasks.
The model is evaluated on various downstream tasks, demonstrating a strong ability to solve structural and contextual problems in zero-shot scenarios.
arXiv Detail & Related papers (2024-07-12T22:23:51Z) - GOODAT: Towards Test-time Graph Out-of-Distribution Detection [103.40396427724667]
Graph neural networks (GNNs) have found widespread application in modeling graph data across diverse domains.
Recent studies have explored graph OOD detection, often focusing on training a specific model or modifying the data on top of a well-trained GNN.
This paper introduces a data-centric, unsupervised, and plug-and-play solution that operates independently of training data and modifications of GNN architecture.
arXiv Detail & Related papers (2024-01-10T08:37:39Z) - SimTeG: A Frustratingly Simple Approach Improves Textual Graph Learning [131.04781590452308]
We present SimTeG, a frustratingly Simple approach for Textual Graph learning.
We first perform supervised parameter-efficient fine-tuning (PEFT) on a pre-trained LM on the downstream task.
We then generate node embeddings using the last hidden states of finetuned LM.
arXiv Detail & Related papers (2023-08-03T07:00:04Z) - A Comprehensive Study on Large-Scale Graph Training: Benchmarking and
Rethinking [124.21408098724551]
Large-scale graph training is a notoriously challenging problem for graph neural networks (GNNs)
We present a new ensembling training manner, named EnGCN, to address the existing issues.
Our proposed method has achieved new state-of-the-art (SOTA) performance on large-scale datasets.
arXiv Detail & Related papers (2022-10-14T03:43:05Z) - SizeShiftReg: a Regularization Method for Improving Size-Generalization
in Graph Neural Networks [5.008597638379227]
Graph neural networks (GNNs) have become the de facto model of choice for graph classification.
We propose a regularization strategy that can be applied to any GNN to improve its generalization capabilities without requiring access to the test data.
Our regularization is based on the idea of simulating a shift in the size of the training graphs using coarsening techniques.
arXiv Detail & Related papers (2022-07-16T09:50:45Z) - Neural Graph Matching for Pre-training Graph Neural Networks [72.32801428070749]
Graph neural networks (GNNs) have been shown powerful capacity at modeling structural data.
We present a novel Graph Matching based GNN Pre-Training framework, called GMPT.
The proposed method can be applied to fully self-supervised pre-training and coarse-grained supervised pre-training.
arXiv Detail & Related papers (2022-03-03T09:53:53Z) - Node Feature Extraction by Self-Supervised Multi-scale Neighborhood
Prediction [123.20238648121445]
We propose a new self-supervised learning framework, Graph Information Aided Node feature exTraction (GIANT)
GIANT makes use of the eXtreme Multi-label Classification (XMC) formalism, which is crucial for fine-tuning the language model based on graph information.
We demonstrate the superior performance of GIANT over the standard GNN pipeline on Open Graph Benchmark datasets.
arXiv Detail & Related papers (2021-10-29T19:55:12Z) - Robust Optimization as Data Augmentation for Large-scale Graphs [117.2376815614148]
We propose FLAG (Free Large-scale Adversarial Augmentation on Graphs), which iteratively augments node features with gradient-based adversarial perturbations during training.
FLAG is a general-purpose approach for graph data, which universally works in node classification, link prediction, and graph classification tasks.
arXiv Detail & Related papers (2020-10-19T21:51:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.