GSTBench: A Benchmark Study on the Transferability of Graph Self-Supervised Learning
- URL: http://arxiv.org/abs/2509.06975v1
- Date: Thu, 28 Aug 2025 19:13:10 GMT
- Title: GSTBench: A Benchmark Study on the Transferability of Graph Self-Supervised Learning
- Authors: Yu Song, Zhigang Hua, Yan Xie, Jingzhe Liu, Bo Long, Hui Liu
- Abstract summary: Self-supervised learning (SSL) has shown great promise in graph representation learning. Most existing graph SSL methods are developed and evaluated under a single-dataset setting. We present GSTBench, the first systematic benchmark for evaluating the transferability of graph SSL methods.
- Score: 20.32550890936548
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised learning (SSL) has shown great promise in graph representation learning. However, most existing graph SSL methods are developed and evaluated under a single-dataset setting, leaving their cross-dataset transferability largely unexplored and limiting their ability to leverage knowledge transfer and large-scale pretraining, factors that are critical for developing generalized intelligence beyond fitting training data. To address this gap and advance foundation model research for graphs, we present GSTBench, the first systematic benchmark for evaluating the transferability of graph SSL methods. We conduct large-scale pretraining on ogbn-papers100M and evaluate five representative SSL methods across a diverse set of target graphs. Our standardized experimental setup decouples confounding factors such as model architecture, dataset characteristics, and adaptation protocols, enabling rigorous comparisons focused solely on pretraining objectives. Surprisingly, we observe that most graph SSL methods struggle to generalize, with some performing worse than random initialization. In contrast, GraphMAE, a masked autoencoder approach, consistently improves transfer performance. We analyze the underlying factors that drive these differences and offer insights to guide future research on transferable graph SSL, laying a solid foundation for the "pretrain-then-transfer" paradigm in graph learning. Our code is available at https://github.com/SongYYYY/GSTBench.
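The abstract highlights GraphMAE, a masked-autoencoder objective, as the most transferable pretraining method. The following is a minimal NumPy sketch of that general idea, not the authors' implementation: mask a subset of node features, propagate over a normalized adjacency, and score reconstruction only on the masked nodes. The toy graph, weight matrices, and the MSE criterion (GraphMAE itself uses a scaled-cosine loss) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy undirected graph: 6 nodes, adjacency with self-loops already added.
A = np.array([
    [1, 1, 0, 0, 0, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 1],
    [1, 0, 0, 0, 1, 1],
], dtype=float)
X = rng.normal(size=(6, 4))                 # node features

# Symmetric normalization: D^{-1/2} A D^{-1/2}
d = A.sum(axis=1)
A_hat = A / np.sqrt(np.outer(d, d))

# Mask half of the nodes: replace their features with a [MASK] token.
mask = np.array([True, False, True, False, True, False])
X_in = X.copy()
X_in[mask] = np.zeros(4)                    # [MASK] token (zeros here)

# One-layer "encoder" and "decoder"; random weights stand in for trained ones.
W_enc = rng.normal(size=(4, 8)) / 2
W_dec = rng.normal(size=(8, 4)) / 2
H = np.maximum(A_hat @ X_in @ W_enc, 0.0)   # message passing + ReLU
X_rec = A_hat @ H @ W_dec                   # reconstruct node features

# Reconstruction loss is computed only on the masked nodes.
loss = np.mean((X_rec[mask] - X[mask]) ** 2)
```

The key detail this sketch preserves is that the loss ignores visible nodes, forcing the encoder to infer masked features from graph structure and neighboring features.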
Related papers
- PyG-SSL: A Graph Self-Supervised Learning Toolkit [71.22547762704602]
Graph Self-Supervised Learning (SSL) has emerged as a pivotal area of research in recent years. Despite the remarkable achievements of these graph SSL methods, their current implementations pose significant challenges for beginners. We present a graph SSL toolkit named PyG-SSL, which is built upon PyTorch and is compatible with various deep learning and scientific computing backends.
arXiv Detail & Related papers (2024-12-30T18:32:05Z)
- Do Neural Scaling Laws Exist on Graph Self-Supervised Learning? [9.297227372861876]
Self-supervised learning(SSL) is essential to obtain foundation models in NLP and CV domains via effectively leveraging knowledge in large-scale unlabeled data.
It remains a mystery whether existing SSL in the graph domain can follow the scaling behavior toward building Graph Foundation Models(GFMs) with large-scale pre-training.
This paper examines the feasibility of existing graph SSL techniques for developing GFMs and opens a new direction for graph SSL design with a new evaluation prototype.
arXiv Detail & Related papers (2024-08-20T23:45:11Z)
- A Closer Look at Benchmarking Self-Supervised Pre-training with Image Classification [51.35500308126506]
Self-supervised learning (SSL) is a machine learning approach where the data itself provides supervision, eliminating the need for external labels.
We study how classification-based evaluation protocols for SSL correlate and how well they predict downstream performance on different dataset types.
arXiv Detail & Related papers (2024-07-16T23:17:36Z)
- Visual Self-supervised Learning Scheme for Dense Prediction Tasks on X-ray Images [3.782392436834913]
Self-supervised learning (SSL) has led to considerable progress in natural language processing (NLP). Likewise, the incorporation of contrastive learning into visual SSL models has led to considerable progress, often surpassing supervised counterparts.
Here, we focus on dense prediction tasks using security inspection x-ray images to evaluate our proposed model, Segment localization (SegLoc).
Based upon the Instance localization (InsLoc) model, SegLoc addresses one of the key challenges of contrastive learning, i.e., false negative pairs of query embeddings.
arXiv Detail & Related papers (2023-10-12T15:42:17Z)
- In-Domain Self-Supervised Learning Improves Remote Sensing Image Scene Classification [5.323049242720532]
Self-supervised learning has emerged as a promising approach for remote sensing image classification.
We present a study of different self-supervised pre-training strategies and evaluate their effect across 14 downstream datasets.
arXiv Detail & Related papers (2023-07-04T10:57:52Z)
- Analyzing Data-Centric Properties for Contrastive Learning on Graphs [32.69353929886551]
We investigate why graph SSL methods, such as contrastive learning (CL), work well.
Our work rigorously contextualizes, both empirically and theoretically, the effects of data-centric properties on augmentation strategies and learning paradigms for graph SSL.
arXiv Detail & Related papers (2022-08-04T17:58:37Z)
- Bringing Your Own View: Graph Contrastive Learning without Prefabricated Data Augmentations [94.41860307845812]
Self-supervision has recently surged to a new frontier: graph learning.
GraphCL uses a prefabricated prior reflected by the ad-hoc manual selection of graph data augmentations.
We have extended the prefabricated discrete prior in the augmentation set, to a learnable continuous prior in the parameter space of graph generators.
We have leveraged both principles of information minimization (InfoMin) and information bottleneck (InfoBN) to regularize the learned priors.
arXiv Detail & Related papers (2022-01-04T15:49:18Z)
- Graph-based Semi-supervised Learning: A Comprehensive Review [51.26862262550445]
Semi-supervised learning (SSL) has tremendous value in practice due to its ability to utilize both labeled and unlabeled data.
An important class of SSL methods naturally represents data as graphs, corresponding to graph-based semi-supervised learning (GSSL) methods.
GSSL methods have demonstrated their advantages in various domains due to their unique use of structure, the universality of their applications, and their scalability to large-scale data.
arXiv Detail & Related papers (2021-02-26T05:11:09Z)
- Graph Contrastive Learning with Augmentations [109.23158429991298]
We propose a graph contrastive learning (GraphCL) framework for learning unsupervised representations of graph data.
We show that our framework can produce graph representations of similar or better generalizability, transferability, and robustness compared to state-of-the-art methods.
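The GraphCL objective can be summarized as contrasting two augmented views of the same graph. Below is a minimal NumPy sketch of that pattern under stated assumptions: edge dropping as the (prefabricated) augmentation, a single mean-aggregation propagation step standing in for a trained GNN encoder, and an NT-Xent (InfoNCE) loss at the node level. The helper names and the toy graph are illustrative, not GraphCL's actual API.

```python
import numpy as np

rng = np.random.default_rng(1)

def propagate(A, X):
    """One step of mean-aggregation message passing (stand-in encoder)."""
    A = A + np.eye(len(A))                       # add self-loops
    return (A / A.sum(axis=1, keepdims=True)) @ X

def drop_edges(A, p, rng):
    """Edge-dropping augmentation: remove each undirected edge with prob p."""
    keep = np.triu(rng.random(A.shape) > p, k=1)
    A2 = A * keep
    return A2 + A2.T                             # keep the graph symmetric

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent / InfoNCE: node i's two views are positives, others negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau                        # (n, n) cross-view similarities
    sim = sim - sim.max(axis=1, keepdims=True)   # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))           # maximize agreement on the diagonal

# Toy graph: 4 nodes, two stochastic views of the same structure.
A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
X = rng.normal(size=(4, 3))
z1 = propagate(drop_edges(A, 0.2, rng), X)
z2 = propagate(drop_edges(A, 0.2, rng), X)
loss = nt_xent(z1, z2)
```

Minimizing this loss pulls the two views of each node together while pushing apart views of different nodes; the quality of the learned representation hinges on how the augmentations are chosen, which is exactly the design axis the papers above debate.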
arXiv Detail & Related papers (2020-10-22T20:13:43Z)
- Contrastive and Generative Graph Convolutional Networks for Graph-based Semi-Supervised Learning [64.98816284854067]
Graph-based Semi-Supervised Learning (SSL) aims to transfer the labels of a handful of labeled data to the remaining massive unlabeled data via a graph.
A novel GCN-based SSL algorithm is presented in this paper to enrich the supervision signals by utilizing both data similarities and graph structure.
arXiv Detail & Related papers (2020-09-15T13:59:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.