Graph Contrastive Learning versus Untrained Baselines: The Role of Dataset Size
- URL: http://arxiv.org/abs/2509.01541v1
- Date: Mon, 01 Sep 2025 15:16:28 GMT
- Title: Graph Contrastive Learning versus Untrained Baselines: The Role of Dataset Size
- Authors: Smayan Khanna, Doruk Efe Gökmen, Risi Kondor, Vincenzo Vitelli
- Abstract summary: Graph Contrastive Learning (GCL) has emerged as a leading paradigm for self-supervised learning on graphs. We find that GCL's advantage depends strongly on dataset size and task difficulty.
- Score: 4.282194363742351
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph Contrastive Learning (GCL) has emerged as a leading paradigm for self-supervised learning on graphs, with strong performance reported on standardized datasets and growing applications ranging from genomics to drug discovery. We ask a basic question: does GCL actually outperform untrained baselines? We find that GCL's advantage depends strongly on dataset size and task difficulty. On standard datasets, untrained Graph Neural Networks (GNNs), simple multilayer perceptrons, and even handcrafted statistics can rival or exceed GCL. On the large molecular dataset ogbg-molhiv, we observe a crossover: GCL lags at small scales but pulls ahead beyond a few thousand graphs, though this gain eventually plateaus. On synthetic datasets, GCL accuracy approximately scales with the logarithm of the number of graphs, and its performance gap (compared with untrained GNNs) varies with task complexity. Moving forward, it is crucial to identify the role of dataset size in benchmarks and applications, as well as to design GCL algorithms that avoid performance plateaus.
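To make the paper's central comparison concrete, here is a minimal sketch of the untrained-GNN baseline: freeze a randomly initialized GIN encoder and score its graph embeddings with a linear probe. This assumes PyTorch Geometric and scikit-learn are available; the MUTAG dataset, architecture, and probe settings are illustrative choices, not the authors' exact protocol.

```python
import torch
from torch import nn
from torch_geometric.datasets import TUDataset
from torch_geometric.loader import DataLoader
from torch_geometric.nn import GINConv, global_mean_pool
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

class UntrainedGIN(nn.Module):
    """GIN encoder kept at its random initialization -- never trained."""
    def __init__(self, in_dim, hid=64, layers=3):
        super().__init__()
        self.convs = nn.ModuleList()
        for i in range(layers):
            mlp = nn.Sequential(
                nn.Linear(in_dim if i == 0 else hid, hid),
                nn.ReLU(),
                nn.Linear(hid, hid),
            )
            self.convs.append(GINConv(mlp))

    @torch.no_grad()
    def forward(self, x, edge_index, batch):
        for conv in self.convs:
            x = conv(x, edge_index).relu()
        return global_mean_pool(x, batch)  # one embedding per graph

dataset = TUDataset(root="data", name="MUTAG")  # illustrative small dataset
encoder = UntrainedGIN(dataset.num_features).eval()

embeddings, labels = [], []
for batch in DataLoader(dataset, batch_size=64):
    embeddings.append(encoder(batch.x, batch.edge_index, batch.batch))
    labels.append(batch.y)
X = torch.cat(embeddings).numpy()
y = torch.cat(labels).numpy()

# Linear probe on frozen random features -- the "untrained baseline".
probe = LogisticRegression(max_iter=5000)
print("5-fold accuracy:", cross_val_score(probe, X, y, cv=5).mean())
```

If the probe accuracy of this untrained encoder rivals a GCL-pretrained one on the same dataset, pretraining added little, which is the regime the paper reports for small standard benchmarks.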
Related papers
- Can LLMs Alleviate Catastrophic Forgetting in Graph Continual Learning? A Systematic Study [35.60356938705585]
Real-world data, including graph-structured data, often arrives in a streaming manner, which means that learning systems need to continuously acquire new knowledge. We propose a simple-yet-effective method, Simple Graph Continual Learning (SimGCL), that surpasses the previous state-of-the-art GNN-based baseline by around 20%.
arXiv Detail & Related papers (2025-05-24T13:43:29Z)
- GRE^2-MDCL: Graph Representation Embedding Enhanced via Multidimensional Contrastive Learning [0.0]
Graph representation learning has emerged as a powerful tool for preserving graph topology when mapping nodes to vector representations. Current graph neural network models face the challenge of requiring extensive labeled data. We propose Graph Representation Embedding Enhanced via Multidimensional Contrastive Learning (GRE^2-MDCL).
arXiv Detail & Related papers (2024-09-12T03:09:05Z)
- Rethinking and Simplifying Bootstrapped Graph Latents [48.76934123429186]
Graph contrastive learning (GCL) has emerged as a representative paradigm in graph self-supervised learning.
We present SGCL, a simple yet effective GCL framework that utilizes the outputs from two consecutive iterations as positive pairs.
We show that SGCL can achieve competitive performance with fewer parameters, lower time and space costs, and significant convergence speedup.
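A minimal sketch of the consecutive-iteration idea described above, assuming a generic embedding model: the detached embeddings from iteration t-1 serve as positives for iteration t. The toy encoder, data, and plain cosine objective are illustrative; the paper's exact loss and collapse-prevention details may differ.

```python
import torch
import torch.nn.functional as F

# Toy stand-in for a GNN encoder; any embedding model fits here.
encoder = torch.nn.Sequential(
    torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 32)
)
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

x = torch.randn(100, 16)  # fake node features, fixed across iterations
prev_z = None             # embeddings from the previous iteration

for step in range(10):
    z = encoder(x)
    if prev_z is not None:
        # Consecutive-iteration positive pairs: pull each embedding
        # toward its own (detached) value from the previous step.
        loss = -F.cosine_similarity(z, prev_z, dim=-1).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    prev_z = z.detach()   # becomes the positive target next iteration
```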
arXiv Detail & Related papers (2023-12-05T09:49:50Z)
- Architecture Matters: Uncovering Implicit Mechanisms in Graph Contrastive Learning [34.566003077992384]
We present a systematic study of various graph contrastive learning (GCL) methods.
By uncovering how the implicit inductive bias of GNNs works in contrastive learning, we provide theoretical insight into GCL's intriguing properties.
Rather than directly porting existing NN methods to GCL, we advocate for more attention toward the unique architecture of graph learning.
arXiv Detail & Related papers (2023-11-05T15:54:17Z) - Localized Contrastive Learning on Graphs [110.54606263711385]
We introduce a simple yet effective contrastive model named Localized Graph Contrastive Learning (Local-GCL).
In spite of its simplicity, Local-GCL achieves quite competitive performance in self-supervised node representation learning tasks on graphs with various scales and properties.
arXiv Detail & Related papers (2022-12-08T23:36:00Z) - A Comprehensive Study on Large-Scale Graph Training: Benchmarking and
Rethinking [124.21408098724551]
Large-scale graph training is a notoriously challenging problem for graph neural networks (GNNs).
We present a new ensembling training scheme, EnGCN, to address these issues.
Our proposed method has achieved new state-of-the-art (SOTA) performance on large-scale datasets.
arXiv Detail & Related papers (2022-10-14T03:43:05Z) - Graph Soft-Contrastive Learning via Neighborhood Ranking [19.241089079154044]
Graph Contrastive Learning (GCL) has emerged as a promising approach in the realm of graph self-supervised learning.
We propose a novel paradigm, Graph Soft-Contrastive Learning (GSCL).
GSCL facilitates GCL via neighborhood ranking, avoiding the need to specify absolutely similar pairs.
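A hedged sketch of what a neighborhood-ranking objective can look like: nearer neighbors should be relatively more similar to an anchor than farther ones, with no hard positive set. The margin form and hop assignments below are illustrative assumptions, not GSCL's exact formulation.

```python
import torch
import torch.nn.functional as F

z = torch.randn(10, 32, requires_grad=True)  # toy node embeddings
anchor, hop1, hop2 = z[0], z[1], z[2]        # pretend 1-hop vs 2-hop neighbors

# Soft ranking constraint: sim(anchor, 1-hop) should beat
# sim(anchor, 2-hop) by a margin, rather than declaring hop1
# an "absolutely similar" positive pair.
margin = 0.1
loss = F.relu(
    margin
    + F.cosine_similarity(anchor, hop2, dim=0)
    - F.cosine_similarity(anchor, hop1, dim=0)
)
```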
arXiv Detail & Related papers (2022-09-28T09:52:15Z)
- Rethinking and Scaling Up Graph Contrastive Learning: An Extremely Efficient Approach with Group Discrimination [87.07410882094966]
Graph contrastive learning (GCL) alleviates the heavy reliance on label information for graph representation learning (GRL).
We introduce a new learning paradigm for self-supervised GRL, namely, Group Discrimination (GD).
Instead of similarity computation, GGD directly discriminates two groups of summarised node instances with a simple binary cross-entropy loss.
In addition, GGD requires far fewer training epochs than GCL methods to reach competitive performance on large-scale datasets.
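A minimal sketch of the group-discrimination idea as summarized above: score node embeddings from a real view and a corrupted view, then classify the two groups with binary cross-entropy instead of computing pairwise similarities. The linear scorer and feature-shuffling corruption are stand-ins, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

encoder = torch.nn.Linear(16, 32)   # stand-in for a GNN encoder
scorer = torch.nn.Linear(32, 1)     # summarizes each node to a single logit

x = torch.randn(200, 16)            # "real" node features
x_corrupt = x[torch.randperm(200)]  # corrupted view: shuffled features

# Group discrimination: real nodes labeled 1, corrupted nodes labeled 0.
logits = scorer(encoder(torch.cat([x, x_corrupt]))).squeeze(-1)
targets = torch.cat([torch.ones(200), torch.zeros(200)])
loss = F.binary_cross_entropy_with_logits(logits, targets)
# No O(n^2) similarity matrix anywhere -- hence the efficiency claim.
```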
arXiv Detail & Related papers (2022-06-03T12:32:47Z)
- Bag of Tricks for Training Deeper Graph Neural Networks: A Comprehensive Benchmark Study [100.27567794045045]
Training deep graph neural networks (GNNs) is notoriously hard.
We present the first fair and reproducible benchmark dedicated to assessing the "tricks" of training deep GNNs.
arXiv Detail & Related papers (2021-08-24T05:00:37Z)
- Adversarial Graph Augmentation to Improve Graph Contrastive Learning [21.54343383921459]
We propose a novel principle, termed adversarial-GCL (AD-GCL), which enables GNNs to avoid capturing redundant information during training.
We experimentally validate AD-GCL by comparing it with state-of-the-art GCL methods, achieving performance gains of up to 14% in unsupervised, 6% in transfer, and 3% in semi-supervised learning settings.
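A compact sketch of the min-max training signal behind this: the augmenter is updated to increase the contrastive loss while the encoder is updated to decrease it. Real AD-GCL learns edge-drop probabilities on the graph; the learnable feature transform below is a simplified stand-in to show the opposing gradient flow.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.5):
    """Standard InfoNCE-style contrastive loss between two views."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    sim = z1 @ z2.t() / tau
    return F.cross_entropy(sim, torch.arange(len(z1)))

x = torch.randn(64, 16)
encoder = torch.nn.Linear(16, 32)
augmenter = torch.nn.Linear(16, 16)  # stand-in for learned edge dropping
opt_enc = torch.optim.Adam(encoder.parameters(), lr=1e-3)
opt_aug = torch.optim.Adam(augmenter.parameters(), lr=1e-3)

loss = info_nce(encoder(x), encoder(augmenter(x)))
opt_enc.zero_grad()
opt_aug.zero_grad()
loss.backward()
for p in augmenter.parameters():  # augmenter *ascends* the loss ...
    p.grad = -p.grad
opt_enc.step()                    # ... while the encoder descends it
opt_aug.step()
```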
arXiv Detail & Related papers (2021-06-10T15:34:26Z)
- Graph Contrastive Learning with Augmentations [109.23158429991298]
We propose a graph contrastive learning (GraphCL) framework for learning unsupervised representations of graph data.
We show that our framework can produce graph representations of similar or better generalizability, transferability, and robustness compared to state-of-the-art methods.
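A short sketch of the augmentation step that drives GraphCL: build two stochastic views of one graph (edge dropping and feature masking are two of the paper's augmentation families) and treat them as a positive pair under a contrastive loss. The rates and random-graph data are illustrative.

```python
import torch

def drop_edges(edge_index, p=0.2):
    """Randomly remove a fraction p of edges (one GraphCL augmentation)."""
    keep = torch.rand(edge_index.size(1)) > p
    return edge_index[:, keep]

def mask_features(x, p=0.2):
    """Randomly zero out node features (another augmentation family)."""
    return x * (torch.rand_like(x) > p)

x = torch.randn(50, 16)                      # toy node features
edge_index = torch.randint(0, 50, (2, 200))  # toy edge list

view1 = (mask_features(x), drop_edges(edge_index))
view2 = (mask_features(x), drop_edges(edge_index))
# Each view is encoded by a shared GNN and matched with a contrastive
# objective such as InfoNCE (see the AD-GCL sketch above for one form).
```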
arXiv Detail & Related papers (2020-10-22T20:13:43Z)