Pitfalls in Link Prediction with Graph Neural Networks: Understanding
the Impact of Target-link Inclusion & Better Practices
- URL: http://arxiv.org/abs/2306.00899v2
- Date: Mon, 18 Dec 2023 01:10:59 GMT
- Title: Pitfalls in Link Prediction with Graph Neural Networks: Understanding
the Impact of Target-link Inclusion & Better Practices
- Authors: Jing Zhu, Yuhang Zhou, Vassilis N. Ioannidis, Shengyi Qian, Wei Ai,
Xiang Song, Danai Koutra
- Abstract summary: Graph Neural Networks (GNNs) are remarkably successful in a variety of high-impact applications.
In link prediction, the common practices of including the edges being predicted in the graph at training and/or test time have an outsized impact on the performance of low-degree nodes.
We introduce an effective and efficient GNN training framework, SpotTarget, which leverages our insight on low-degree nodes.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: While Graph Neural Networks (GNNs) are remarkably successful in a variety of
high-impact applications, we demonstrate that, in link prediction, the common
practices of including the edges being predicted in the graph at training
and/or test time have an outsized impact on the performance of low-degree nodes. We
theoretically and empirically investigate how these practices impact node-level
performance across different degrees. Specifically, we explore three issues
that arise: (I1) overfitting; (I2) distribution shift; and (I3) implicit test
leakage. The former two issues lead to poor generalizability to the test data,
while the latter leads to overestimation of the model's performance and
directly impacts the deployment of GNNs. To address these issues in a
systematic way, we introduce an effective and efficient GNN training framework,
SpotTarget, which leverages our insight on low-degree nodes: (1) at training
time, it excludes a (training) edge to be predicted if it is incident to at
least one low-degree node; and (2) at test time, it excludes all test edges to
be predicted (thus, mimicking real scenarios of using GNNs, where the test data
is not included in the graph). SpotTarget helps researchers and practitioners
adhere to best practices for learning from graph data, which are frequently
overlooked even by the most widely-used frameworks. Our experiments on various
real-world datasets show that SpotTarget makes GNNs up to 15x more accurate in
sparse graphs, and significantly improves their performance for low-degree
nodes in dense graphs.
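The two exclusion rules above are simple enough to illustrate directly. The following is a minimal sketch in plain PyTorch, assuming an undirected graph stored as an edge_index tensor of shape [2, E] and target edges of shape [2, T]; the function names, tensor layout, and default degree threshold delta are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of the two SpotTarget exclusion rules described in the
# abstract. The edge layout ([2, E] edge_index, [2, T] target_edges) and the
# threshold `delta` are assumptions for illustration, not the authors' API.
import torch

def node_degrees(edge_index: torch.Tensor, num_nodes: int) -> torch.Tensor:
    """Degree of each node, counting both endpoints of every edge."""
    return torch.bincount(edge_index.reshape(-1), minlength=num_nodes)

def remove_edges(edge_index: torch.Tensor, edges_to_drop: torch.Tensor) -> torch.Tensor:
    """Drop the given edges (in either direction) from edge_index."""
    drop = {(int(u), int(v)) for u, v in edges_to_drop.t()}
    drop |= {(v, u) for u, v in drop}  # treat the graph as undirected
    keep = [i for i, (u, v) in enumerate(edge_index.t().tolist())
            if (u, v) not in drop]
    return edge_index[:, keep]

def spottarget_train_graph(edge_index, target_edges, num_nodes, delta=2):
    """Training-time rule: exclude a target edge from message passing if it
    is incident to at least one low-degree node (degree <= delta)."""
    deg = node_degrees(edge_index, num_nodes)
    low = (deg[target_edges[0]] <= delta) | (deg[target_edges[1]] <= delta)
    return remove_edges(edge_index, target_edges[:, low])

def spottarget_test_graph(edge_index, test_edges):
    """Test-time rule: exclude *all* test edges, mimicking deployment where
    the links being predicted are not yet observed."""
    return remove_edges(edge_index, test_edges)
```

In a full pipeline, the training-time rule would be applied per mini-batch, so each batch's message-passing graph excludes only its own low-degree target edges, while the test-time rule is applied once and unconditionally.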
Related papers
- Online GNN Evaluation Under Test-time Graph Distribution Shifts
A new research problem, online GNN evaluation, aims to provide valuable insights into a well-trained GNN's ability to generalize to real-world unlabeled graphs.
We develop an effective learning behavior discrepancy score, dubbed LeBeD, to estimate the test-time generalization errors of well-trained GNN models.
arXiv Detail & Related papers (2024-03-15T01:28:08Z)
- Breaking the Entanglement of Homophily and Heterophily in Semi-supervised Node Classification
We introduce AMUD, which quantifies the relationship between node profiles and topology from a statistical perspective.
We also propose ADPA as a new directed graph learning paradigm for AMUD.
arXiv Detail & Related papers (2023-12-07T07:54:11Z)
- GNNEvaluator: Evaluating GNN Performance On Unseen Graphs Without Labels
We study a new problem, GNN model evaluation, that aims to assess the performance of a specific GNN model trained on labeled and observed graphs.
We propose a two-stage GNN model evaluation framework, including (1) DiscGraph set construction and (2) GNNEvaluator training and inference.
Under the effective training supervision from the DiscGraph set, GNNEvaluator learns to precisely estimate node classification accuracy of the to-be-evaluated GNN model.
arXiv Detail & Related papers (2023-10-23T05:51:59Z)
- Towards Temporal Edge Regression: A Case Study on Agriculture Trade Between Nations
Graph Neural Networks (GNNs) have shown promising performance in tasks on dynamic graphs.
In this paper, we explore the application of GNNs to edge regression tasks in both static and dynamic settings.
arXiv Detail & Related papers (2023-08-15T17:13:16Z)
- Addressing the Impact of Localized Training Data in Graph Neural Networks
Graph Neural Networks (GNNs) have achieved notable success in learning from graph-structured data.
This article aims to assess the impact of training GNNs on localized subsets of the graph.
We propose a regularization method to minimize distributional discrepancies between localized training data and the graph used at inference.
arXiv Detail & Related papers (2023-07-24T11:04:22Z)
- Stable Prediction on Graphs with Agnostic Distribution Shift
Graph neural networks (GNNs) have been shown to be effective on various graph tasks with randomly separated training and testing data.
In real applications, however, the distribution of the training graph may differ from that of the test graph.
We propose a novel stable prediction framework for GNNs, which permits both locally and globally stable learning and prediction on graphs.
arXiv Detail & Related papers (2021-10-08T02:45:47Z)
- Shift-Robust GNNs: Overcoming the Limitations of Localized Graph Training Data
Shift-Robust GNN (SR-GNN) is designed to account for distributional differences between biased training data and the graph's true inference distribution.
We show that SR-GNN outperforms other GNN baselines in accuracy, eliminating at least 40% of the negative effects introduced by biased training data.
arXiv Detail & Related papers (2021-08-02T18:00:38Z)
- Combining Label Propagation and Simple Models Out-performs Graph Neural Networks
We show that, for many standard transductive node classification benchmarks, combining label propagation with simple models exceeds or nearly matches the performance of state-of-the-art GNNs.
We call this overall procedure Correct and Smooth (C&S); a minimal sketch of the idea appears after this list.
arXiv Detail & Related papers (2020-10-27T02:10:52Z)
- Distance Encoding: Design Provably More Powerful Neural Networks for Graph Representation Learning
Graph Neural Networks (GNNs) have achieved great success in graph representation learning.
However, GNNs can generate identical representations for graph substructures that are in fact very different.
More powerful GNNs, proposed recently by mimicking higher-order tests, are inefficient because they cannot exploit the sparsity of the underlying graph structure.
We propose Distance Encoding (DE) as a new class of techniques for graph representation learning.
arXiv Detail & Related papers (2020-08-31T23:15:40Z)
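For the Correct and Smooth entry above, the post-processing procedure is concrete enough to sketch. The version below is a simplified reading under stated assumptions: a normalized propagation matrix S, soft predictions Z from a simple base model, one-hot labels Y, and a boolean train mask; the function names, hyperparameters, and defaults are illustrative, not the paper's exact configuration.

```python
# Simplified Correct & Smooth (C&S) post-processing: train a simple model,
# then (1) propagate its residual errors over the graph to "correct" the
# predictions, and (2) propagate the corrected predictions to "smooth" them.
import numpy as np

def propagate(S: np.ndarray, X: np.ndarray, alpha: float, iters: int) -> np.ndarray:
    """Iterate X <- (1 - alpha) * X_initial + alpha * S @ X, as in label propagation."""
    x0, out = X.copy(), X.copy()
    for _ in range(iters):
        out = (1 - alpha) * x0 + alpha * (S @ out)
    return out

def correct_and_smooth(S, Z, Y, train_mask, alpha1=0.8, alpha2=0.8, iters=50):
    # "Correct" step: propagate the base model's errors on the training nodes.
    E = np.zeros_like(Z)
    E[train_mask] = Y[train_mask] - Z[train_mask]
    Z_corrected = Z + propagate(S, E, alpha1, iters)
    # "Smooth" step: reset training nodes to their true labels, then propagate.
    Z_corrected[train_mask] = Y[train_mask]
    return propagate(S, Z_corrected, alpha2, iters)
```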